
Fig 1 – SimpleDB GeoRSS alternate language names.
SimpleDB alternate language names
One of the interesting features of SimpleDB is the ability to add multiple values for an attribute on a single item. Here is the overview from Amazon’s Service Highlights.
“Flexible – With Amazon SimpleDB, it is not necessary to pre-define all of the data formats you will need to store; simply add new attributes to your Amazon SimpleDB data set when needed, and the system will automatically index your data accordingly. The ability to store structured data without first defining a schema provides developers with greater flexibility when building applications, and eliminates the need to re-factor an entire database as those applications evolve.”
As an extension to my previous blog post, I decided to try adding alternate language names for Albuquerque, New Mexico:
French – Albuquerque, Nouveau-Mexique
Portuguese – Albuquerque, Novo México
Japanese – ニューメキシコ州アルバカーキ
Chinese traditional – 美國新墨西哥州阿爾伯克基
Arabic – البوكيرك (نيو مكسيكو)
Russian Альбукерке, Нью-Мексико
Amazon’s sdb library again makes it possible to add additional attributes to an individual item:
AmazonSimpleDB service = new AmazonSimpleDBClient(accessKeyId, secretAccessKey);
try {
    // replace=false adds attrValue alongside any values already stored under attrName
    List<ReplaceableAttribute> attributeListGeoName = new ArrayList<ReplaceableAttribute>(1);
    attributeListGeoName.add(new ReplaceableAttribute(attrName, attrValue, false));
    PutAttributesRequest request = new PutAttributesRequest(domainName, itemID, attributeListGeoName);
    invokePutAttributes(service, request);
} catch (Exception e) {
    e.printStackTrace();
}
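The third constructor argument of ReplaceableAttribute is the replace flag, and it is what lets one attribute name carry several language values. Here is a minimal in-memory sketch of that behavior as I understand it. ItemSketch is a hypothetical stand-in written for illustration; it is not part of Amazon's library and makes no network calls.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a single SimpleDB item (illustration only).
class ItemSketch {
    // attribute name -> set of values; one attribute name may hold several values
    private final Map<String, Set<String>> attributes = new LinkedHashMap<>();

    // Mimics the replace flag: false adds a value, true drops existing values first.
    void put(String name, String value, boolean replace) {
        Set<String> values = attributes.computeIfAbsent(name, k -> new LinkedHashSet<>());
        if (replace) {
            values.clear();
        }
        values.add(value);
    }

    // Returns the values stored under the name, or an empty set if there are none.
    Set<String> get(String name) {
        return attributes.getOrDefault(name, new LinkedHashSet<>());
    }

    public static void main(String[] args) {
        ItemSketch item = new ItemSketch();
        item.put("alternateNames", "Albuquerque, Nouveau-Mexique", false);
        item.put("alternateNames", "Albuquerque, Novo Mexico", false);
        System.out.println(item.get("alternateNames")); // both names are kept
        item.put("alternateNames", "Albuquerque, New Mexico", true);
        System.out.println(item.get("alternateNames")); // only the last name remains
    }
}
```

Passing false, as in the snippet above, is what keeps each new language from overwriting the previous one.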
After adding several languages for ‘Albuquerque, New Mexico,’ I am able to display them as GeoRSS and then as tooltip text in the Virtual Earth API viewer.
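For readers who have not seen GeoRSS, here is a rough sketch of how one item with its alternate names might be written out. The element layout (names joined into the description, an RSS item wrapper) and the coordinates are my assumptions for illustration, not the exact feed used here. The georss:point element follows the GeoRSS Simple convention of latitude then longitude, separated by a space.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that formats one feed item; names and layout are illustrative.
class GeoRssSketch {
    // Escapes the characters that would otherwise break XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds an RSS item with a GeoRSS Simple point ("lat lon").
    static String item(String title, List<String> alternateNames, double lat, double lon) {
        StringBuilder sb = new StringBuilder();
        sb.append("<item>\n");
        sb.append("  <title>").append(escape(title)).append("</title>\n");
        sb.append("  <description>")
          .append(escape(String.join("; ", alternateNames)))
          .append("</description>\n");
        sb.append("  <georss:point>").append(lat).append(' ').append(lon)
          .append("</georss:point>\n");
        sb.append("</item>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Albuquerque, Nouveau-Mexique",
                "Альбукерке, Нью-Мексико");
        // Approximate coordinates, used only as sample input.
        System.out.print(item("Albuquerque, New Mexico", names, 35.0844, -106.6504));
    }
}
```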
I had to add an explicit character encoding to my http response like this:
response.setCharacterEncoding("UTF-8");
Once that was done I could reliably get UTF-8 character strings in the GeoRSS xml returned to the viewer.
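The reason the explicit encoding matters is easy to show outside the web container. This sketch assumes the response is a servlet response, whose default character encoding is ISO-8859-1 when none is set. EncodingSketch and roundTrip are names made up for the illustration. It encodes a string to bytes and decodes it again with the same charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what survives a trip through a given charset.
class EncodingSketch {
    // Encodes text to bytes with the charset, then decodes those bytes again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "Альбукерке, Нью-Мексико";
        // UTF-8 can represent every character, so the string comes back unchanged.
        System.out.println(name.equals(roundTrip(name, StandardCharsets.UTF_8)));
        // ISO-8859-1 has no Cyrillic letters; Java substitutes '?' for each one.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));
    }
}
```

The same loss happens to the Japanese, Chinese, and Arabic names, which is why the tooltips need UTF-8 end to end.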
I am not multilingual, not even bilingual really, so where could I go for alternate language translations? I had read about an interesting project over at Google: Language Tools.
Here I could simply run a translation of my geoname into whatever languages Google offers. I cannot vouch for their accuracy, but I understand that Google has developed a statistically based translation algorithm that can beat many, if not all, rule-based algorithms. It was developed by applying statistical pattern processing to very large sets of “Rosetta stone” type documents that had previously been translated. Because it is not rule-based, it avoids some of the early auto-translation pitfalls, such as translating “hydraulic ram” as a “male water sheep.”
SimpleDB, with its schema-free approach to adding attributes, lets me add any number of additional alternateNames attribute values, in whatever language the UTF-8 character set can represent.
Although this works nicely for point features, more complex spatial features are unsuited to SimpleDB. The limits of 256 attributes per item and 1024 bytes per attribute value preclude arbitrary-length polyline or polygon geometry. Perhaps Amazon SimpleDB 2.0 will allow attribute values of arbitrary length, in which case polyline and polygon geometries could be added along with a bbox for intersect queries.
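To get a feel for how quickly a geometry outgrows a single attribute, here is a small sketch that measures the UTF-8 byte length of a value against the 1024-byte figure. The "lon,lat lon,lat" text encoding and the made-up vertices are assumptions for illustration only, and AttributeLimitSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical check of a value against the per-attribute size limit quoted above.
class AttributeLimitSketch {
    static final int MAX_VALUE_BYTES = 1024;

    // True if the value's UTF-8 encoding is within the limit.
    static boolean fits(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_VALUE_BYTES;
    }

    // Builds a made-up polyline as space-separated "lon,lat" pairs with 5 decimals.
    static String polyline(int vertices) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < vertices; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%.5f,%.5f",
                    -106.65 + i * 0.001, 35.08 + i * 0.001));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fits("美國新墨西哥州阿爾伯克基")); // a place name is tiny
        System.out.println(fits(polyline(10)));   // a short line still fits
        System.out.println(fits(polyline(100)));  // about 2 KB of text does not
    }
}
```

At roughly 20 bytes per vertex in this encoding, a single attribute value tops out at around 50 vertices, which is far short of a typical boundary polygon.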
Still, it is an interesting approach for storing and viewing point data.