EC2 Peace of Mind

November 6th, 2008

Elastic Block Store (EBS) is a useful extension of the services offered on Amazon’s AWS cloud platform. EBS provides storage that is independent of any EC2 instance. In other words, storage doesn’t have to be tied to the instance’s own storage, but can exist independently as an external volume. The big deal here is the persistence of your data, even if an instance happens to get killed for some reason. An additional backup safeguard is the ability to take snapshots of an EBS volume and store them on the S3 service.

 

The cost is not a great burden:

EBS Volumes

· $0.15 per GB-month of provisioned storage

· $0.10 per 1 million I/O requests

 

 

Amazon EBS Snapshots to Amazon S3

· $0.15 per GB-month of data stored

· $0.01 per 1,000 PUT requests (when saving a snapshot)

· $0.01 per 10,000 GET requests (when loading a snapshot)
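As a rough illustration of how the prices above combine, here is a minimal Python sketch. The 50 GB size matches the volume created later in this post; the I/O request count and the snapshot figures are made-up assumptions, not measurements.

```python
# Prices quoted above (USD, November 2008).
VOLUME_PER_GB_MONTH = 0.15
IO_PER_MILLION = 0.10
SNAPSHOT_PER_GB_MONTH = 0.15
PUT_PER_1000 = 0.01


def ebs_monthly_cost(volume_gb, io_requests, snapshot_gb, snapshot_puts):
    """Estimate one month of EBS volume plus snapshot charges."""
    volume = volume_gb * VOLUME_PER_GB_MONTH
    io = io_requests / 1_000_000 * IO_PER_MILLION
    snapshot = snapshot_gb * SNAPSHOT_PER_GB_MONTH
    puts = snapshot_puts / 1000 * PUT_PER_1000
    return round(volume + io + snapshot + puts, 2)


# Hypothetical workload: 50 GB volume, 20 million I/O requests,
# 10 GB of snapshot data written with 5,000 PUT requests.
estimate = ebs_monthly_cost(50, 20_000_000, 10, 5000)
print(f"${estimate:.2f} per month")
```

Under these assumed numbers the volume itself dominates the bill, with I/O and snapshot storage adding a few dollars.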

 

The basic approach for creating an EBS volume:

1. start an instance

2. create an EBS volume

3. attach the volume to the instance

4. partition and format the volume

5. add data and services to the instance and its attached volume

6. bundle and register the instance as an AMI stored in S3

7. create a snapshot of the EBS volume

 

Once this is complete, there is peace of mind in knowing that the instance can be reconstructed from these backups.

 

Restore follows this path if the EBS volume is intact:

1. start a new instance from the AMI bundled previously

2. just attach the volume to the new instance

3. repoint the DNS to this new instance server

 

Restore follows this path if the EBS volume is also trashed:

1. start a new instance from the AMI bundled previously

2. create a new volume from the S3 snapshot

3. attach this new volume to the new instance

4. repoint the DNS to this new instance server
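The two restore paths differ only in where the volume comes from. A minimal Python sketch of that decision follows; the function name and step wording are illustrative only, and nothing here calls AWS.

```python
def restore_plan(volume_intact):
    """Return the ordered restore steps for the two cases described above."""
    steps = ["start a new instance from the AMI bundled previously"]
    if volume_intact:
        steps.append("attach the existing volume to the new instance")
    else:
        steps.append("create a new volume from the S3 snapshot")
        steps.append("attach the new volume to the new instance")
    steps.append("repoint the DNS to the new instance")
    return steps


for number, step in enumerate(restore_plan(volume_intact=False), start=1):
    print(f"{number}. {step}")
```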

 

I have an open source GIS stack loaded on a Windows EC2 instance and decided it was time to convert it to the security of an EBS volume.

The AWS details are here:

http://docs.amazonwebservices.com/AWSEC2/latest/DeveloperGuide/

 

First make sure the latest ec2-api-tools are installed (EC2 API version 2008-08-08).

 

Choose the availability zone that matches the zone of the instance you wish to use.

C:\EC2>ec2-create-volume --size 50 --availability-zone us-east-1b

VOLUME vol-******** 50 us-east-1b creating 2008-11-04T15:38:20+0000

 

Once the volume is created, its status is reported as “available.”

C:\EC2>ec2-describe-volumes vol-********

VOLUME vol-******** 50 us-east-1b available 2008-11-04T15:38:20+0000

 

Now the volume can be attached to the instance you had in mind.

C:\EC2>ec2-attach-volume vol-******** -i i-******** -d xvdf

ATTACHMENT vol-******** i-******** xvdf attaching 2008-11-04T15:41:48+0000

 

Once the volume is attached, it’s time to Remote Desktop to the Windows instance.

Open the Disk Management tool:

Start > Administrative Tools > Computer Management > Storage > Disk Management

 

You should then see the attached EBS volume and be able to add it to the instance with appropriate partition and format.

Partition info:

http://technet.microsoft.com/en-us/library/cc738081.aspx

Partition walkthrough:

http://www.bleepingcomputer.com/tutorials/tutorial116.html

 

Once this is done you have an additional drive backed by the external EBS volume; in my example, the E: drive.

 


Fig 1 - Example of Disk Manager on an EC2 windows instance showing an EBS volume

 

Once you have a usable EBS volume, how would you go about making it useful to the GIS stack?

In my stack I am using:

PostgreSQL/PostGIS

Tomcat

Geoserver

 

This means I would like to move all of the PostgreSQL data, tomcat webapps, and the geoserver data to the new EBS volume. Then it will be available for snapshot backup.

 

Postgresql data:

Changing the PostgreSQL data location involves a change to the registry. Stop the PostgreSQL service, change the registry ImagePath, move the C:\Program Files\PostgreSQL\8.3\data subdirectory to its new EBS location, E:\postgresql_data, and finally restart the service.

Run regedit:

Go to “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\pgsql-some version” and, in the ImagePath value, change the -D option to point to a location on your new EBS volume. Here is the wiki entry with details:  PostgreSQL wiki for changing PGDATA.
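To make the ImagePath edit concrete, here is a small Python sketch that rewrites the -D argument in a command-line string. The sample ImagePath value and the service name "pgsql-8.3" are assumptions about what a typical 8.3 install looks like; check the actual registry value before changing anything. The sketch only manipulates a string and does not touch the registry.

```python
import re


def repoint_data_dir(image_path, new_data_dir):
    """Replace the quoted argument that follows -D in a service command line."""
    pattern = r'(-D\s+)"[^"]*"'
    # A function replacement avoids backslash-escape surprises with Windows paths.
    return re.sub(pattern,
                  lambda m: m.group(1) + '"' + new_data_dir + '"',
                  image_path, count=1)


# Assumed example value, not copied from a real registry.
old = (r'"C:\Program Files\PostgreSQL\8.3\bin\pg_ctl.exe" runservice '
       r'-N "pgsql-8.3" -D "C:\Program Files\PostgreSQL\8.3\data"')
new = repoint_data_dir(old, r"E:\postgresql_data")
print(new)
```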

 

Tomcat webapps:

Open %TOMCAT_HOME%/conf/server.xml with a text editor. There should be an entry similar to this:

<Host name="localhost" appBase="E://tomcat_webapp"
      unpackWARs="true" autoDeploy="true"
      xmlValidation="false" xmlNamespaceAware="false">

Here you can see that I’ve changed the appBase to point to a subdirectory on my E: drive, the EBS volume. Copy the existing webapp subdirectory to this EBS subdirectory.
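If you would rather script the appBase change than edit by hand, something along these lines should work. This is a sketch using Python's standard xml.etree module against a stripped-down, made-up server.xml fragment; a real server.xml has more elements, and ElementTree drops XML comments when it rewrites a file.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for a Tomcat server.xml (an assumption for illustration).
SERVER_XML = """<Server>
  <Service name="Catalina">
    <Engine name="Catalina" defaultHost="localhost">
      <Host name="localhost" appBase="webapps"
            unpackWARs="true" autoDeploy="true" />
    </Engine>
  </Service>
</Server>"""


def set_app_base(xml_text, new_app_base, host_name="localhost"):
    """Point the named Host's appBase at a new directory."""
    root = ET.fromstring(xml_text)
    for host in root.iter("Host"):
        if host.get("name") == host_name:
            host.set("appBase", new_app_base)
    return ET.tostring(root, encoding="unicode")


updated = set_app_base(SERVER_XML, "E:/tomcat_webapp")
print(updated)
```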

 

Geoserver data:

Go to the geoserver webapp’s WEB-INF/web.xml and make sure that the GEOSERVER_DATA_DIR points at a location on the EBS volume. Remember to make the change to the web.xml found in the tomcat webapp directory on the new EBS volume. Copy the geoserver data to its new EBS subdirectory.

<context-param>

<param-name>GEOSERVER_DATA_DIR</param-name>

<param-value>E:\geoserver_data</param-value>

</context-param>

 

Now the data for PostgreSQL/PostGIS, Apache Tomcat, and Geoserver will accrue on an external volume, safe from sudden EC2 instance death. Of course, now that EC2 is no longer in beta and an SLA is available, this should be a rare occurrence.

 

Now, to make things even safer, let’s run a snapshot:

C:\EC2>ec2-create-snapshot vol-********

SNAPSHOT snap-******** vol-******** pending 2008-11-04T22:14:30+0000

 

C:\EC2>ec2-describe-snapshots snap-********

SNAPSHOT snap-******** vol-******** completed 2008-11-04T22:14:30+0000

 

At this point a snapshot of my volume is stored to S3 where I can use it to create a new volume for use in another instance. I can use the snapshot if I’m creating multi instance clusters or if I need to restore my instance.

 

Of course it would also be wise to make an AMI bundle to reflect the changes made to the basic instance: directory pointers, registry edits, etc. Here is the AMI bundle guide for Windows instances:  AMI Bundle for windows info

You will first need to prepare a bucket on S3 to receive the AMI bundle.  S3 Info

 

 

C:\EC2 >ec2-bundle-instance i-******** -b ec2-windows-bucket -p ec2-windows_image -o <Amazon EC2 Key ID> -w <private access Key>

BUNDLE bun-******** i-******** norm-ec2-windows ec2-windows_image 2008-11-05T15:28:15+0000 2008-11-05T15:28:15+0000 pending

C:\EC2 >ec2-describe-bundle-tasks

BUNDLE bun-******** i-******** ec2-windows ec2-windows_image 2008-11-05T15:28:15+0000 2008-11-05T16:07:08+0000 complete

C:\EC2 >ec2-register ec2-windows/ec2-windows_image.manifest.xml

IMAGE ami-********

 

Summary:

Amazon cloud is now out of beta and comes with independent storage volumes and snapshot capability useful for backup and scaling functions. GIS open source stacks can make use of these options without a huge effort.

New things in Amazon’s Cloud

October 24th, 2008

Amazon AWS made a big announcement yesterday regarding Windows on EC2:
big-day-for-ec2.htm

There are now a number of Windows 2003 Server AMI options: Amazon Machine Images

Why does any of this matter to GIS markets? GIS distribution has been revolutionized by a battle of the titans, Google Map vs Virtual Earth. The popularity of mashups and the continuing spread of location into enterprise business workflows have moved GIS into a browser interface model. However, the back end GIS is still there on servers. Utility cloud computing makes that back end service more affordable to businesses of all sizes, small to large. Even Fortune 500 enterprises can make use of auto-scaling and load balancing features for ad hoc distribution of location data, either internally or public facing.

Here are the Amazon Windows AMI offerings:

Amazon Public Images - Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 (64bit)

Amazon Public Images - Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 Enterprise Authenticated (64bit)

Amazon Public Images - Windows SQL Server 2005 Standard on Windows Server 2003 R2 Enterprise Authenticated (64bit)

Amazon Public Images - Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 (32bit)

Amazon Public Images - Windows Server 2003 R2 (32bit)

Amazon Public Images - Windows Server Enterprise 2003 R2 (32bit)

Amazon Public Images - Windows Server 2003 R2 (64bit)

Amazon Public Images - Windows Server 2003 R2 Enterprise (64bit)

Amazon Public Images - Windows SQL Server 2005 Standard on Windows Server 2003 R2 (64bit)

Pricing:

Standard Instances    Linux/UNIX       Windows
Small (Default)       $0.10 per hr     $0.125 per hr
Large                 $0.40 per hr     $0.50 per hr
Extra Large           $0.80 per hr     $1.00 per hr

High CPU Instances    Linux/UNIX       Windows
Medium                $0.20 per hr     $0.30 per hr
Extra Large           $0.80 per hr     $1.20 per hr

Windows prices are only slightly higher than the Linux counterparts and cheaper than GoGrid’s. The small windows instance at Amazon EC2 is $0.125/hr ($3 per day) and includes:

Small Instance (Default) 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform

A similar GoGrid instance with 2 GB RAM + 160 GB storage will run 2 x $0.19 = $0.38/hr in the "Pay as You Go Pricing", considerably more than the Amazon instance. GoGrid does offer the newer Windows Server 2008, and prepaid plans are less expensive at $0.16 to $0.24 per hr for a similar configuration. Also, the slick user interface at GoGrid shows the utility of a visual monitor.
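The per-day figures follow directly from the hourly rates quoted above. A minimal Python sketch of the arithmetic, assuming an instance that runs 24 hours a day and ignoring bandwidth and storage charges:

```python
HOURS_PER_DAY = 24


def daily_cost(hourly_rate):
    """Cost in USD of running one instance for a full day."""
    return round(hourly_rate * HOURS_PER_DAY, 2)


# Hourly rates taken from the text above.
rates = {
    "EC2 small, Linux/UNIX": 0.10,
    "EC2 small, Windows": 0.125,
    "GoGrid pay-as-you-go (2 x $0.19)": 2 * 0.19,
}

for name, rate in rates.items():
    print(f"{name}: ${daily_cost(rate):.2f} per day")
```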

Speaking of user interface, in addition to all of the Windows AMIs there are announcements of future features for EC2:
new-features-for-amazon-ec2

"Management Console - The management console will simplify the process of configuring and operating your applications in the AWS cloud. You’ll be able to get a global picture of your cloud computing environment using a point-and-click web interface."

"Load Balancing - The load balancing service will allow you to balance incoming requests and traffic across multiple EC2 instances. "

"Automatic Scaling - The auto-scaling service will allow you to grow and shrink your usage of EC2 capacity on demand based on application requirements."

"Cloud Monitoring - The cloud monitoring service will provide real time, multi-dimensional monitoring of host resources across any number of EC2 instances, with the ability to aggregate operational metrics across instances, Availability Zones, and time slots."

These will make EC2 easier to use. Load Balancing and a Management Console have been part of GoGrid’s cloud service for a while now, and they do make life easier. Auto-Scaling will be a great help too; until now, scaling at EC2 has been a more or less manual process. The Windows market is not as used to command line Bash shell scripting, so the introduction of visual UI monitoring and control makes sense for this new cloud market.

Here is the lowest cost Windows AMI that will be popular with developers:
ami-3934d050
Amazon Public Images - Windows SQL Server Express + IIS + ASP.NET on Windows Server 2003 R2 (32bit)

It includes the basics for ASP .NET 2.0 web apps on IIS 6.0 with SQL Server Express 2005. Of course once on the system it is easy to upgrade to the newer .NET 3.5 and install all of the GIS stack items required.

Here is the procedure I followed for getting my first Windows AMI started:

First sign up and pick up a private/public key along with X509 Certificate at EC2.

Download the latest version of ec2-api-tools

Installation includes setting additional environment variables as described in the EC2 Getting Started Guide:

EC2_HOME
EC2_CERT
EC2_PRIVATE_KEY
EC2_URL

Add to path variable %EC2_HOME%\bin;
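A quick sanity check of these settings can save some head scratching before the first command is run. The following Python sketch only inspects an environment mapping; the function names are made up for illustration and it does not verify that the certificate or key files are valid.

```python
import os

REQUIRED = ["EC2_HOME", "EC2_CERT", "EC2_PRIVATE_KEY", "EC2_URL"]


def missing_ec2_settings(env):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


def ec2_bin_on_path(env):
    """True if the bin directory under EC2_HOME is an entry of PATH."""
    home = env.get("EC2_HOME", "")
    if not home:
        return False
    wanted = os.path.normcase(os.path.join(home, "bin"))
    entries = env.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(e.rstrip("\\/")) == wanted for e in entries)


print("missing:", missing_ec2_settings(os.environ))
print("bin on PATH:", ec2_bin_on_path(os.environ))
```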

After installation verify that the correct ec2-api-tools are installed:
    ec2ver 1.3-26369 2008-08-08

Now we can use the ec2-api-tools:
   ec2-run-instances ami-3934d050 -k gsg-keypair
   ec2-describe-instances <resulting instance id>

Once your instance is running make sure remote desktop service port is open, at least for the default group:
   ec2-authorize default -p 3389

Also you will need to get the randomly assigned administrator password from the new instance using the instance id returned from ec2-describe-instances and the keypair generated earlier:
   ec2-get-password <your instance> -k <full pathname of the gsg_keypair file>

Now it is possible to Remote Desktop to the url furnished by ec2-describe-instances:
   ec2-**-**-**-*.compute-1.amazonaws.com
   User: administrator
   Pass: *********


Fig 1 - Amazon EC2 Windows 2003 basic instance

Summary:
Amazon continues to expand its utility computing cloud. Virtual Windows OS has been a big hole out there, and Linux has grabbed a big lead in that market between Google web compute engines and Amazon EC2. Windows on EC2 opens Amazon utility computing to a much broader segment of the market and pushes deeper into the small business community. The economic turmoil of the times and consequent cost savings imperatives should make utility computing even more attractive to businesses large and small. It remains to be seen if Microsoft’s RedDog announcement at PDC will open a new competitive front in the utility computing world.

A Look at Google Earth API

October 22nd, 2008

Google’s earth viewer plugin is available for use in browsers on Windows and Vista, but not yet Linux or Mac. This latest addition to Google’s family of APIs (yet another beta) lets developers make use of the nifty 3D navigation found in the standalone Google Earth. As a plugin, however, it gives the developer more control, using JavaScript with HTML or server side frameworks such as ASP.NET. http://code.google.com/apis/earth/
Here is the handy API reference:
http://code.google.com/apis/earth/documentation/reference/index.html



Fig 1 - Google Earth API embedded in an EPA database viewer with 3D buildings

The common pattern looks familiar to Map API users: get a key code for the Google served js library and add some initialization javascript along with a map <div> as seen in this minimalist version.

<html>
  <head>
    <script src="http://www.google.com/jsapi?key=…keycode…"
            type="text/javascript">
    </script>

    <script type="text/javascript">
      // Handle to the Earth plugin instance, set in initCallback
      var ge = null;
      google.load("earth", "1");

      // Called from <body onload>; creates the plugin inside the map3d div
      function pageLoad() {
        google.earth.createInstance("map3d", initCallback, failureCallback);
      }

      function initCallback(object) {
        ge = object;
        ge.getWindow().setVisibility(true);
        var options = ge.getOptions();
        options.setStatusBarVisibility(true);
        ge.getNavigationControl().setVisibility(ge.VISIBILITY_SHOW);
      }

      function failureCallback(object) {
        alert('load failed');
      }
    </script>
  </head>
  <body onload="pageLoad()">
    <div id='map3d'></div>
  </body>
</html>
Listing 1 - minimalist Google Earth API in html

This is rather straightforward with the exception of a strange error when closing the browser:

Google explains:
“We’re working on a fix for that bug. You can temporarily disable script debugging in Internet Options –> Advanced Options to keep the messages from showing up.”

... but the niftiness is in the ability to build your own pages around the API. Instead of using Google Earth links in the standalone version, we can customize a service tailored to very specific desires. For example, this experimental Shp Viewer lets a user upload a shapefile with a selected EPSG coordinate system and then view it over 3D Google Earth along with some additional useful layers like PLSS from USGS and DRG topo from TerraServer.


Fig 2 - a quarter quad shp file shown over Google Earth along with a USGS topo index layer

In the above example, clicking on a topo icon triggers a ‘click’ event listener, which then goes back to the server to fetch a DRG raster image from TerraServer via a java proxy servlet.

  .
  .
case 'usgstile':
  url = server + "/ShpView/usgstile.kml";
  usgstilenlink = CreateNetLink("TopoLayer", url, false);
  ge.getGlobe().getFeatures().appendChild(usgstilenlink);

  google.earth.addEventListener(usgstilenlink, "click", function(event) {
    // Suppress the default behavior and load the DRG for the clicked topo cell
    event.preventDefault();
    var topo = event.getTarget();
    var data = topo.getKml();
    var index = data.substr(data.indexOf("<td>index</td>") + 20, 7);
    var usgsurl = server + "/ShpView/servlet/GetDRG?name=" +
        topo.getName() + "&index=" + index;
    loadKml(usgsurl);
  });
  break;
  .
  .
  .
function loadKml(kmlUrl) {
  google.earth.fetchKml(ge, kmlUrl, function(kmlObject) {
    if (kmlObject) {
      ge.getFeatures().appendChild(kmlObject);
    } else {
      alert('Bad KML');
    }
  });
}
Listing 2 - addEventListener gives the developer some control of user interaction
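The fixed offsets in the listing above (skip 20 characters, take 7) depend on the exact whitespace and cell width in the KML description table. A pattern match is a little more forgiving. Here is the idea sketched in Python for illustration; the table layout and the "ABC1234" value are made-up assumptions, not the real usgstile.kml content.

```python
import re


def topo_index(kml_text):
    """Return the text of the table cell that follows the 'index' label cell."""
    match = re.search(r"<td>index</td>\s*<td>([^<]*)</td>", kml_text)
    return match.group(1).strip() if match else None


# Hypothetical fragment of a placemark description.
sample = "<tr><td>index</td>\n <td>ABC1234</td></tr>"
print(topo_index(sample))
```

The same regular expression could be used from JavaScript with String.match if the offsets ever shift.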

In addition to ‘addEventListener’, the above listing shows how easily a developer can use ‘getKml’ to grab additional attributes out of a topo polygon. Google’s ‘fetchKml’ simplifies life when grabbing KML produced by the server. My first pass on this used XMLHttpRequest directly, like this:

var req;

function loadXMLDoc(url) {
  if (window.XMLHttpRequest) {
    req = new XMLHttpRequest();
    req.onreadystatechange = processReqChange;
    req.open("GET", url, true);
    req.send(null);
  // branch for IE6/Windows ActiveX version
  } else if (window.ActiveXObject) {
    req = new ActiveXObject("Microsoft.XMLHTTP");
    if (req) {
      req.onreadystatechange = processReqChange;
      req.open("GET", url, true);
      req.send();
    }
  }
}

function processReqChange() {
  // readyState 4 means the request has completed
  if (req.readyState == 4) {
    if (req.status == 200) {
      var topo = ge.parseKml(req.responseText);
      ge.getFeatures().appendChild(topo);
    } else {
      alert("Problem retrieving the XML data:\n" + req.statusText);
    }
  }
}
Listing 3 - XMLHttpRequest alternative to ‘fetchKml’

However, I ran into security violations because the Tomcat Java servlet proxy was running on a different port from my web application in IIS. Rather than rewrite the GetDRG servlet in C# as an asmx, I discovered the Google API ‘fetchKml’, which is more succinct anyway and nicely bypassed the security issue.


Fig 3 - Google Earth api with a TerraServer DRG loaded by clicking on USGS topo polygon

And using the Google 3D navigation


Fig 4 - Google Earth api with a TerraServer DRG using the nifty 3D navigation

Of course in this case 3D is relatively meaningless. Michigan is absurdly flat. A better example of the niftiness of 3D requires moving out west like this view of Havasupai Point draped over the Grand Canyon, or looking at building skp files in major metro areas as in the EPA viewer.


Fig 5 - Google Earth api with a TerraServer DRG Havasupai Point in more meaningful 3D view

This Shp Viewer experiment makes use of a couple of other interesting technologies. The .shp file set (.shp, .dbf, .shx) is uploaded to the server, where it is loaded into PostGIS and added to the Geoserver catalog. Once in the catalog, the features are available to the kml reflector for pretty much automatic use in the Google Earth API. As a nice side effect, the features can be exported to all of Geoserver’s output formats (pdf, svg, RSS, OpenLayers, png, geotiff), but minus the inimitable background terrain of Google Earth.

ogr2ogr is useful for loading the furnished .shp files into PostGIS, where they are accessible to Geoserver and the also very useful kml_reflector. PostGIS + Geoserver + Google Earth API is a very powerful stack for viewing map data. There is also a nice trick for adding Geoserver featureTypes to the catalog programmatically. I used the Java version after writing out the featureType info.xml to Geoserver’s data directory. Eventually, work on the RESTful configuration version of Geoserver may make this kind of catalog reloading easier.

Summary:
Wow I like 3D and I especially like 3D I have control over. Google Earth API is easy to add into an html page and event listeners give developers some control over features added to the 3D scene. Coupled with the usual suspects in the GIS stack, PostgreSQL /PostGIS and Geoserver, it is relatively easy to make very useful applications. I understand though that real control freaks will want to look at WWjava.

If anyone would like to see a prototype UI of their data using this approach please email

A Noob in Oracle Land

October 3rd, 2008

The announcement that Oracle is supporting AMIs in the Amazon cloud came as a surprise to me. I had heard that there was a teaser version of Oracle out there for developers, but had not expected Oracle to jump on the cloud side, especially after Larry Ellison’s recent diatribe against cloud computing.

   “It’s complete gibberish. It’s insane. When is this idiocy going to stop?”

Just curious about this oracle of gibberish, I went on a tour of Oracle Land, the Kingdom of Ellison. This is no small undertaking for an enterprise as ambitious as Oracle. There are endless products and sub-products. The base of the pyramid is the database server, but after buying 50 or more companies in the last year or so, the borders of the empire extend way beyond RDBMS.

The venerable RDBMS has come a long way since IBM’s E.F. Codd introduced the concept back in the 70s. I vaguely remember Oracle breaking into the PC world shortly after Turbo Pascal. There was a single DB product for the DOS IBM PC, and documentation consisted of a couple of grayish paperback manuals. Shortly after this, in the late 80s, a small vendor introduced GeoSQL to hook AutoCAD to the GIS world through Oracle. This was my first introduction to the potential of spatial databases and Oracle. The empire of Ellison has grown since then, and now documentation would fill a library as well as Ellison’s bank account.

As an aside, we live in an interesting age at the dusk of the great technology innovators. The infamous industrialists of the previous era now exist only as shadowy figures in history texts, but the business innovators of technology are still walking among us: Larry Ellison, Bill Gates, Steve Jobs. The multi-billion dollar personal fortunes are just now entering the charitable fund phase, where our grandchildren will know their names in some impersonally institutional mode such as the Gates Foundation.

First stop in Oracle Land was a download of the free, as in free beer, teaser version, OracleXE.

  • Total data stored in XE is limited to 4GB
  • XE is limited to 1GB of RAM
  • XE is limited to 1 processor

Since my entire interest in Oracle is the spatial side, my next stop was Justin Lokitz’s helpful article on integration with Geoserver. Leading to this:


Fig 1 - http://localhost:80/geoserver/wms?service=WMS&request=GetMap&format=image/png&width=800&height=600&srs=EPSG:4326&layers=topp:COUNTIES&styles=countypopdensity&bbox=-177.1,13.71,-61.48,76.63

Fig 2 - http://localhost:80/geoserver/wms/kml_reflect?layers=COUNTIES
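The GetMap request in the Fig 1 caption is just a base URL plus a handful of query parameters, so it is easy to generate for other layers and extents. A minimal Python sketch using only the standard library; the function name and defaults are my own, and the parameters are limited to the ones shown in the caption.

```python
from urllib.parse import urlencode


def wms_getmap_url(base, layers, styles, bbox, width=800, height=600,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS GetMap URL with the parameters used in the Fig 1 caption."""
    params = {
        "service": "WMS",
        "request": "GetMap",
        "format": image_format,
        "width": width,
        "height": height,
        "srs": srs,
        "layers": layers,
        "styles": styles,
        "bbox": ",".join(str(v) for v in bbox),
    }
    # Leave ':', '/' and ',' unescaped so the URL stays readable.
    return base + "?" + urlencode(params, safe=":/,")


url = wms_getmap_url("http://localhost:80/geoserver/wms", "topp:COUNTIES",
                     "countypopdensity", (-177.1, 13.71, -61.48, 76.63))
print(url)
```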

Not a bad start. The Geoserver layer abstracts away the spatial guts of OracleXE. However, curiosity leads on. I found that OracleXE has some spatial components labelled ‘Locator’ as opposed to ‘Spatial’. Though only a subset of the extensive enterprise spatial version, geometry queries are possible. It took me a bit to find my way around.

Interestingly the open source world is generally more helpful in this respect. Although extensive, the forums of commercial software vendors are less friendly. For instance Paul Ramsey of Refraction fame is regularly present on the PostGIS forums, and Frank Warmerdam is always available to give a helping hand at the immensely useful www.gdal.org. But I doubt that I will ever run across a Larry Ellison post on the OracleXE forum. Many posts to commercial forums appear to languish unanswered, which is seldom the case in the OpenSource project forums I monitor.

It is worth noting that gdal’s ogr2ogr can be built with Oracle support on systems with Oracle Client libraries installed.

Oracle’s SDO_Geometry is present in a useful form letting users run geographic join queries like this:

   select c.COUNTY, c.STATE_ABRV, c.TOTPOP, c.POPPSQMI from states s, counties c where s.state = 'California' and sdo_anyinteract (c.geom, s.geom) = 'TRUE';

My next step was to look at SDO_Geometry in JDBC. Unfortunately Oracle’s JGeometry spatial library is not available for OracleXE, but the LGPL open source JTS library provides helpful OraReader and OraWriter classes. These encapsulate the SDO_GEOMETRY Struct translation to/from jts.geom.Geometry, where the rest of the JTS api can be applied.

logger.info(rsmd.getColumnName(i)+": "+rsmd.getColumnType(i));
st = (oracle.sql.STRUCT) rs.getObject(1);
//convert STRUCT into JGeometry not available in OracleXE
//JGeometry j_geom = JGeometry.load(st);

//JTS to the rescue
OraReader reader = new OraReader();
Geometry geom = reader.read(st);
Coordinate[] coords = geom.getCoordinates();
		.
		.

Next stop, Amazon AWS EC2. Here is a list of the public Oracle AMIs offered:
   Oracle Database 11g Release 1 Enterprise Edition - 64 Bit
   Oracle Database 11g Release 1 Enterprise Edition - 32 Bit
   Oracle Database 11g Release 1 Standard Edition/Standard Edition One - 32 Bit
   Oracle Database 10g Release 2 Express Edition - 32 Bit

The last in the list, OracleXE edition, is the one to experiment with, unless you have a spare Oracle license floating around.

Time to try it:
  C:\>ec2-run-instances ami-7acb2f13 -k gsg-keypair
  C:\>ec2-describe-instances i-??????

and login:

Use of this machine requires acceptance of
the following license agreements.
 1. Oracle Enterprise Linux
    http://edelivery.oracle.com/EPD/LinuxLicense/get_form?ARU_LANG=US
 2. Oracle Technology Developer License Terms
    http://www.oracle.com/technology/software/popup-license/standard-license.html
 Please enter the above URLs into your browser and review them.
To accept the agreements, enter 'y', otherwise enter 'n'.
Do you accept? [y/n]: y
Thank you.

You may now use this machine.
Welcome to Oracle Database on EC2!
This is the first time this EC2 instance has been started.

Please set the oracle operating system password.
	.
	.
Please specify the passwords for the following database administrative accounts:

SYS (Database Administrative Account) Password:
	.
	.

Now for the link to Apex on the new OracleXE instance:
  http://ec2-??-??-???-??.compute-1.amazonaws.com:8080/apex


Fig 3 - Oracle Apex running from an EC2 OracleXE instance

Looks like we have it.

Summary:
Oracle is the Big Daddy of spatial GIS. It is also the “Mother of all DBA complexity.” Running a spatial app with Oracle in the background is not trivial, but it is getting easier. The EC2 OracleXE AMI makes starting an Oracle server instance a matter of minutes. Although lacking some of the capability of its free and open source competition, OracleXE can be useful for the garden variety web enabled spatial app. For the developer with lots of experience in Oracle, OracleXE provides a low cost entry onto the performance/price escalator.

Next on the agenda is adding SDO_GEOMETRY data along with some kind of real spatial rendering, which means in my case getting a tomcat server running with Geoserver on the same OracleXE instance. Alternatively it might be worth a try at installing the OracleXE .rpm on an AMI with a GIS stack already available. And, it will be useful to recompile ogr with oracle db support.

Of course the real mix and match challenge will be OracleXE on an EC2 (real soon now) Windows instance with Java, Tomcat, Geoserver serving a Google Map control coupled to Google Earth, OpenLayers, VirtualEarth. But really EC2 Windows will probably come preconfigured with the new MS SQL Server 2008 and all the promised geospatial goodies including Linq potential.

After just a short trip into the Ellison Empire, I must admit I still like the no frills PostgreSQL/PostGIS better.

AWS to offer Windows + SQL Server

October 1st, 2008



The Amazon AWS team just sent out an announcement of a Windows OS offering for later this fall. This confirms rumors floating around about the AWS roadmap, and will be a significant boost to EC2 cloud computing. More here … http://aws.typepad.com/aws/2008/10/coming-soon-ama.html

GoGrid has been offering Windows + SQL Server virtual systems for a while now. It will be interesting to see price comparisons. I imagine that, like GoGrid, AWS Windows will cost more because of the MS license issue. GoGrid’s advantages have also been ease of use, hardware balancing, and nice monitoring tools. On the Amazon side are persistent storage with S3 and EBS, along with SQS. I’m looking forward to trying it out.

Cloud computing is growing. It is important as a platform for GIS. OGC WPS, WMS, WCS, WFS are making a mark on mapping SaaS, and cloud platforms fit this model very well.

Spatial Analytics in the Cloud

September 19th, 2008


Fig 1 - risk polygon for hurricane Ike - the Timony Group

Peter Batty has a couple of interesting blog posts on Netezza and their recent spatial analytics release. Basically, Netezza has developed a parallel system, hardware/software, for processing large spatial queries, with scaling improvements of 1 to 2 orders of magnitude over Oracle Spatial. This is apparently accomplished by pushing some of the basic filtering and projection out to the hardware disk reader, in addition to the more commonly used parallel Map Reduce techniques. Ref Google’s famous white paper: http://labs.google.com/papers/mapreduce-osdi04.pdf

One comment that struck me was Rich Zimmerman’s mention that their system eliminates indexing and tuning, relying instead on the efficiency of brute force parallel processing. There is no doubt that their process is highly effective and successful, given the number of client buy-ins as well as Larry Ellison’s attention. I suppose, though, that an Oracle buyout is generally considered the gold standard of competitive pain when Oracle is on the field.

In Peter’s interview with Rich Zimmerman they discuss a simple scenario in which a large number of point records (in the billion range) are joined with a polygon set and processed with a spatial ‘point in polygon’ query. This is the type of analytics that would be common in real estate insurance risk analytics and is typically quite time consuming. Evidently Netezza is able to perform these types of analytics in near real time, which is quite useful in terms of evolving risk situations such as wildfire, hurricane, earthquake, flooding etc. In these scenarios domain components are dynamically changing polygons of risk, such as projected wind speed, against a relatively static point set.

Netezza performance improvement factors over Oracle Spatial were in the 100 to 500 range with Netezza SPU arrays being anywhere from 50 to 1000. My guess would be that the performance curve would be roughly linear. The interview suggested an amazing 500x improvement over Oracle Spatial with an unknown number of SPUs. It would be interesting to see a performance versus SPU array size curve.

I of course have no details on the Netezza hardware enhancements, but I have been fascinated with the large scale clustering potential of cloud computing, the poor man’s supercomputer. In the Amazon AWS model, node arrays are full power virtual systems with consequent generality, as opposed to the more specific SPUs of the Netezza systems. However, cloud communications has to have a much larger latency compared to an engineered multi SPU array. On the other hand, would the $0.10/hr instance cost compare favorably to a custom hardware array? I don’t know, but a cloud based solution would be more flexible and scale up or down as needed. For certain, cost would be well below even a single CPU Oracle Spatial license.

Looking at the cited example problem, we are faced with a very large static point set and a smaller dynamically changing polygon set. The problem is that assigning polygons of risk to each point requires an enormous number of ‘point in polygon’ calculations.
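For readers who have not met it, a single ‘point in polygon’ test can be sketched with the classic ray casting method. This is a plain Python illustration of the idea, not the algorithm Netezza or PostGIS actually use; the square polygon is a made-up example.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray running in +x from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))
print(point_in_polygon(5, 2, square))
```

Each test walks every edge of the polygon, so a naive join of N points against M polygons costs on the order of N x M of these loops, which is why the problem gets expensive at a billion points.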

In thinking about the type of analytics discussed in Peter’s blog the question arises, how could similar spatial analytics be addressed in the AWS space? The following speculative discussion looks at the idea of architecting an AWS solution to the class of spatial analysis problems mentioned in the Netezza interview.

The obvious place to look is AWS Hadoop.

Since Hadoop was originally written by the Apache Lucene developers as a way to improve text based search, it does not directly address spatial analytics. Hadoop handles the overhead of scheduling, automatic parallelization, and job/status tracking. The Map Reduce algorithm is provided by the developer as essentially two Java classes:

  Map - public static class MapClass extends MapReduceBase implements Mapper{ … }
  Reduce - public static class ReduceClass extends MapReduceBase implements Reducer { …. }

In theory, with some effort, the appropriate Java Map and Reduce classes could be developed specific to this problem domain, but is there another approach, possibly simpler?

My first thought, like Netezza’s, was to leverage the computational efficiency of PostGIS over an array of EC2 instances. This means dividing the full point set into smaller subsets, feeding these subset computations to their own EC2 instance and then aggregating the results. In my mind this involves at minimum:
 1. a feeder EC2 instance to send sub-tiles
 2. an array of EC2 computational instances
 3. a final aggregator EC2 instance to provide a result.

Points
One approach to this example problem is to treat the very large point set as an array tiled in the tiff image manner with a regular rectangular grid pattern. Grid tiling only needs to be done once or as part of the insert/update operation. The assumptions here are:
 a. the point set is very large
 b. the point set is relatively static
 c. distribution is roughly homogeneous

If c is not the case, grid tiling would still work, but with a quad tree tiling pattern that subdivides dense tiles into smaller spatial extents. Applying the familiar string addressing made popular by Google Map and then Virtual Earth with its 0-3 quadrature is a simple approach to tiling the point table.

Fig 2 - tile subdivision

Recursively appending a char from 0 to 3 for each level provides a cell identifier string that can be applied to each tile. For example ‘002301’ identifies a tile cell NW/NW/SW/SE/NW/NE. So the first step, analogous to spatial indexing, would be a pass through the point table calculating tile signatures for each point. This is a time consuming preprocess, basically iterating over the entire table and assigning each point to a tile. An initial density guess can be made to some tile depth. Then if the point tiles are not homogeneous (very likely), tiles with much higher counts are subdivided recursively until a target density is reached.
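As a rough illustration of the signature pass, here is a minimal Python sketch (Python is used only for brevity; the post itself imagines Java and PostGIS). The function name, the bounds tuple, and the assumption that y increases northward are all mine, not part of any library.

```python
def tile_signature(x, y, bounds, depth):
    """Return a quadtree tile signature such as '002301' for point (x, y).

    bounds is (min_x, min_y, max_x, max_y) of the full point extent.
    Digits follow the 0-3 convention: 0 = NW, 1 = NE, 2 = SW, 3 = SE.
    Assumes y increases to the north (as with latitude).
    """
    min_x, min_y, max_x, max_y = bounds
    digits = []
    for _ in range(depth):
        mid_x = (min_x + max_x) / 2.0
        mid_y = (min_y + max_y) / 2.0
        east = x >= mid_x
        south = y < mid_y
        digits.append(str((2 if south else 0) + (1 if east else 0)))
        # Shrink the bounds to the chosen quadrant for the next level.
        if east:
            min_x = mid_x
        else:
            max_x = mid_x
        if south:
            max_y = mid_y
        else:
            min_y = mid_y
    return "".join(digits)
```

Running this once per point at a fixed depth gives the initial density guess; dense tiles would then be re-run at a greater depth.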

Creating an additional tile geometry table during the tile signature calculations is a convenience for processing polygons later on. Fortunately the assumption that the point table is relatively static means that this process occurs rarely.

The tile identifier is just a string signature that can be indexed to pull predetermined tile subsets. Once completed there is a point tile set available for parallel processing with a simple query.
 SELECT point.wkb_geom, point.id
  FROM point
  WHERE point.tile = tilesignature;

Note that tile size can be manipulated easily by changing the WHERE clause slightly to shorten the tile signature. In effect this combines 4 tiles into a single parent tile (‘00230*’ = ‘002300’ + ‘002301’ + ‘002302’ + ‘002303’). In SQL the wildcard is ‘%’ and the pattern goes on the right-hand side of LIKE:
 SELECT point.wkb_geom, point.id
  FROM point
  WHERE
   point.tile LIKE (substring(tilesignature from 1 for (length(tilesignature) - 1)) || '%');
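The same parent/child relationship can be seen outside the database. The sketch below groups tile signatures under the parent obtained by dropping trailing digits; the function name and arguments are illustrative assumptions only.

```python
from collections import defaultdict


def group_by_parent(signatures, levels_up=1):
    """Group tile signatures under the parent found by dropping trailing digits."""
    if levels_up < 1:
        raise ValueError("levels_up must be at least 1")
    groups = defaultdict(list)
    for sig in signatures:
        # Dropping one digit merges up to four sibling tiles into one parent.
        groups[sig[:-levels_up]].append(sig)
    return dict(groups)
```

For example, the four children ‘002300’ through ‘002303’ all land under the parent key ‘00230’.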

Assuming the polygon geometry set is small enough, the process is simply feeding sub-tile point sets into ‘point in polygon’ replicated queries such as this PostGIS query:
 SELECT point.id
  FROM point, polygon
  WHERE
    point.wkb_geom && polygon.wkb_geom
   AND intersects(polygon.wkb_geom, point.wkb_geom);

This is where the AWS cloud computing could become useful. Identical CPU systems can be spawned using a preconfigured EC2 image with Java and PostGIS installed. A single feeder instance contains the complete point table with tile signatures as an indexed PostGIS table. A Java feeder class then iterates through the set of all tiles resulting from this query:
   SELECT DISTINCT point.tile FROM point ORDER BY point.tile;

Using a DISTINCT query eliminates empty tiles as opposed to simply iterating over the entire tile universe. Again a relatively static point set indicates a static tile set. So this query only occurs in the initial setup. Alternatively a select on the tile table where the wkb_geom is not null would produce the same result probably more efficiently.

Each point set resulting from the query below is then sent to its own AWS EC2 computation instance.
 foreach tilesignature in DISTINCT point.tile
 {
  SELECT point.wkb_geom, point.id
  FROM point
  WHERE point.tile = tilesignature;
 }

Polygons

The polygon set also has assumptions:
 a. the polygon set is dynamically changing
 b. the polygon set is relatively small

Selecting a subset of polygons to match a sub-tile of points is pretty efficient using the tile table created earlier:
 SELECT polygon.wkb_geom
   FROM tile INNER JOIN polygon ON (tile.wkb_geom && polygon.wkb_geom)
  WHERE tile.id = tilesignature;

Now the feeder instance can send a subset of points along with a matching subset of polygons to a computation EC2 instance.

Connecting EC2 instances

However, at this point I run into a problem! I was simply glossing over the “send” part of this exercise. The problem in most parallel algorithms is the communication latency between processors. In an ideal world shared memory would make this extremely efficient, but EC2 arrays are not connected this way. The cloud is not so efficient.

AWS services do include additional tools: Simple Storage Service (S3), Simple Queue Service (SQS), and SimpleDB. S3 is a type of shared storage. SQS is a type of asynchronous message passing, while SimpleDB provides a cloud based DB capability on structured data.

S3 is appealing because writing collections of polygon and point sets should be fairly efficient, one S3 object per tile unit. At the other end, computation instances would read from the S3 input bucket and write results back to a result output bucket. The aggregator instance can then read from the output result bucket.

However, implicit in this S3 arrangement is a great deal of schedule messaging. SQS is an asynchronous messaging system provided for this type of problem. Since messages are being sent anyway, why even use S3? SQS messages are limited to 8k of text so they are not sufficient for large object communications. Besides point sets may not even change from one cycle to the next. The best approach is to copy each tile point set to an S3 Object, and separate S3 objects for polygon tile sets. Then add an SQS message to the queue. The computation instances read from the SQS message queue and load the identified S3 objects for processing. Note that point tile sets will only need to be written to S3 once at the initial pass. Subsequent cycles will only be updating the polygon tile sets. Hadoop would handle all of this in a robust manner taking into account failed systems and lost messages so it may be worth a serious examination.
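The division of labor described above (bulk data in storage objects, small pointers in queue messages) can be sketched as follows. This is only an in-process illustration: a plain dict stands in for S3 and Python's queue.Queue stands in for SQS, and none of the names below are AWS API calls.

```python
import json
import queue

object_store = {}           # stand-in for S3 buckets: key -> serialized body
work_queue = queue.Queue()  # stand-in for an SQS queue


def publish_tile(tile_id, points, polygons, points_already_stored=False):
    """Feeder side: store tile data, then queue a small message naming the objects."""
    point_key = "points/" + tile_id
    polygon_key = "polygons/" + tile_id
    if not points_already_stored:
        object_store[point_key] = json.dumps(points)   # written once, first pass only
    object_store[polygon_key] = json.dumps(polygons)   # rewritten on every cycle
    # The message stays tiny: it carries keys, not geometry.
    work_queue.put(json.dumps({"tile": tile_id,
                               "points": point_key,
                               "polygons": polygon_key}))


def receive_tile():
    """Worker side: read one message and load the objects it identifies."""
    message = json.loads(work_queue.get())
    points = json.loads(object_store[message["points"]])
    polygons = json.loads(object_store[message["polygons"]])
    return message["tile"], points, polygons
```

Because the message only names the objects, it stays far below the SQS size limit no matter how large a tile's point set grows.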

SimpleDB is not especially useful in this scenario, because the feeder instance’s PostGIS is much more efficient at organizing tile objects. As long as the point and polygon tables will fit in a single instance it is better to rely on that instance to chunk the tiles and write them to S3, then alerting computational instances via SQS.

Once an SQS message is read by the target computation instance how exactly should we arrange the computation? Although tempting, use of PostGIS again brings up some problems. The point and polygon object sets would need to be added to two tables, indexed, and then queried with “point in polygon.” This does not sound efficient at all! A better approach might be to read S3 objects with their point and polygon geometry sets through a custom Java class based on the JTS Topology Suite.

Our preprocess has already optimized the two sets using a bounds intersect based on a tile structure so plain iteration of all points over all polygons in a single tile should be fairly efficient. If the supplied chunk is too large for the brute force approach, a more sophisticated JTS extension class could index by polygon bbox first and then process with the Intersect function. This would only help if the granularity of the message sets was large. Caching tile point sets on the computational instances could also save some S3 reads reducing the computation setup to a smaller polygon tile set read.
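To make the brute force idea concrete, here is a small Python sketch of plain iteration with a bounding-box check in front of a ray-casting test. It is a stand-in for what a JTS-based Java class would do, not JTS itself; it assumes simple polygons without holes, and points lying exactly on an edge may fall either way.

```python
def bbox(ring):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray running in +x from (x, y)."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):
            # x coordinate where this edge crosses the horizontal line at y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def points_in_polygons(points, polygons):
    """points: {point_id: (x, y)}; polygons: {polygon_id: [(x, y), ...]}.

    Returns a list of (point_id, polygon_id) pairs.
    """
    boxes = {pid: bbox(ring) for pid, ring in polygons.items()}
    hits = []
    for point_id, (x, y) in points.items():
        for polygon_id, ring in polygons.items():
            min_x, min_y, max_x, max_y = boxes[polygon_id]
            # Cheap bounds rejection before the full edge walk.
            if min_x <= x <= max_x and min_y <= y <= max_y and point_in_ring(x, y, ring):
                hits.append((point_id, polygon_id))
    return hits
```

The output pairs are exactly the rows a computation instance would write to its S3 result object.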

This means that there is a bit of experimental tuning involved. A too fine grained tile chews up time in the messaging and S3 reads, while a coarse grained tile takes more time in the Intersect computation.

Finally each computation instance stores its result set to an S3 result object consisting of a collection of point.id and any associated polygon.ids that intersect the point. Sending an SQS message to the aggregator alerts it to availability of result updates. At the other end is an aggregator, which takes the S3 result objects and pushes them into an association table of point.id, polygon.id, or pip table. The aggregator instance can be a duplicate of the original feeder instance with its complete PostGIS DB already populated with the static point table and the required relation table (initially empty).

If this AWS system can be built and process in reasonable time an additional enhancement suggests itself. Assuming that risk polygons are being generated by other sources such as the National Hurricane Center, it would be nice to update the polygon table on an ongoing basis. Adding a polling class to check for new polygons and update our PostGIS table, would allow the polygons to be updated in near real time. Each time a pass through the point set is complete it could be repeated automatically reflecting any polygon changes. Continuous cycling through the complete tile set incrementally updates the full set of points.

At the other end, our aggregator instance would be continuously updating the point.id, polygon.id relation table one sub-tile at a time as the SQS result messages arrive. The decoupling afforded by SQS is a powerful incentive to use this asynchronous message communication. The effect is like a slippy map interface with subtiles continuously updating in the background, automatically registering risk polygon changes. Since risk polygons are time dependent it would also be interesting to keep timestamped histories of the polygons, providing for historical progressions by adding a time filter to our tile polygon select. The number of EC2 computation instances determines the speed of these update cycles up to the latency limit of SQS and S3 read/writes.

Visualization of the results might be an interesting exercise in its own right. Continuous visualization could be attained by making use of the aggregator relation table to assign some value to each tile. For example in pseudo query code:
foreach tile in tile table {
   SELECT AVG(polygon.attribute)
   FROM point, pip, polygon
   WHERE pip.pointid = point.id AND polygon.id = pip.polygonid
    AND point.tile = tilesignature;
 }
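The same per-tile averaging can be sketched in Python over in-memory stand-ins for the point, pip, and polygon tables. The dictionary layouts and the function name are assumptions made for illustration.

```python
def tile_averages(point_tile, pip, polygon_value):
    """Average polygon attribute per tile.

    point_tile: {point_id: tile_signature}
    pip: iterable of (point_id, polygon_id) pairs from the aggregator table
    polygon_value: {polygon_id: numeric attribute}
    """
    totals = {}
    for point_id, polygon_id in pip:
        tile = point_tile[point_id]
        total, count = totals.get(tile, (0.0, 0))
        totals[tile] = (total + polygon_value[polygon_id], count + 1)
    # Tiles with no matching pip rows simply do not appear in the result.
    return {tile: total / count for tile, (total, count) in totals.items()}
```

Each resulting value would then be mapped to a color or alpha for the pixel representing that tile.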

Treating each tile as a pixel lets the aggregator create polygon.value heat maps assigning a color and/or alpha transparency to each png image pixel. Unfortunately this would generally be a coarse image but it could be a useful kml GroundOverlay at wide zooms in a Google Map Control. These images can be readily changed by substituting different polygon.attribute values.

If Google Earth is the target visualization client, using a Geoserver on the aggregator instance would allow a kml reflector to kick in at lower zoom levels to show point level detail as <NetworkLink> overlays based on polygon.attributes associated with each point. GE is a nice client since it will handle refreshing the point collection after each zoom or pan, as long as the view is within the assigned Level of Detail. The Geoserver kml reflector essentially provides all this for almost free once the point featureType layer is added. Multiple risk polygon layers can also be added through Geoserver for client views with minimal effort.

COSTS

This is pure speculation on my part since I have not had time or money to really play with message driven AWS clusters. However, as an architecture it has merit. Adjustments in the tile granularity essentially adjust the performance up to the limit of SQS latency. Using cheap standard CPU instances would work for the computational array. However, there will be additional compute loads on the feeder and aggregator, especially if the aggregator does double duty as a web service. Fortunately AWS provides scaling classes of virtual hardware as well. Making use of a Feeder instance based on medium CPU adds little to system cost:
$0.20 - High-CPU Medium Instance
1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform
(note: a High-CPU Extra Large instance could provide enough memory for an in-memory point table - PostgreSQL memory tuning)

The aggregator end might benefit from a high cpu instance:
$0.80 - High-CPU Extra Large Instance
7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

A minimal system might be a $0.20 feeder => five $0.10 computation instances => a $0.80 aggregator = $1.50/hr plus whatever data transfer costs accrue. Keeping the system in a single zone to reduce SQS latency would be a good idea, and in-zone data transfer is free.
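The hourly figure is easy to recompute for other array sizes. A minimal sketch using the 2008 prices quoted above; the 730 hours per month in the usage note is my own averaging assumption.

```python
def hourly_cost(compute_instances, feeder=0.20, compute=0.10, aggregator=0.80):
    """Hourly EC2 cost in dollars for one feeder, N compute nodes, one aggregator."""
    return feeder + compute_instances * compute + aggregator
```

With five computation instances this gives $1.50/hr, the figure used in the summary. With ten it gives $2.00/hr, or roughly $1460 over a 730-hour month, before data transfer.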

Note that 5 computation instances are unlikely to provide sufficient performance. However, a nice feature of AWS cloud space is the adjustability of the configuration. If 5 is insufficient add more. If the point set is reduced drop off some units. If the polygon set increases substantially divide off the polygon tiling to its own high CPU instance. If your service suddenly gets slashdotted perhaps a load balanced webservice farm could be arranged? The commitment is just what you need and can be adjusted within a few minutes or hours not days or weeks.

Summary

Again this is a speculative discussion to keep my notes available for future reference. I believe that this type of parallelism would work for the class of spatial analytics problems discussed. It is particularly appealing to web visualization with continuous updating. The cost is not especially high, but then unknown pitfalls may await. Estimating four weeks of development and $1.50/hr EC2 costs leads to $7000 - $8000 for proof of concept development with an ongoing operational cost of about $1500/mo for a small array of 10 computational units. The class of problems involving very large point sets against polygons should be fairly common in insurance risk analysis, emergency management, and Telco/Utility customer base systems. Cloud arrays can never match the 500x performance improvement of Netezza, but cost should place it in the low end of the cost/performance spectrum. Maybe 5min cycles rather than 5sec are good enough. Look out Larry!

Chrome Problems

September 5th, 2008

Fig 1 IE with Silverlight ve:Map component

With the introduction of Chrome, Google has thrown down the gauntlet to challenge IE and Firefox. Out of curiosity I thought it would be interesting to download the current Chrome Beta and see what it could do with some of the interfaces I’ve worked on. Someone had recently quipped, “isn’t all of Google Beta?” I guess the same could be said of Amazon AWS, but then again in the “apples to apples” vein, I decided to compare IE8 Beta and Chrome Beta. The above screen shot shows an example of the new Silverlight ve:Map component in an ASP Ajax site running on IIS 6. The browser is IE8 beta in Vista, and surprise, not, it all works as expected.


Fig 2 Chrome with Silverlight ve:Map component

Also not surprisingly, the same Silverlight ve:Map component in an ASP Ajax site fares poorly in Chrome. In fact the component appears not at all, while curiously the menu asp:MenuItems act oddly. Instead of the expected drop down I get a refresh to a new horizontal row?


Fig 3 IE with Google Map component

Moving on to a Google Map Component embedded in the same ASP page, IE8 beta displays the map component including the newer G_SATELLITE_3D_MAP map type. ASP drop down menu and tooltips all work.


Fig 4 Chrome with Google Map component

Since this is a Google Map Component I would be disappointed if it did not work in Chrome, and it does. Except, I noticed the G_SATELLITE_3D_MAP control type is missing? I guess Chrome Beta has not caught up with Google Map Beta. Again the ASP Menu is not functional.


Fig 5 IE Google Map Control with Earth Mode - G_SATELLITE_3D_MAP

Back to IE to test the 3D Earth mode of my Google Map Component. As seen above it all works fine.


Fig 6 IE Silverlight Deep Earth

Now to check the new Silverlight DeepEarth component in IE. DeepEarth is a nice little MultiScaleTileSource library for smoothly spinning around the VE tile engines. It is as amazingly smooth as ever.


Fig 7 Google Chrome Deep Earth

However, in Chrome, no luck, just a big white area. I suppose that Silverlight was not a high priority with Chrome.


Fig 8 IE SVG hurricane West Atlantic weather clip

Switching to some older SVG interfaces, I took a look at the Hurricane clips in the West Atlantic. It looks pretty good, Hanna is deteriorating to a storm and Ike is still out east of the Bahamas.


Fig 9 Chrome SVG hurricane West Atlantic weather clip

On Chrome it is not so nice. The static menu side of the svg frames shows up but the image and animation stack is just gray. Clicking on menu items verifies that events are not working. Of course this SVG is functional only in the Adobe SVG viewer, but evidently Chrome has some svg problems.


Fig 10 IE ASP .NET 3.5

Moving back to IE8, I browsed through a recent ASP .NET 3.5 site I built for an Energy monitoring service. This is a fairly complete demonstration of ListView and Linq SQL and it of course works in IE8 beta.


Fig 11 Chrome ASP .NET 3.5

Surprisingly, Chrome does a great job on the ASP .NET 3.5. Almost all the features work as expected with the exception of the same old Menu problems.


Fig 12 IE SVG OWS interface

Finally I went back down memory lane for an older OWS interface built with SVG, using the Adobe Viewer variety. There are some glitches in IE8 beta. Although I can still see WMS and WFS layers and zoom around a bit, some annoying errors do pop up here and there. Adobe SVG viewer is actually orphaned, ever since Adobe picked up Macromedia and Flash, so it will doubtless recede into the distant past as the new browser generations arrive. Unfortunately, there is little Microsoft activity in SVG, in spite of competition from the other browsers, Safari, Firefox, and Opera. It will likely remain a 2nd class citizen in IE terms as Silverlight’s intent is to replace Flash, which itself is a proprietary competitor to SVG.


Fig 13 Chrome SVG OWS interface

Chrome and Adobe SVG are not great friends. Rumor has it that Chrome intends to fully support SVG, so if I ever get around to it, I could rewrite these interfaces for Firefox, Opera, Chrome 2.0.

Summary:
Chrome is beta and brand new. Although it has a lot of nice features and a quick clean tabbed interface, I don’t see anything but problems for map interfaces. Hopefully the Google Map problems will be ironed out shortly. There is even hope for SVG at some later date. I imagine even Silverlight will be supported grudgingly since I doubt that Google has the clout to dictate usage on the internet.

TatukGIS - Generic ESRI with a Bit Extra

September 3rd, 2008

Fig1 basic TatukGIS Internet Server view element and legend/layer element

TatukGIS is a commercial product that is basically a generic brand for building GIS interfaces including web interfaces. It is developed in Gdynia, Poland.


The core product is a Developer Kernel, DK, which provides basic building blocks for GIS applications in a variety of Microsoft flavors including:

  • DK-ActiveX - An ActiveX® (OCX) control supporting Visual Basic, VB.NET, C#, Visual C++
  • DK.NET - A manageable .NET WinForms component supporting C# and VB.NET
  • DK-CF - A manageable .NET Compact Framework 2.0/3.5 component - Pocket PC 2002 and 2003, Windows Mobile 5 and 6, Windows CE.NET 4.2, Windows CE 5 and 6
  • DK-VCL - A native Borland®/CodeGear® Delphi™/C++ Builder™

These core components have been leveraged for some additional products to make life a good deal easier for web and PDA developers. A TatukGIS Internet Server single server deployment license starts at $590 for the Lite Edition or $2000 per deployment server for the full edition in a web environment. I guess this is a good deal compared to ESRI/Oracle licenses, but not especially appealing to the open source integrators among us. There is support for the whole gamut of CAD, GIS, and raster formats as well as project file support for ESRI and MapInfo. This is a very complete toolkit.

The TatukGIS Internet Server license supports database access to all the usual DBs: "MSSQL Server, MySQL, Interbase, DB2, Oracle, Firebird, Advantage, PostgreSQL… " However, support for spatial formats is currently only available for Oracle Spatial/Locator and ArcSDE. Support for PostGIS and MS SQL Server spatial extensions is slated for release with TatukGIS IS 9.0.

I wanted to experiment a bit with the Internet Server, so I downloaded a free trial version.

Documentation was somewhat sparse, but this was a trial download. I found the most help looking in the sample subdirectories. Unfortunately these were all VB and it took a bit of experimental playing to translate into C#. The DK trial download did include a pdf document that was also somewhat helpful. Perhaps a real development license and/or server deployment license would provide better C# .NET documentation. I gather the historical precedence of VB is still evident in the current doc files.

The ESRI influence is obvious. From layer control to project serialization, it seems to follow the ESRI look and feel. This can be a plus or a minus. Although very familiar to a large audience of users, I am afraid the ESRI influence is not aesthetically pleasing or very smooth. I was able to improve over the typically clunky ArcIMS type zoom and wait interface by switching to the included Flash wrapper (simply a matter of setting Flash="true").

The ubiquitous flash plugin lets the user experience a somewhat slippy map interface familiar to users of Virtual Earth and Google Maps. We are still not talking a DeepZoom or Google Earth type interface, but a very functional viewer for a private data source. I was very pleased to find how easy it was to build the required functionality including vector and .sid overlays with layer/legend manipulation.

This is a very simple to use toolkit. If you have had any experience with Google Map API or Virtual Earth it is quite similar. Once a view element is added to your aspx the basic map interface is added server side:

<ttkGIS:XGIS_ViewerIS id="GIS" onclick="GIS_Click" runat="server" OnPaint="GIS_Paint" Width="800px" Height="600px" OnLoad="GIS_Load" BorderColor="Black" BorderWidth="1px" ImageType="PNG24" Flash="True"></ttkGIS:XGIS_ViewerIS>

The balance of the functionality is a matter of adding event code to the XGIS_ViewerIS element. For example :

    protected void GIS_Load(object sender, EventArgs e)
    {
       GIS.Open( Page.MapPath( "data/lasanimas1.ttkgp" ) );
       GIS.SetParameters("btnFullExtent.Pos", "(10,10)");
       GIS.SetParameters("btnZoom.Pos", "(40,10)");
       GIS.SetParameters("btnZoomEx.Pos", "(70,10)");
       GIS.SetParameters("btnDrag.Pos", "(100,10)");
       GIS.SetParameters("btnSelect.Pos", "(130,10)");

       addresslayer = (XGIS_LayerVector)GIS.API.Get("addpoints19");
    }

The ttkgp project support allows addition of a full legend/layer menu with a single element, an amazing time saver:

<ttkGIS:XGIS_LegendIS id="Legend" runat="server" Width="150px" Height="600px" ImageType="PNG24" BackColor="LightYellow" OnLoad="Legend_Load" AllowMove="True" BorderWidth="1px"></ttkGIS:XGIS_LegendIS>

The result is a simple functional project viewer available over the internet, complete with zoom, pan, and layer manipulation. The real power of the TatukGIS is in the multitude of functions that can be used to extend these basics. I added a simple address finder and PDF print function, but there are numerous functions for routing, buffering, geocoding, projection, geometry relations etc. I was barely able to scratch the surface with my experiments.


Fig2 - TatukGIS Internet Server browser view with .sid imagery and vector overlays

The Bit Extra:
As a bit of a plus the resulting aspx is quite responsive. Because the library is not built with the MS MFC it has a performance advantage over the ESRI products it replaces. The TatukGIS website claims include the following:

"DK runs some operations run even 5 - 50 times faster than the leading GIS development products"

I wasn’t able to verify this, but I was pleased with the responsiveness of the interface, especially in light of the ease of development. I believe clients with proprietary data layers who need a quick website would be very willing to license the TatukGIS Internet Server. Even though an open source stack such as PostGIS, Geoserver, OpenLayers could do many of the same things, the additional cost of development would pretty much offset the TatukGIS license cost.

The one very noticeable restriction is that development is a Windows only affair. You will need an ASP IIS server to make use of the TatukGIS for development and deployment. Of course clients can use any of the popular browsers from any of the common OS platforms. Cloud clusters in Amazon’s AWS will not support TatukGIS IS very easily, but now that GoGrid offers Virtual Windows servers there are options.


Fig3 - TatukGIS Internet Server browser view with DRG imagery and vector overlays

Fig4 - TatukGIS Internet Server browser result from a find address function

Summary: TatukGIS Internet Server is a good toolkit for custom development, especially for clients with ESRI resources. The license is quite reasonable.

Virtual Earth - DeepEarth - Deep Pockets

August 24th, 2008

Fig 1 Browser Virtual Earth Map control with KML overlay in 3D

Microsoft has released a new set of controls with Silverlight 2.0 Beta, including the Virtual Earth Map control, Microsoft.Live.ServerControls.VE. This makes cross browser map interface solutions even easier. Silverlight, like Flash, requires a client side code download, which is available for IE, Firefox, Opera, and Safari. This control introduces VE Map to Microsoft’s version of declarative xml vector graphics in the browser, building on the older work of SVG. Having a VE Map control available cross browser makes VE more competitive with Google Map API. Having a Map control with C# event coding rather than javascript makes life for developers more interesting.

The release of SQL Server 2008 and its spatial extensions makes this control even better from a developer’s perspective. I am looking forward to seeing how to use Linq Sql to tie together spatial Sql Server datasets and VE map controls.

In order to use this control you must register both a Silverlight and a VE Assembly:

<%@ Register Assembly="System.Web.Silverlight"
Namespace="System.Web.UI.SilverlightControls"
TagPrefix="asp" %>

<%@ Register assembly="Microsoft.Live.ServerControls.VE"
namespace="Microsoft.Live.ServerControls.VE"
tagprefix="ve" %>

Once these are added to your aspx page the control element itself looks like this:

<ve:Map ID="Map1" runat="server"
Height="100%" Width="100%"
ZoomLevel="5" Center-Latitude="38" Center-Longitude="-105"
MapStyle="Shaded" MiniMap="True"
MiniMapYoffset="150" MiniMapXoffset="10"
ShowFindControl="False" DashboardSize="Normal" />

This Map1 element has a host of properties and methods as this subset snapshot shows:


Fig 2 Sampling of <ve:Map> properties in the Visual Studio 2008 properties frame

Code behind cs can include any of the Map1 behaviors:

using System;
using System.Collections;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using Microsoft.Live.ServerControls.VE;

public partial class display_displayVE : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!Page.IsPostBack)
        {
            Map1.Clear();
            ShapeLayer shplayer = new ShapeLayer();
            Map1.AddShapeLayer(shplayer);

            ShapeSourceSpecification shapeSpec =
                new ShapeSourceSpecification(DataType.ImportXML,
                "http://localhost/brat/kml/Track.kml", shplayer);

            Map1.ImportShapeLayerData(shapeSpec, "", true);
        }
    }
}

Listing 1 sample C# use of Map1 control in code behind

All of the previous VE Javascript SDK functions should now have their equivalents in C#. I can use C# to interact with the VE map object including 3D mode, Traffic overlays, Routing, Geocoding, and Location Find, which means a quick easy way to add full featured maps to any business app. This map object is part of the Silverlight 2.0 Beta.

Another interesting part of 2.0 Beta is the MultiScaleImage or DeepZoom. I played with this earlier in a simplistic fashion but there is a nice feature I didn’t notice, MultiScaleTileSource. This lets a user add an alternative tile source which has lots of applications. Here is a pretty one by Mike Ormond, coding the Mandelbrot Set as a MultiScaleImage with a custom MultiScaleTileSource:


Fig 3 Mandelbrot Set with DeepZoom MultiScaleTileSource

Here is another blog with some details on using the Silverlight MultiScaleImage with MultiScaleTileSource: http://mtaulty.com/CommunityServer/blogs/mike_taultys_blog/archive/2008/06/25/10536.aspx

The logical extension of this from a GIS perspective is DeepEarth. DeepEarth is a CodePlex project that applies the MultiScaleTileSource to a Silverlight Map interface. The current version supports the Microsoft Virtual Earth Engine, which is interesting in its own right, but can be modified to use other tile sources such as Google Map, or any WCS source of your choice. DeepEarth is a fascinating interface. The mousewheel spin zoom and pan is addictive with all the fade in/out animation. This project also has a beautiful little navigation tool, seen in the upper left corner. Clicking on the navigation tool does a little spin sprocket animation to open and close the tool set. However, the real fun is just wheel spin zooming.


Fig 4 Deep Earth Silverlight with MultiScaleImage tile sourced to the VE engine. A static image does not do it justice. Deep Earth really invites playing!

Because this directly accesses the VE tile engine it of course will not be legitimate without changes to the Microsoft license restrictions. Here is a quick summary of the commercial VE license:

"Standard version license is $8000 for 1,000,000/year transactions = 8,000,000/year tile renders. Note: no routing in standard
Advanced version license is $15,000 for 1,500,000/year transactions = 12,000,000/year tiles. Includes routing capability"

8,000,000 annual tile renders sounds like a lot of tiles, but playing with the DeepEarth interface and watching tiles roam in and out of view made me wonder. If you put up a public site, for example, what would be an average transaction number per view? I tested a zoom from world down to my house rooftop. Checking the cached tiles in C:\Users\user\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\Content.IE5 gave 540 tile images. This is just a little bitty zoom though, and I can see an average user stacking up even ten times this amount, but for sake of argument, let’s assume a mere 1000 tile rendering count per site view. Gulp, this leads to only about 8000 user views per year or 22 per day before the overage rates kick in:

"An overage rate (generally $0.01 per transaction) is listed for exceeding the preset number of transactions for use during the term."
Recalling that there are 8 tile renders per transaction, $0.01/8 x 1000 = $1.25 per viewer use of the interface!
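The arithmetic above can be sketched in a few lines of JavaScript. This is only a back-of-envelope check using the license figures quoted in the text; the function name veViewBudget and the 1000 tiles per view are illustrative assumptions, not anything published by Microsoft.

```javascript
// Back-of-envelope budget for the standard Virtual Earth license.
// All figures come from the license summary quoted above; the
// tiles-per-view number is the author's rough assumption.
function veViewBudget(license, tilesPerView) {
  const tilesPerYear = license.transactionsPerYear * license.tilesPerTransaction;
  const viewsPerYear = tilesPerYear / tilesPerView;
  return {
    tilesPerYear,
    viewsPerYear,
    viewsPerDay: viewsPerYear / 365,
    // Overage is billed per transaction, and one transaction covers 8 tiles.
    overagePerView:
      (license.overagePerTransaction / license.tilesPerTransaction) * tilesPerView,
  };
}

const standard = {
  transactionsPerYear: 1000000,
  tilesPerTransaction: 8,
  overagePerTransaction: 0.01, // dollars, "generally"
};

const budget = veViewBudget(standard, 1000);
console.log(`tiles included per year: ${budget.tilesPerYear}`);
console.log(`site views per year: ${budget.viewsPerYear}`);
console.log(`site views per day: ${Math.round(budget.viewsPerDay)}`);
console.log(`overage per view: $${budget.overagePerView.toFixed(2)}`);
```

Changing the second argument shows how sensitive the result is to the tiles-per-view guess, which is the number that real usage metrics would need to pin down.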

Even if the Microsoft license were to allow direct VE access, I wonder what the chances are of a commercial version of this type of interface. It also makes me wonder what kinds of tile rendering counts are generated on the Virtual Earth control. Checking the tile cache after using ve:Map reveals 320 tiles. So roughly 2/3 the count, which scaled against the $1.25 figure is still about $0.80 per view. Seems to be a very useful metric to have before committing to a public website. If my rough calculations are correct then ve:Map could lead to DeepPockets more than DeepEarth.

An alternative to the VE Map control is the now venerable Google Maps API, recently updated with the Earth mode, which gives the browser a bit of the Google Earth 3D view. The license is different:

"Google license: $10,000 annual license fee + page views over 2,000,000/annual"

I guess I would like some clarification on the ‘render transactions’ license of Virtual Earth versus the Google Maps license of ‘page views’, but the Google license seems to compare rather favorably based on page views rather than tile renders. I checked the tile cache for a similar navigation using the Google GMap2 object and got about 600 tiles. However, for GMap2 objects this is irrelevant, since there is just a single page view regardless of the navigation tiles downloaded. In this light GMap2 looks really good at $0.005 per user view versus ve:Map at $0.50-$1.00 per user view. This is a 100:1 cost ratio, so the VE Map team might need to revisit their pricing. I hope Microsoft can be more competitive eventually, since ve:Map is a nice Silverlight control. It would also be nice to have some real use metrics for a reality check.
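The per-view comparison can be checked with a short sketch. It uses only the figures quoted above and assumes one page view per user visit, with the Google fee spread across the full 2,000,000 included page views. That is the best case for Google, since a site with less traffic pays more per view. The function names are illustrative.

```javascript
// License figures quoted in the text.
const GOOGLE_ANNUAL_FEE = 10000;       // dollars per year
const GOOGLE_INCLUDED_VIEWS = 2000000; // page views per year

// Cost of one page view when the whole allowance is used.
function costPerPageView(annualFee, includedViews) {
  return annualFee / includedViews;
}

// How many times more expensive a VE view is than a Google page view.
function costRatio(vePerView, googlePerView) {
  return vePerView / googlePerView;
}

const googlePerView = costPerPageView(GOOGLE_ANNUAL_FEE, GOOGLE_INCLUDED_VIEWS);
console.log(`Google: $${googlePerView.toFixed(3)} per page view`);

// The text estimates ve:Map at $0.50 to $1.00 per user view.
for (const vePerView of [0.5, 1.0]) {
  const ratio = Math.round(costRatio(vePerView, googlePerView));
  console.log(`ve:Map at $${vePerView.toFixed(2)} per view: about ${ratio}:1`);
}
```

The 100:1 figure corresponds to the low end of the ve:Map estimate; the high end doubles it.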


Fig 5 Google Map api control with the new Earth view

Summary:
DeepEarth is a beautiful interface and shows the power of the MultiScaleTileSource, but even if licensing allowed direct access to the VE tile engine or Google’s tile engine, it appears cost would make it prohibitive. However, MultiScaleTileSource can be tied to much less ambitious (and free) tile engines. I am thinking about the WCS ImageMosaic Plugin of the GeoServer project. I imagine adequate performance would require prebuilding a GeoWebCache tile cache. In this scenario the webservice MultiScaleImage can tie into proprietary WCS imagery and still provide the beautiful interactivity of DeepEarth on a more limited scale. Not a bad interface for 6" GDS aerial imagery.

A quick look at GoGrid

August 14th, 2008

Fig 1 a sample ASP .NET 3.5 website running on a GoGrid server instance

GoGrid ( http://www.gogrid.com ) is a cloud service similar to AWS. Just like Amazon’s AWS EC2, the user starts a virtual server instance from a template and then uses the instance like a dedicated server. The cost is similar to AWS, starting at about $0.10 per hour for a minimal server. The main difference from a user perspective is the addition of Windows servers and an easy to use control panel. The GoGrid control panel provides point and click setup of server clusters, including a hardware load balancer.

The main attraction for me is the availability of virtual Windows Servers. There are several Windows 2003 configuration templates as well as sets of RedHat or CentOS Linux templates:
· Windows 2003 Server (32 bit)/ IIS
· Windows 2003 Server (32 bit)/ IIS/ASP.NET/SQL Server 2005 Express Edition
· Windows 2003 Server (32 bit)/ SQL Server 2005 Express Edition
· Windows 2003 Server (32 bit)/ SQL Server 2005 Workgroup Edition
· Windows 2003 Server (32 bit)/ SQL Server 2005 Standard Edition

The number of templates is more limited than EC2 and I did not see a way to create custom templates. However, this limitation is offset by ease of management. For my experiment I chose the Windows 2003 Server (32 bit)/ IIS/ASP.NET/SQL Server 2005 Express Edition. This offered the basics I needed to serve a temporary ASP web application.

After signing up, I entered my GoGrid control panel. Here I can add a service by selecting from the option list.


Fig 2- GoGrid Control Panel

Filling out a form with the basic RAM, OS, and Image lets me add a WebAPP server to my panel. I could additionally add several WebAPP servers and configure a LoadBalancer along with a Backend Database server by similarly filling out Control Panel forms. This appears to take the AWS EC2 service a step further by letting typical scaling workflows be part of the front end GUI. Although scaling in this manner can be done in AWS, it requires installation of a software Load Balancer on one of the EC2 instances and a manual setup process.


Fig 3 - example of a GoGrid WebAPP configuration form

Once my experimental server came on line I was able to Remote Desktop into the server and begin configuring my WebAPP. I first installed the Microsoft .NET 3.5 framework so I could make use of some of its new features. I then copied up a sample web application showing the use of a GoogleMap Earth mode control in a simple ASP interface. This is a display interface which is connected to a different database server for displaying GMTI results out of a PostGIS table.

Since I did not want to point a domain at this experimental server, I simply assigned the GoGrid IP to my IIS website. I ran into a slight problem here because the sample webapp was created using .NET 3.5 System.Web.Extensions. The webapp was not able to recognize the extension configurations in my Web.config file. I tried copying the System.Web.Extensions DLLs into my webapp bin folder. However, I was still getting errors. I then downloaded the ASP Ajax control and installed it on the GoGrid server but still was unable to get the website to display. Finally I went back to Visual Studio and remade the webapp using the ASP.NET Web App template without the extensions. I was then able to upload to my GoGrid server and configure IIS to see my website as the default http service.

There was still one more problem. I could see the website from the local GoGrid system but not from outside. After contacting GoGrid support I was quickly in operation with a pointer to the Windows Firewall, which GoGrid Support kindly fixed for me. The problem was that the Windows 2003 template I chose does not open port 80 by default. I needed to use the Firewall manager to open port 80 for the http service. For those wanting to use ftp, the same would be required for port 21.

I now had my experimental system up and running. I had chosen a 1 GB memory server, so my actual cost on the server is $0.19/hour, which is a little less for your money than the AWS EC2:

$0.10 Small Instance (Default)
1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform

But again, running ASP .NET 3.5 is much more complex on EC2, requiring a Mono installation on a Linux base. I have not yet tried that combination and somehow doubt that it would work with a complex ASP .NET 3.5 website, especially with Ajax controls.

The GoogleMap Control with the Earth mode was also interesting. I had not yet embedded this into an ASP website. It proved to be fairly simple. I just needed to add a <asp:ScriptManager ID="mapscriptmanager" runat="server"/> to my site Master page and then the internal page javascript used to create the GoogleMap Control worked as normal.

I had some doubts about accessing the GMTI points from the webapp, since there are often restrictions on cross domain XMLHttpRequests. There was no problem. My GMTI to KML servlet produces the KML MIME type "application/vnd.google-earth.kml+xml", which is picked up in the client javascript using the Google API:
geoXml = new GGeoXml(url);

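Evidently cross domain restrictions did not apply in this case, most likely because GGeoXml has Google’s servers fetch the KML from its public URL rather than having the browser request it directly. That made me happy, since I didn’t have to write a proxy servlet just to access the GMTI points on a different server.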
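For context, the GGeoXml line above sits inside the usual Google Maps API v2 setup. The sketch below is not the code from my page; it is a minimal reconstruction using v2 calls (GMap2, GLatLng, GGeoXml, G_SATELLITE_3D_MAP). The function name showKmlOverlay, the start view, and the idea of passing the API globals in as an `api` object are assumptions made so the sketch can be exercised outside a browser.

```javascript
// Minimal sketch for the Google Maps API v2.
// `api` is the object holding the v2 globals: in a page that has loaded
// the Maps script this is simply `window`.
function showKmlOverlay(api, container, kmlUrl) {
  const map = new api.GMap2(container);

  // Placeholder start view (latitude 0, longitude 0, zoomed far out).
  map.setCenter(new api.GLatLng(0, 0), 2);

  // Make the Earth mode available as a map type.
  map.addMapType(api.G_SATELLITE_3D_MAP);

  // GGeoXml takes the URL of a KML document and is added like any overlay.
  const geoXml = new api.GGeoXml(kmlUrl);
  map.addOverlay(geoXml);
  return geoXml;
}
```

In a page this would be called as showKmlOverlay(window, document.getElementById("map"), url), where "map" is a hypothetical element id and url points at the KML servlet on the other server.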

In summary, GoGrid is a great cloud service which finally opens the cloud to Microsoft shops where Linux is not an option. The GUI control panel is easy to use, and configuring a fully scalable load balanced cluster can be done right from the control panel. GoGrid fills a big hole in the cloud computing world.