Azure and GeoWebCache tile pyramids


Azure Blob storage tile pyramid
Fig 1 – Azure Blob Storage tile pyramid for citylimits

Azure Overview

Shared resources continue to grow as essential building blocks of modern life, key to connecting communities and businesses of all types and sizes. As a result a product like SharePoint is a very hot item in the enterprise world. You can possibly view Azure as a very big, very public, SharePoint platform that is still being constructed. Microsoft and 3rd party services will eventually populate the service bus of this Cloud version with lots and lots of service hooks. In the meantime, even early stage Azure with Web Hosting, Blob storage, and Azure SQL Server makes for some interesting experimental R&D.

Azure is similar to Amazon’s AWS cloud services, and Azure’s pricing follows Amazon’s lead with the familiar “pay as you go, buy what you use” model. Azure offers web services, storage, and queues, but instead of giving access to an actual virtual instance, Azure provides services maintained in the Microsoft Cloud infrastructure. Blob storage, Azure SQL Server, and IIS allow developers to host web applications and data in the Azure Cloud, but only with the provided services. The virtual machine is entirely hidden inside Microsoft’s Cloud.

The folks at Microsoft are probably well aware that most development scenarios have some basic web application and storage component, but don’t really need all the capabilities, and headaches, offered by controlling their own server. In return for giving up some freedom you get the security of automatic replication, scalability, and maintenance along with the API tools to connect into the services. In essence this is a Microsoft only Cloud since no other services can be installed. Unfortunately, as a GIS developer this makes Azure a bit less useful. After all, Microsoft doesn’t yet offer GIS APIs, OGC compliant service platforms, or translation tools. On the other hand, high availability with automatic replication and scalability for little effort are nice features for lots of GIS scenarios.

The current Azure CTP lets developers experiment for free with these minor restrictions:

  • Total compute usage: 2000 VM hours
  • Cloud storage capacity: 50GB
  • Total storage bandwidth: 20GB/day


To keep things simple, since this is my first introduction to Azure, I looked at just using Blob Storage to host a tile pyramid. The Silverlight MapControl CTP makes it very easy to add tile sources as layers so my project is simply to create a tile pyramid and store this in Azure Blob storage where I can access it from a Silverlight MapControl.

In order to create a tile pyramid, I also decided to dig into the GeoWebCache standalone beta 1.2. This is beta and offers some new undocumented features. It also is my first attempt at using geowebcache as standalone. Generally I just use the version conveniently built into Geoserver. However, since I was only building a tile pyramid rather than serving it, the standalone version made more sense. Geowebcache also provides caching for public WMS services. In cases where a useful WMS is available, but not very efficient, it would be nice to cache tiles for at least subsets useful to my applications.

Azure Blob Storage

Azure CTP has three main components:

  1. Windows Azure – includes the storage services for blobs, queues, and cloud tables as well as hosting web applications
  2. SQL Azure – SQL Server in the Cloud
  3. .NET Services – Service Bus, Access Control Service, Work Flow …

There are lots of walk throughs for getting started in Azure. It all boils down to getting the credentials to use the service.

Once a CTP project is available the next step is to create a “Storage Account” which will be used to store the tile pyramid directory. From your account page you can also create a “Hosted Service” within your Windows Azure project. This is where web applications are deployed. If you want to use “SQL Azure” you must request a second SQL Azure token and create a SQL Service. The .NET Service doesn’t require a token for a subscription as long as you have a Windows Live account.

After creating a Windows Azure storage account you will get three endpoints and a couple of keys.

Endpoints:
http://sampleaccount.blob.core.windows.net/

http://sampleaccount.queue.core.windows.net/

http://sampleaccount.table.core.windows.net/

Primary Access Key: ************************************
Secondary Access Key: *********************************

Now we can start using our brand new Azure storage account. But to make life much simpler first download the following:

Azure SDK includes some sample code . . . HelloWorld, HelloFabric, etc to get started using the Rest interface. I reviewed some of the samples and started down the path of creating the necessary Rest calls for recursively loading a tile pyramid from my local system into an Azure blob storage nomenclature. I was just getting started when I happened to take a look at the CloudDrive sample. This saved me a lot of time and trouble.

CloudDrive lets you treat the Azure service as a drive inside PowerShell. The venerable MSDOS cd, dir, mkdir, copy, del etc commands are all ready to go. Wince, I know, I know, MSDOS? I’m sure, if not now, then soon there will be dozens of tools to do the same thing with nice drag and drop UIs. But this works and I’m old enough to actually remember DOS commands.

First, using the elevated Windows Azure SDK command prompt you can compile and run the CloudDrive with a couple of commands:

C:\AzureTools\samples\CloudDrive\buildme.cmd
C:\AzureTools\samples\CloudDrive\runme.cmd

Now open Windows PowerShell and execute the MounteDrive.ps1 script. This allows you to treat the local Azure service as a drive mount and start copying files into storage blobs.


Azure sample CloudDrive PowerShell
Fig 1 – Azure sample CloudDrive PowerShell

Creating a connection to the real production Azure service simply means making a copy of MountDrive.ps1 and changing credentials and endpoint to the ones obtained previously.

function MountDrive {
Param (
 $Account = "sampleaccount",
 $Key = "***************************************",
 $ServiceUrl="http://sampleaccount.blob.core.windows.net/",
 $DriveName="Blob",
 $ProviderName="BlobDrive")

# Power Shell Snapin setup
 add-pssnapin CloudDriveSnapin -ErrorAction SilentlyContinue

# Create the credentials
 $password = ConvertTo-SecureString -AsPlainText -Force $Key
 $cred = New-Object -TypeName Management.Automation.PSCredential -ArgumentList $Account, $password

# Mount storage service as a drive
 new-psdrive -psprovider $ProviderName -root $ServiceUrl -name $DriveName -cred $cred -scope global
}

MountDrive -ServiceUrl "http://sampleaccount.blob.core.windows.net/" -DriveName "Blob" -ProviderName "BlobDrive"

The new-item command lets you create a new container with -Public flag ensuring that files will be accessible publicly. Then the Blog: drive copy-cd command will copy files and subdirectories from the local file system to the Azure Blob storage. For example:

PS Blob:\> new-item imagecontainer -Public
Parent: CloudDriveSnapin\BlobDrive::http:\\127.0.0.1:10000\devstoreaccount1

Type Size LastWriteTimeUtc Name
---- ---- ---------------- ----
Container 10/16/2009 9:02:22 PM imagecontainer

PS Blob:\> dir

Parent: CloudDriveSnapin\BlobDrive::http:\\127.0.0.1:10000\

Type Size LastWriteTimeUtc Name
---- ---- ---------------- ----
Container 10/16/2009 9:02:22 PM imagecontainer
Container 10/8/2009 9:22:22 PM northmetro
Container 10/8/2009 5:54:16 PM storagesamplecontainer
Container 10/8/2009 7:32:16 PM testcontainer

PS Blob:\> copy-cd c:\temp\image001.png imagecontainer\test.png
PS Blob:\> dir imagecontainer

Parent: CloudDriveSnapin\BlobDrive::http:\\127.0.0.1:10000\imagecontainer

Type Size LastWriteTimeUtc Name
---- ---- ---------------- ----
Blob 1674374 10/16/2009 9:02:57 PM test.png

Because imagecontainer is public the test.png image can be accessed in the browser from the local development storage with:
http://127.0.0.1:10000/devstoreaccount1/imagecontainer/test.png
or if the image was similarly loaded in a production Azure storage account:
http://sampleaccount.blob.core.windows.net/imagecontainer/test.png

It is worth noting that Azure storage consists of endpoints, containers, and blobs. There are some further subtleties for large blobs such as blocks and blocklists as well as metadata, but there is not really anything like a subdirectory. Subdirectories are emulated using slashes in the blob name.
i.e. northmetro/citylimits/BingMercator_12/006_019/000851_002543.png is a container, “northmetro“, followed by a blob name,
/citylimits/BingMercator_12/006_019/000851_002543.png.”

The browser can show this image using the local development storage:
http://127.0.0.1:10000/devstoreaccount1/northmetro/citylimits/BingMercator_12
/006_019/000851_002543.png

Changing to producton Azure means substituting a valid endpoint for “127.0.0.1:10000/devstoreaccount1″ like this:
http://sampleaccount.blob.core.windows.net/northmetro/citylimits/BingMercator_12
/006_019/000851_002543.png

With CloudDrive getting my tile pyramid into the cloud is straightforward and it saved writing custom code.

The tile pyramid – Geowebcache 1.2 beta

Geowebcache is written in Java and synchronizes very well with the GeoServer OGC service engine. The new 1.2 beta version is available as a .war that is loaded into the webapp directory of Tomcat. It is a fairly simple matter to configure geowebcache to create a tile pyramid of a particular Geoserver WMS layer. (Unfortunately it took me almost 2 days to work out a conflict with an existing Geoserver gwc) The two main files for configuration are:


C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps\
                     geowebcache1.2\WEB-INF\geowebcache-servlet.xml
C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps\
                    geowebcache1.2\WEB-INF\classes\geowebcache.xml

geowebcache-servlet.xml customizes the service bean parameters and geowebcache.xml provides setup parameters for tile pyramids of layers. Leaving the geowebcache-servlet.xml at default will work fine when no other Geoserver or geowebcache is around. It can get more complicated if you have several that need to be kept separate. More configuration info.

Here is an example geowebcache.xml that uses some of the newer gridSet definition capabilities. It took me a long while to find the schema for geowebcache.xml:
http://geowebcache.org/schema/docs/1.2.0/
The documentation is still thin for this beta release project.

<?xml version="1.0" encoding="utf-8"?>
<gwcConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://geowebcache.org/schema/1.2.0/geowebcache.xsd"
  xmlns="http://geowebcache.org/schema/1.2.0">
<version>1.2.0</version>
<backendTimeout>120</backendTimeout>
<gridSets>
  <gridSet>
  <name>BingMercator</name>
  <srs><number>900913</number></srs>
  <extent>
  <coords>
  <double>-11706995</double>
  <double>4839671</double>
  <double>-11687135</double>
  <double>4861458</double>
  </coords>
  </extent>
  <alignTopLeft>true</alignTopLeft>
  <levels>15</levels>
  </gridSet>
</gridSets>
<layers>
  <wmsLayer>
  <name>citylimits</name>
  <gridSubsets>
  <gridSubset>
  <gridSetName>BingMercator</gridSetName>
  <zoomStart>0</zoomStart>
  <zoomStop>10</zoomStop>
  </gridSubset>
  <gridSubset>
  <gridSetName>GoogleMapsCompatible</gridSetName>
  </gridSubset>
  </gridSubsets>
  <wmsUrl><string>http://localhost:80/geoserver/wms</string></wmsUrl>
  <wmsLayers>northmetro:citylimits</wmsLayers>
  <wmsStyles>citylimits</wmsStyles>
  </wmsLayer>
</layers>
</gwcConfiguration>

After editing the configuration files, building the pyramid is a matter of pointing your browser at the local webapp and seeding the tiles down to the level you choose with the gridSet you want. The GoogleMapsCompatible gridSet is built into geowebcache and the BingMercator is a custom gridSet that I’ve added with extent limits defined.
http://localhost/geowebcache1.2/rest/seed/citylimits

This can take a few hours/days depending on the extent and zoom level you need. Once completed I use the CloudDrive PowerShell to copy all of the tiles into Azure blob storage:

PS Blob:\> copy-cd C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp\geowebcache\citylimits

This also takes some time for the resulting 243,648 files of about 1Gb.

Silverlight MapControl

The final piece in the project is adding the MapControl viewer layer. First I add a new tile source layer in the Map Control of the MainPage.xaml

  <m:Map
      Name="MainMap"
      NavigationVisibility="Visible"
      Grid.Column="0" Grid.Row="1" Grid.RowSpan="1" Padding="5"
      Mode="Road">
    <m:Map.Children>
       <!-- Azure tile source -->
       <m:MapTileLayer x:Name="citylimitsAzureLayer" Opacity="0.5" Visibility="Collapsed">
         <m:MapTileLayer.TileSources>
             <local:CityLimitsAzureTileSource></local:CityLimitsAzureTileSource>
         </m:MapTileLayer.TileSources>
      </m:MapTileLayer>
             .
             .

The tile naming scheme is described here:
http://geowebcache.org/trac/wiki/filestorage2
The important point is:

“Most filesystems use btree’s to store the files in directories, so layername/projection_z/[x/(2(z/2))]_[y/(2(z/2))]/x_y.extension seems reasonable, since it works sort of like a quadtree. The idea is that half the precision is in the directory name, the full precision in the filename to make it easy to locate problematic tiles. This will also make cache purges a lot faster for specific regions, since fewer directories have to be traversed and unlinked. “

An ordinary tile source class looks just like this:

  public class CityLimitsTileSource : Microsoft.VirtualEarth.MapControl.TileSource
  {
        public CityLimitsTileSource() : base(App.Current.Host.InitParams["src"] +
          "/geoserver/gwc/service/gmaps?layers=northmetro:citylimits&zoom={2}&x={0}&y={1}")
        {
        }

        public override Uri GetUri(int x, int y, int zoomLevel)
        {
           return new Uri(String.Format(this.UriFormat, x, y, zoomLevel));
        }
  }

However, now I need to reproduce the tile name as it is in the Azure storage container rather than letting gwc/service/gmaps mediate the nomenclature for me. This took a little digging. The two files I needed to look at turned out to be:

GMapsConverter works because Bing Maps follows the same upper left origin convention and spherical mercator projection as Google Maps. Here is the final approach using the naming system in Geowebcache1.2.

public class CityLimitsAzureTileSource : Microsoft.VirtualEarth.MapControl.TileSource
{
  public CityLimitsAzureTileSource()
  : base(App.Current.Host.InitParams["azure"] + "citylimits/GoogleMapsCompatible_{0}/{1}/{2}.png")
  {
  }

  public override Uri GetUri(int x, int y, int zoomLevel)
  {
   /*
   * From geowebcache
   * http://geowebcache.org/trac/browser/trunk/geowebcache/src/main/java/org/geowebcache/storage/blobstore/file/FilePathGenerator.java
   * http://geowebcache.org/trac/browser/trunk/geowebcache/src/main/java/org/geowebcache/service/gmaps/GMapsConverter.java
   * must convert zoom, x, y, and z into tilepyramid subdirectory structure used by geowebcache
  */
  int extent = (int)Math.Pow(2, zoomLevel);
  if (x < 0 || x > extent - 1)
  {
     MessageBox.Show("The X coordinate is not sane: " + x);
  }

  if (y < 0 || y > extent - 1)
  {
     MessageBox.Show("The Y coordinate is not sane: " + y);
  }
  // xPos and yPos correspond to the top left hand corner
  y = extent - y - 1;
  long shift = zoomLevel / 2;
  long half = 2 << (int)shift;
  int digits = 1;
  if (half > 10)
  {
     digits = (int)(Math.Log10(half)) + 1;
  }
  long halfx = x / half;
  long halfy = y / half;
  string halfsubdir = zeroPadder(halfx, digits) + "_" + zeroPadder(halfy, digits);
  string img = zeroPadder(x, 2 * digits) + "_" + zeroPadder(y, 2 * digits);
  string zoom = zeroPadder(zoomLevel, 2);
  string test = String.Format(this.UriFormat, zoom, halfsubdir, img );

  return new Uri(String.Format(this.UriFormat, zoom, halfsubdir, img));
  }

/**
  * From geowebcache
  * http://geowebcache.org/trac/browser/trunk/geowebcache/src/main/java/org/geowebcache/storage/blobstore/file/FilePathGenerator.java
  * a way to pad numbers with leading zeros, since I don't know a fast
  * way of doing this in Java.
  *
  * @param number
  * @param order
  * @return
  */
  public static String zeroPadder(long number, int order) {
  int numberOrder = 1;

  if (number > 9) {
    if(number > 11) {
      numberOrder = (int) Math.Ceiling(Math.Log10(number) - 0.001);
    } else {
      numberOrder = 2;
    }
  }

  int diffOrder = order - numberOrder;

    if(diffOrder > 0) {
      //System.out.println("number: " + number + " order: " + order + " diff: " + diffOrder);
      StringBuilder padding = new StringBuilder(diffOrder);

      while (diffOrder > 0) {
        padding.Append("0");
        diffOrder--;
       }
       return padding.ToString() + string.Format("{0}", number);
    } else {
      return string.Format("{0}", number);
    }
  }
}

I didn’t attempt to change the zeroPadder. Doubtless there is a simple C# String.Format that would replace the zeroPadder from Geowebcache.

This works and provides access to tile png images stored in Azure blob storage, as you can see from the sample demo.

Summary

Tile pyramids enhance user experience, matching the performance users have come to expect in Bing, Google, Yahoo, and OSM. It is resource intensive to make tile pyramids of large world wide extent and deep zoom levels. In fact it is not something most services can or need provide except for limited areas. Tile pyramids in the Cloud require relatively static layers with infrequent updates.

Although using Azure this way is possible and provides performance, scalability, and reliability, I’m not sure it always makes sense. The costs are difficult to predict for a high volume site as they are based on bandwidth usage as well as storage. Also you may be paying storage fees for many tiles seldom or never needed. Tile pyramid performance is a wonderful thing, but it chews up a ton of storage, much of which is seldom if ever used.

For a stable low to medium volume application it makes more sense to host a tile pyramid on your own server. Possibly with high volume sites where reliability is the deciding factor moving to Cloud storage services is the right thing. This is especially true where traffic patterns swing wildly or grow rapidly and robust scaling is an ongoing battle.

Azure CTP is of course not as mature as AWS, but obviously it has the edge in the developer community and like many Microsoft technologies it has staying power to spare. Leveraging its developer community makes sense for Microsoft and with easy to use tools built into Visual Studio I can see Azure growing quickly. In time it will just be part of the development fabric with most Visual Studio deployment choices seamlessly migrating out to the Azure Cloud.

Azure release is slated for Nov 2009.

Comments are closed.