Codename “Dallas” Data Subscription


Fig 1 – Data.Gov subscription source from Microsoft Dallas

What is Dallas? More info is available on the Dallas Blog.

“Dallas is Microsoft’s Information Service, built on and part of the Windows Azure platform to provide developers and information workers with friction-free access to premium content through clean, consistent APIs as well as single-click BI/Reporting capabilities; an information marketplace allowing content providers to reach developers of ALL sizes (and developers of all sizes to gain access to content previously out of reach due to pricing, licensing, format, etc.)”

I guess I fall into the information worker category, and although “friction-free” may not be quite the same as FOSS, maybe it’s near enough to do some experiments. In order to make use of this data service you need a Windows Live ID with Microsoft. The signup page also asks for an invitation code, which you can obtain via email. Once the sign-in completes you are presented with a key, which is used for access to any of the data subscription services at this secure endpoint:
https://www.sqlazureservices.com

Here is a screen shot showing some of the free trial subscriptions in my subscription catalog. This is all pretty new, and most of the data sets listed in the catalog still indicate “Coming Soon.” The subscriptions interesting to me are the ones with a geographic component. None yet carry an actual latitude/longitude, but in the case of Data.gov’s crime data there is at least a city and state attribute.


Fig 2 – Dallas subscriptions

Here is the preview page showing the default table view. You select the desired filter attributes and then click preview to show a table-based view. A copy of the URL used to access the data appears on the left. Other view options include “atom 1.0”, “raw”, and direct import to Excel Pivot.

Fig 3 – Dallas DATA.Gov subscription preview – Crime 2006,2007 USA

There are two approaches for consuming data.

1. The easiest is the URL parameter service approach:
https://api.sqlazureservices.com/DataGovService.svc/crimes/Colorado?$format=atom10

This isn’t the full picture, because you also need to include your account key and a unique user ID in the HTTP header. These are not sent in the URL but in the header, which means using a specialized tool or coding an HTTP request:

	// Requires: using System.Net; using System.IO;
	WebRequest request = WebRequest.Create(url);
	request.Headers.Add("$accountKey", accountKey);
	request.Headers.Add("$uniqueUserID", uniqueUserId);

	// Get the response and read the Atom payload
	HttpWebResponse response = (HttpWebResponse)request.GetResponse();
	using (StreamReader reader = new StreamReader(response.GetResponseStream()))
	{
	    string atom = reader.ReadToEnd();
	}

The response in this case is in Atom 1.0 format as indicated in the format request parameter of the url.

<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <title type="text">Data.Gov - U.S. Offenses Known to Law Enforcement</title>
  <id>https://api.sqlazureservices.com/DataGovService.svc/crimes/Colorado?$format=atom10</id>
  <rights type="text">2009 U.S. Government</rights>
  <updated>2009-11-25T18:20:02Z</updated>
  <link rel="self" title="Data.Gov - U.S. Offenses Known to Law Enforcement"
href="https://api.sqlazureservices.com/DataGovService.svc/crimes/Colorado?$format=atom10" />
  <entry>
    <id>https://api.sqlazureservices.com/DataGovService.svc/crimes/Colorado?$format=atom10
&$page=1&$itemsperpage=1</id>
    <title type="text">Colorado / Alamosa in 2007</title>
    <updated>2009-11-25T18:20:02Z</updated>
    <link rel="self" href="https://api.sqlazureservices.com/DataGovService.svc/crimes/Colorado?$format=atom10
&$page=1&$itemsperpage=1" />
    <content type="application/xml">
      <m:properties>
        <d:State m:type="Edm.String">Colorado</d:State>
        <d:City m:type="Edm.String">Alamosa</d:City>
        <d:Year m:type="Edm.Int32">2007</d:Year>
        <d:Population m:type="Edm.Int32">8714</d:Population>
        <d:Violentcrime m:type="Edm.Int32">57</d:Violentcrime>
        <d:MurderAndNonEgligentManslaughter m:type="Edm.Int32">1</d:MurderAndNonEgligentManslaughter>
        <d:ForcibleRape m:type="Edm.Int32">11</d:ForcibleRape>
        <d:Robbery m:type="Edm.Int32">16</d:Robbery>
        <d:AggravatedAssault m:type="Edm.Int32">29</d:AggravatedAssault>
        <d:PropertyCrime m:type="Edm.Int32">565</d:PropertyCrime>
        <d:Burglary m:type="Edm.Int32">79</d:Burglary>
        <d:LarcenyTheft m:type="Edm.Int32">475</d:LarcenyTheft>
        <d:MotorVehicleTheft m:type="Edm.Int32">11</d:MotorVehicleTheft>
        <d:Arson m:type="Edm.Int32">3</d:Arson>
      </m:properties>
    </content>
  </entry>
		.
		.
		.

If you’re curious about MurderAndNonEgligentManslaughter, I assume it is meant to be “Murder and Nonnegligent Manslaughter.” There are some other anomalies I happened across, such as very few violent crimes in Illinois. Perhaps Chicago politicians are better at keeping the slate clean.

2. The second approach using a generated proxy service is more powerful.

On the left corner of the preview page there is a Download C# service class link. This is a generated convenience class that lets you invoke the service with your account key, user ID, and URL, while handling the LINQ to XML transfer of the Atom response into a nice class with properties. There is an Invoke method that does all the work of returning a collection of items generated from the Atom entry records:

    public partial class DataGovCrimeByCitiesItem
    {
        public System.String State { get; set; }
        public System.String City { get; set; }
        public System.Int32 Year { get; set; }
        public System.Int32 Population { get; set; }
        public System.Int32 Violentcrime { get; set; }
        public System.Int32 MurderAndNonEgligentManslaughter { get; set; }
        public System.Int32 ForcibleRape { get; set; }
        public System.Int32 Robbery { get; set; }
        public System.Int32 AggravatedAssault { get; set; }
        public System.Int32 PropertyCrime { get; set; }
        public System.Int32 Burglary { get; set; }
        public System.Int32 LarcenyTheft { get; set; }
        public System.Int32 MotorVehicleTheft { get; set; }
        public System.Int32 Arson { get; set; }

    }
                .
                .
                .
public List<DataGovCrimeByCitiesItem> Invoke(System.String state,
            System.String city,
            System.String year,
            int page)
{
     .
     .
     .
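Calling the generated class is then straightforward. Here is a hedged sketch, assuming the generated service class is named DataGovCrimeByCitiesService and takes the account key and unique user ID in its constructor (your downloaded class names may differ):

```csharp
// Hypothetical usage of the downloaded proxy class - the class name
// and constructor signature below are assumptions, not the exact
// generated code.
DataGovCrimeByCitiesService service =
    new DataGovCrimeByCitiesService(accountKey, uniqueUserId);

// Request page 1 of the 2007 crime records for Alamosa, Colorado
List<DataGovCrimeByCitiesItem> items =
    service.Invoke("Colorado", "Alamosa", "2007", 1);

foreach (DataGovCrimeByCitiesItem item in items)
{
    Console.WriteLine(item.City + ", " + item.State + ": "
        + item.Violentcrime + " violent crimes");
}
```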

Interestingly, you can’t just drop this proxy service code into the Silverlight side of a project; it has to be on the Web side. To be useful in a Bing Maps Silverlight Control application you still need to add a WCF service referenced on the Silverlight side. This service simply calls the nicely generated Dallas proxy service, whose results then show up in the async completed callback.
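The web-side WCF service that bridges the two might look something like this. This is a sketch under my own naming assumptions (DallasService, DataGovCrimeByCitiesService), not the exact code from my project:

```csharp
// Hypothetical web-side WCF service - bridges the Silverlight client
// to the generated Dallas proxy. Assumes a Silverlight-compatible
// basicHttpBinding endpoint configured in web.config.
[ServiceContract(Namespace = "")]
[AspNetCompatibilityRequirements(
    RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class DallasService
{
    [OperationContract]
    public List<DataGovCrimeByCitiesItem> GetItems(
        string state, string city, string year, int page, string crime)
    {
        // The crime parameter is used later on the client side to pick
        // which statistic to graph; the Dallas call itself only needs
        // state, city, year, and page.
        DataGovCrimeByCitiesService proxy =
            new DataGovCrimeByCitiesService(accountKey, uniqueUserId);
        return proxy.Invoke(state, city, year, page);
    }
}
```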

private void GetCrimeData(string state, string city, string year, int page, string crime)
{
  DallasServiceClient dallasclient = GetServiceClient();
  dallasclient.GetItemsCompleted += svc_DallasGetItemsCompleted;
  dallasclient.GetItemsAsync(state, city, year, page, crime);
}

private void svc_DallasGetItemsCompleted(object sender, GetItemsCompletedEventArgs e)
{
  if (e.Error == null)
  {
      ObservableCollection<DataGovCrimeByCitiesItem> results = e.Result as
                                ObservableCollection<DataGovCrimeByCitiesItem>;
                         .
                         .
                         .

This is all very nice, but I really want to use it with a map. Getting the Dallas data is only part of the problem; I still need to turn the city, state locations into latitude/longitude locations. This can easily be done by adding a reference to the Bing Maps Web Services Geocode service. With the Geocode service I can loop through the returned items collection and send each one off, getting back a usable latitude/longitude Location:

foreach(DataGovCrimeByCitiesItem item in results){
   GetGeocodeLocation(item.City + "," + item.State, item);
}

Since all of these geocode requests are also async callbacks, I need to pass my DataGovCrimeByCitiesItem object along as the GeocodeCompletedEventArgs e.UserState. It is also a bit tricky determining exactly when all the geocode requests have completed; I use a countdown to check for a finish.
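The countdown idea can be sketched as follows. The helper names here (pendingGeocodes, DrawBubbles, svc_GeocodeCompleted) are my own illustrations, not part of the Bing Maps API:

```csharp
// Hypothetical countdown - decrement on each completed geocode and
// populate the map only when every request has come back.
private int pendingGeocodes;

private void GeocodeAllItems(
    ObservableCollection<DataGovCrimeByCitiesItem> items)
{
    pendingGeocodes = items.Count;
    foreach (DataGovCrimeByCitiesItem item in items)
    {
        // item travels along as e.UserState in the completed event
        GetGeocodeLocation(item.City + "," + item.State, item);
    }
}

private void svc_GeocodeCompleted(object sender,
    GeocodeCompletedEventArgs e)
{
    DataGovCrimeByCitiesItem item =
        (DataGovCrimeByCitiesItem)e.UserState;
    // ... store the geocoded Location from e.Result with this item ...

    if (--pendingGeocodes == 0)
    {
        DrawBubbles(); // all geocodes finished - safe to draw the map
    }
}
```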

With a latitude, longitude in hand for each of the returned DataGovCrimeByCitiesItem objects I can start populating the map. I chose to use the bubble graph approach, with the crime statistic turned into a diameter. This required normalizing by the maximum value. It looks nice, although I’m not too sure how valuable such a graph actually is. Unfortunately, this CTP version of the Dallas data service has an items-per-page limit of 100. I can see why this is done, to prevent massive data queries, but it complicates normalization since I don’t have all the pages available at one time to calculate a maximum. I could work out a way to call several pages, but there is a problem with an odd behavior that seems to loop results back to the beginning to fill out the default 100 count on pages greater than 1. There ought to be some kind of additional query for count, max, and min of result sets; I didn’t see this in my experiments.
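A minimal sketch of the normalization step, assuming a results collection from one page (the maxRadius value and the square-root area scaling are my own design choices, not from the Dallas service):

```csharp
// Requires: using System.Linq;
// Hypothetical bubble sizing - normalize each statistic by the page
// maximum so the largest bubble gets a fixed pixel radius.
double maxValue = results.Max(item => (double)item.Violentcrime);
const double maxRadius = 40.0; // largest bubble radius in pixels

foreach (DataGovCrimeByCitiesItem item in results)
{
    // Sqrt so that bubble *area*, not diameter, is proportional to
    // the statistic - a common choice for bubble graphs.
    double radius = maxRadius * Math.Sqrt(item.Violentcrime / maxValue);
    // ... add an Ellipse of width/height 2 * radius to a MapLayer at
    //     the geocoded Location for this item ...
}
```

Because the 100-item page limit means maxValue is only the maximum of the current page, bubbles on different pages are not strictly comparable, which is exactly the normalization problem described above.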

One drawback to my approach is the number of geocode requests that accumulate. I should really get my request list only once per state and save it locally. All the bubble crime calculations could then be done on a local in-memory cache, eliminating the repeated service calls and geocode loop with each change in the type of crime. However, this version is a proof of concept, and it lets me see some of the usefulness of these types of data services as well as a few drawbacks of my initial approach.

Here is a view of the Access Report for my experiment. If you play with the demo you will be adding to the access tallies. Since this is a CTP I don’t get charged, but it is interesting to see how a dev pay program might use this report page. Unfortunately, User ID is currently not part of the Access Report. If the report could also be broken down by User ID, you could assign each user a unique ID and track their share of the burden.

Fig 4 – Dallas Data.Gov subscription Access Report

Summary

The interesting part of this exercise is seeing how the Bing Maps Silverlight Control can be the nexus of a variety of data service sources. In this simple demo I’m using the Bing Maps service, the Bing Maps Web Services Geocode service, and the Data.gov Dallas data service. I could just as easily add other sources from traditional WMS or WFS services, or from local tile pyramids and spatial data tables. The data sources are, in essence, outsourced to other services. All the computation happens in the client, and the result is a vastly more efficient distributed web app. My server isn’t loaded with data management issues or even all that many HTTP hits.


Fig 5 – Distributed Data Sources – SOA