This blog is written by the team behind SpatialKey. For more on the application, please see the Introduction to SpatialKey Video or check out the Features. We'll write here about potential use cases for SpatialKey, as well as issues related to location intelligence, data visualization, crime mapping, and geographic visualization.
Since the initial Beta release of SpatialKey in March we have received many requests for international geocoding. If you are not familiar with the term, geocoding is the process of converting street addresses or zip codes (postal codes) to geographic coordinates, often expressed as latitude and longitude. SpatialKey uses TIGER (Topologically Integrated Geographic Encoding and Referencing system) to geocode data. TIGER is provided by the Census Bureau and is freely available for public use. The TIGER geocoder covers only the US, and there is no international equivalent. We have been researching options for a unified international solution for SpatialKey, but several challenges have prevented us from providing one:
International geocoding requires conversion from vastly differing address formats across many countries, which makes a one-size-fits-all solution difficult to obtain. This requires us to implement specific solutions for different countries or areas of the world.
Of the well-known and widely available international geocoding solutions, most are not free or prohibit commercial use within a third-party product.
Many international geocoding solutions are costly: geocoding just a few thousand addresses can cost hundreds or thousands of dollars.
What can I do now if I have International data that I want to use in SpatialKey?
If your existing data does not have latitude and longitude and is outside of the United States, there are several third-party solutions that you can use to geocode your data before importing it into SpatialKey. The USC GIS research laboratory provides a comprehensive listing of free and paid geocoding options. Very few free services offer bulk geocoding for international addresses, but you can use one of the free services listed by the USC GIS research laboratory and script your own solution for bulk geocoding. By combining a third-party geocoding solution with our data import API, you can automate management of your data within SpatialKey.
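As a rough illustration of scripting your own bulk geocoding, here is a minimal Python sketch. It assumes your data is a CSV with an "address" column, and it takes the geocoder as a function you supply. The names geocode_csv and fake_geocode are hypothetical, and fake_geocode is only a stand-in with one illustrative coordinate pair. In practice you would replace it with a call to whichever third-party service you choose, within that service's terms and rate limits.

```python
import csv
import io
import time
from typing import Callable, Optional, TextIO, Tuple

LatLon = Tuple[float, float]


def geocode_csv(src: TextIO, dst: TextIO,
                geocode: Callable[[str], Optional[LatLon]],
                address_field: str = "address",
                delay_seconds: float = 0.0) -> int:
    """Copy a CSV from src to dst, adding latitude and longitude columns.

    `geocode` is any function that maps an address string to (lat, lon),
    or returns None when the address cannot be resolved.
    Returns the number of rows that were geocoded successfully.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["latitude", "longitude"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()

    geocoded = 0
    for row in reader:
        result = geocode(row[address_field])
        if result is not None:
            row["latitude"], row["longitude"] = result
            geocoded += 1
        else:
            # Leave the coordinates blank so failed rows are easy to find.
            row["latitude"], row["longitude"] = "", ""
        writer.writerow(row)
        if delay_seconds:
            time.sleep(delay_seconds)  # stay under the service's rate limit
    return geocoded


def fake_geocode(address: str) -> Optional[LatLon]:
    """Stand-in geocoder for the demo; not a real service."""
    known = {"10 Downing Street, London": (51.5034, -0.1276)}
    return known.get(address)


source = io.StringIO(
    'name,address\nPM,"10 Downing Street, London"\nX,Nowhere\n'
)
target = io.StringIO()
print(geocode_csv(source, target, fake_geocode), "row(s) geocoded")
print(target.getvalue())
```

The output CSV keeps every original column, so it can be imported into SpatialKey like any other file that already has latitude and longitude.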
One service that provides free bulk geocoding for European data, up to 5000 addresses per day, is http://www.batchgeocode.com. BatchGeocode uses the Yahoo APIs and supports both United States and European addresses.
Will SpatialKey provide International geocoding in the future?
We will continue to search for a low cost, easy to use International geocoding solution that can be integrated with SpatialKey. If you have an immediate need for international geocoding within SpatialKey we can custom develop a solution that integrates with a third party geocoder. Contact our sales team if you are interested in a custom solution.
Getting data into SpatialKey has always been simple, but until now there was no way to programmatically automate imports and updates of your datasets. Today we introduced the SpatialKey Data Import API (DIAPI). The Data Import API allows developers to use a variety of platforms and programming languages (like Java, ColdFusion, .Net, PHP, etc.) to automate the creation and management of Datasets within SpatialKey.
Here are the basic steps to get started with the DIAPI:
Use the HTTP services to authenticate and upload these assets. See the Developer Guide and DIAPI Documentation for more details.
To help you get started we have provided samples in both Java and ColdFusion. Additionally we provide an example application that you can download, customize and deploy called the SpatialKey Data Poller.
We recently posted an article demonstrating how to Visualize and Map SalesForce Leads with SpatialKey. I like this example, because we haven’t built a special connector or any type of app in the Force.com AppExchange - although that’s something we’re thinking about. SalesForce allows you to export any report as CSV, and SpatialKey can simply consume that CSV. The article shows how to quickly build a report like this:
In just a few minutes, SalesForce users can gain a unique view into location patterns related to their leads, opportunities, contacts, or accounts. We really enjoyed showing this example to attendees at the recent Web 2.0 Expo in San Francisco. Several people mentioned that they had been looking for a way to map their SalesForce data for quite some time. They were amazed at how easy this was to do, and at the interactivity of the reports. Others had never thought to visualize their CRM information this way. It was fun to watch the “ah ha” moment as people started to think of how this geographic intelligence could be applied to their sales and marketing efforts.
We gave away lots of beta invites at the show, so we are looking forward to hearing feedback from people visualizing their SalesForce data with SpatialKey. We’re especially interested to hear feedback about what other features or workflow users might like to see related to CRM visualizations. At the top of the list so far is the ability to associate your CRM data with third party demographics data, so we’re giving that some thought…
The San Francisco Chronicle’s website, SFGate.com, has a nice map showing the top locations in San Francisco where parking citations are issued. The dataset includes individual locations where 100 or more citations were issued, so it’s a map of the single places you’re most likely to get ticketed (but note that it doesn’t include the full dataset, only the top 576 locations). They’ve created a map that uses the Google Maps API and they overlay their own custom markers that use graduated circles to represent the number of tickets issued at any given location. The size of the circle indicates how many tickets were issued, and each unique location has one circle centered on the location.
But there are a few problems with this visualization. Two stand out. First, it is difficult to judge density where many map markers overlap one another, as in the north-east area of the city (downtown). Second, the one huge marker at the southern edge of the city makes every other marker look tiny and unimportant. A single glance at this map would lead me to conclude that more parking citations are issued in the southern area of the city than in the other areas. But that’s the wrong conclusion to draw.
The problem with overlapping markers
One of the big issues we see with maps that contain a lot of data points is that the markers typically used in online maps become unreadable in dense areas of data. This dataset doesn’t even have that many data points (576 total), but the concentration downtown makes that area a jumble of markers.
You can tell that there are a lot of points in the area, but you can’t tell how many there are. And in this case each marker doesn’t just represent a single point: the size of the circle also represents how many tickets were issued at that location. Ideally I would be able to look at this map and tell where the most citations are issued, but I simply have no way of knowing that.
The problem with relative sizing of individual markers
The second problem has to do with that large marker near the bottom of the map. This map leads to a misleading conclusion because every marker, regardless of how close it is to other markers (or even whether it overlaps them), shows the value of a single location. Suppose there are 10 locations within a single city block that each have 100 citations, and a separate location elsewhere in the city that has 500 citations. The location with 500 citations will appear 5 times as large as any of the others, and you will have no way of knowing that 1,000 citations were actually issued within that single block (making the block a far more likely area for receiving a parking ticket). What we really want to see is the total citations issued within a certain geographic radius, so we can see which areas have the most total citations, not just which single locations do.
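To make the hypothetical above concrete, here is a small Python sketch that sums citations within a radius of a point. The coordinates are made up for illustration (ten locations spaced along roughly 100 meters, plus one isolated location several kilometers away), and total_within_radius is a hypothetical helper, not how SpatialKey or SFGate compute anything.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def total_within_radius(points, center, radius_m):
    """Sum the counts of all (lat, lon, count) points within radius_m of center."""
    lat0, lon0 = center
    return sum(count for lat, lon, count in points
               if haversine_m(lat0, lon0, lat, lon) <= radius_m)


# Made-up data: ten locations on one block with 100 citations each,
# and one isolated location with 500 citations.
block = [(37.7880 + i * 0.0001, -122.4070, 100) for i in range(10)]
isolated = [(37.7100, -122.4500, 500)]
points = block + isolated

print(total_within_radius(points, (37.7880, -122.4070), 150))  # the block
print(total_within_radius(points, (37.7100, -122.4500), 150))  # the lone spot
```

Viewed as single markers, the 500-citation location looks like the worst spot in the city. Summed by area, the block has twice as many citations.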
SpatialKey to the rescue with aggregated heatmaps
To try to better understand the underlying data, I decided to bring the same dataset into SpatialKey. The data on the SFGate website was loaded into their Google Maps application in JSON format, and to get it into SpatialKey I simply grabbed the JSON feed, opened it up in a text editor, and did a bit of find and replace to convert the data to CSV (the whole process took a few minutes to get the CSV ready for import). Then I imported the data using the SpatialKey CSV import feature (for more on this and other features, check out the feature videos).
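I did the conversion by hand, but the same step could be scripted. The sketch below is only an assumption about how that might look: it supposes the feed is a JSON array of objects, and the field names (address, lat, lng, count) and the sample record are hypothetical, not the actual structure of the SFGate feed.

```python
import csv
import io
import json


def json_records_to_csv(json_text, fields):
    """Convert a JSON array of objects into CSV text with the given columns."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the requested columns; missing values become blanks.
        writer.writerow({field: record.get(field, "") for field in fields})
    return out.getvalue()


# Hypothetical sample record, for illustration only.
sample = ('[{"address": "Market St & 5th St", "lat": 37.784, '
          '"lng": -122.4075, "count": 250}]')
print(json_records_to_csv(sample, ["address", "lat", "lng", "count"]))
```

Using the csv module rather than find and replace means that addresses containing commas or quotes are escaped correctly.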
Once I had the data imported I loaded up a new report with a map and here’s what I got:
San Francisco parking citations heatmap in SpatialKey
This map shows a heatmap that visualizes the total citations issued. But the important difference is that the clusters of data points downtown are aggregated by geography so items that are very close together are all factored into the hotspots. Now we’re able to see the real relationship between areas of the city. That point down in the southern part of the city is still visible, but it becomes clear that there are far more citations issued downtown. This deeper understanding is possible because we aren’t simply throwing a marker for each point up on the map, we’re aggregating the total value for all markers within a certain geographic area.
If we zoom in downtown we can see another view that shows the more specific hotspots:
Heatmap of parking citations in downtown San Francisco
Looks like they get people along Market Street. Right at the Westfield Shopping Center is a prime spot, as well as the intersections of Market and O’Farrell and near Market and Sutter (if you’re parking there, look out!). You can see that out of the total parking citations in this dataset (82,911), about 42% (34,695) are issued just within the downtown area shown in the above screenshot.
I hope this example shows how important it is to be able to tell the right story with your data. SpatialKey gives you the flexibility to visualize your data in complex ways that go beyond simply throwing markers on a map. Have you run into similar problems with the current tools for web-based mapping? If so let us know in the comments and then sign up for the SpatialKey beta program!