Mapping Parking Tickets in San Francisco (and the problem with simple map markers)
31 March 2009
The San Francisco Chronicle’s website, SFGate.com, has a nice map showing the top locations in San Francisco where parking citations are issued. The dataset includes individual locations where 100 or more citations were issued, so it’s a map of the single places you’re most likely to get ticketed (but note that it doesn’t include the full dataset, only the top 576 locations). The map is built on the Google Maps API with custom markers overlaid on top: graduated circles that represent the number of tickets issued at each location. The size of the circle indicates how many tickets were issued, and each unique location has one circle centered on it.
But there are a few problems with this visualization. Two stand out. First, it is hard to judge the density of the many overlapping markers in the north-east area of the city (downtown). Second, the one huge marker at the southern edge of the city makes every other marker look tiny and unimportant. A single glance at this map would lead me to conclude that more parking citations are issued in the southern part of the city than anywhere else. But that’s the wrong conclusion to draw.
The problem with overlapping markers
One of the big issues with maps that contain a lot of data points is that the markers typically used in online maps become unreadable in dense areas of data. This dataset doesn’t even have that many data points (576 total), but the concentration downtown turns that area into a jumble of markers.
You can tell that there are a lot of points in the area, but you can’t tell how many there are. And in this case each marker doesn’t just represent a single point: the size of the circle also represents how many tickets were issued at that location. Ideally I would be able to look at this map and tell where the most citations are issued, but I simply have no way of knowing that.
The problem with relative sizing of individual markers
The second problem has to do with that large marker near the bottom of the map. The map leads to a confusing conclusion because every marker shows the value of a single location, regardless of how close it is to other markers (or even whether it overlaps them). Suppose there are 10 locations within a single city block that each have 100 citations, and a separate location elsewhere in the city that has 500 citations. The location with 500 citations will appear 5 times as large as any of the others, and you will have no way of knowing that 1,000 citations were actually issued within that single block (making the block a far more likely place to receive a parking ticket). What we really want to see is the total citations issued within a certain geographic radius, so we can see which areas have the most total citations, not just which single locations do.
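To make the city-block example concrete, here is a minimal sketch in Python that sums citations within a radius of a point. The coordinates, the 150 m radius, and the helper names (haversine_m, total_within) are all made up for illustration; they are not taken from the SFGate data or from SpatialKey.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def total_within(points, lat, lon, radius_m):
    """Sum the citation counts of every location within radius_m of (lat, lon)."""
    return sum(
        count
        for plat, plon, count in points
        if haversine_m(lat, lon, plat, plon) <= radius_m
    )

# Hypothetical data: ten locations of 100 citations each along one block
# (roughly 9 m apart), plus one distant location with 500 citations.
block = [(37.7840 + i * 0.00008, -122.4070, 100) for i in range(10)]
lone = [(37.7100, -122.4500, 500)]
points = block + lone

block_total = total_within(points, 37.7840, -122.4070, 150)
lone_total = total_within(points, 37.7100, -122.4500, 150)
print(block_total, lone_total)
```

Measured this way, the block outweighs the single large location two to one, even though each of its markers is a fifth of the size.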
SpatialKey to the rescue with aggregated heatmaps
To try to better understand the underlying data, I decided to bring the same dataset into SpatialKey. The data on the SFGate website was loaded into their Google Maps application in JSON format, and to get it into SpatialKey I simply grabbed the JSON feed, opened it up in a text editor, and did a bit of find and replace to convert the data to CSV (the whole process took a few minutes to get the CSV ready for import). Then I imported the data using the SpatialKey CSV import feature (for more on this and other features, check out the feature videos).
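I did the conversion by hand, but the same step could be scripted. The sketch below uses only Python’s standard json and csv modules. The sample feed and its field names (address, lat, lng, citations) are hypothetical, since the structure of the actual SFGate feed isn’t reproduced here.

```python
import csv
import io
import json

# Hypothetical sample: the real feed's structure and field names may differ.
sample_feed = """[
  {"address": "100 Example St", "lat": 37.7841, "lng": -122.4066, "citations": 312},
  {"address": "1 Sample Ave", "lat": 37.7101, "lng": -122.4501, "citations": 500}
]"""

def json_to_csv(json_text, fields):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return out.getvalue()

csv_text = json_to_csv(sample_feed, ["address", "lat", "lng", "citations"])
print(csv_text)
```

A real feed may nest its records or wrap them in a JavaScript callback, in which case some cleanup would be needed before json.loads can parse it.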
Once I had the data imported I loaded up a new report with a map and here’s what I got:
This map shows a heatmap that visualizes the total citations issued. The important difference is that the clusters of data points downtown are aggregated by geography, so items that are very close together are all factored into the hotspots. Now we’re able to see the real relationship between areas of the city. That point down in the southern part of the city is still visible, but it becomes clear that far more citations are issued downtown. This deeper understanding is possible because we aren’t simply throwing a marker up on the map for each point. Instead, we’re aggregating the total value for all markers within a certain geographic area.
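One generic way to aggregate by geography is to snap every point to a grid cell and sum the values in each cell. The sketch below illustrates that idea only; it is not a description of how SpatialKey builds its heatmaps. The cell size, the sample points, and the function name bin_citations are all assumptions.

```python
import math
from collections import defaultdict

def bin_citations(points, cell_deg):
    """Sum citation counts per grid cell, where each cell is cell_deg degrees wide."""
    grid = defaultdict(int)
    for lat, lon, count in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += count
    return dict(grid)

# Hypothetical data: three nearby downtown locations and one larger
# location far to the south.
points = [
    (37.7841, -122.4071, 100),
    (37.7845, -122.4075, 150),
    (37.7849, -122.4079, 120),
    (37.7101, -122.4501, 300),
]

grid = bin_citations(points, cell_deg=0.002)
hottest_cell = max(grid, key=grid.get)
print(hottest_cell, grid[hottest_cell])
```

The three downtown points land in one cell and together outweigh the single southern location, which is the effect the heatmap makes visible. A production heatmap would typically also smooth across neighboring cells so that hotspots don’t depend on where the cell boundaries happen to fall.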
If we zoom in downtown we can see another view that shows the more specific hotspots:
Looks like they get people along Market Street. Right at the Westfield Shopping Center is a prime spot, as are the intersection of Market and O’Farrell and the area near Market and Sutter (if you’re parking there, look out!). Of the total parking citations in this dataset (82,911), about 42% (34,695) were issued within the downtown area shown in the above screenshot.
I hope this example shows how important it is to be able to tell the right story with your data. SpatialKey gives you the flexibility to visualize your data in complex ways that go beyond simply throwing markers on a map. Have you run into similar problems with the current tools for web-based mapping? If so, let us know in the comments and then sign up for the SpatialKey beta program!



