Recently I did a project for GrowthFocus, an Australian company that helps clients with acquisitions, growth and exit strategies. The client wanted to visualize the locations of their consultants on a map, so he gave me a CSV file with 70,000 data points and a rough idea of what the map should look like. But how do you visualize 70,000 data points on a map?
A marker is an object used to indicate a position, place, or route. It can have different shapes, but it normally represents a single data point. And 70,000 markers on a single map of Australia just doesn't work. Luckily I was not the first with this problem: there are a couple of libraries out there that can help. The Leaflet.markercluster plugin uses a clustering technique to visualize large amounts of data. It works like this: if there is more than one data point in a certain region, a single marker is created that represents all the data points in that region, with the number of data points shown on the marker. A really nice feature is that if you click on a cluster, the map zooms in on that region and the marker breaks apart into multiple smaller markers. These 'submarkers' can be single markers or, again, cluster markers (with fewer data points).
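The core idea can be sketched without the plugin itself: bucket the points into grid cells and count how many fall into each cell. This is a simplified illustration of the technique, not the plugin's actual implementation (which uses a more sophisticated, zoom-aware algorithm), and the coordinates below are made up for the example.

```javascript
// Minimal sketch of grid-based marker clustering: each grid cell that
// contains points becomes one marker, labeled with the point count.
// Shrinking the cell size (as you zoom in) makes clusters break apart.
function clusterPoints(points, cellSizeDeg) {
  const cells = new Map();
  for (const [lat, lng] of points) {
    // Key each point by the grid cell it falls into.
    const key = `${Math.floor(lat / cellSizeDeg)}:${Math.floor(lng / cellSizeDeg)}`;
    cells.set(key, (cells.get(key) || 0) + 1);
  }
  // Each entry is one marker: a count of 1 is a single marker,
  // anything larger is a cluster marker.
  return [...cells.values()];
}

// Four points near Sydney and one near Perth, with a 1-degree grid:
const counts = clusterPoints(
  [[-33.86, 151.2], [-33.87, 151.21], [-33.9, 151.19], [-33.88, 151.22], [-31.95, 115.86]],
  1
);
console.log(counts); // [4, 1] — one cluster of four, one single marker
```

With a smaller `cellSizeDeg` the Sydney cluster would split into several smaller markers, which is exactly the collapse behavior described above.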
D3.js, Leaflet and Marker Clusters
GeoJSON format and performance
The markercluster library expects the data in GeoJSON format. This format is an extension of the very popular JSON format and defines a standard way of storing the location information of data points. Since I ended up with two GeoJSON files of about 10MB in total, loading the map was pretty slow. There are a couple of things I did to reduce the file size.
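For reference, GeoJSON wraps each data point in a Feature with a geometry and a properties object, and the whole file is one FeatureCollection. A small sketch of converting one record (the field names `name`, `lat` and `lng` are hypothetical — the real CSV columns weren't shown):

```javascript
// Convert a single (hypothetical) consultant record to a GeoJSON Feature.
// Note: GeoJSON stores coordinates as [longitude, latitude] — easy to get backwards.
function toFeature(record) {
  return {
    type: "Feature",
    geometry: {
      type: "Point",
      coordinates: [record.lng, record.lat],
    },
    properties: { name: record.name },
  };
}

const feature = toFeature({ name: "Jane Doe", lat: -33.87, lng: 151.21 });

// A full file is a FeatureCollection holding all features:
const collection = { type: "FeatureCollection", features: [feature] };
console.log(JSON.stringify(collection));
```

The repeated `"type"`, `"geometry"` and `"properties"` keys in every one of the 70,000 features are exactly why the files grow so large, and why the size reductions below pay off.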
- Minimize the content: use abbreviated attribute names in every record. This saved me about 10%.
- Minify the file: remove superfluous whitespace, newlines, etc. This can be done online or in code. This saved me about 40%.
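Both steps can be illustrated in a few lines: abbreviate the property names and serialize without whitespace. The exact percentages will of course differ per dataset, and the property names here are invented for the example.

```javascript
// One record, pretty-printed with verbose property names...
const verbose = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [151.21, -33.87] },
  properties: { consultantName: "Jane Doe", specialisation: "exits" },
};
const pretty = JSON.stringify(verbose, null, 2); // indented, with newlines

// ...versus abbreviated names ("n", "s") and no whitespace at all.
const abbreviated = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [151.21, -33.87] },
  properties: { n: "Jane Doe", s: "exits" },
};
const minified = JSON.stringify(abbreviated); // single line, no spaces

console.log(pretty.length, minified.length); // minified is considerably smaller
```

Multiplied by 70,000 records, the saved whitespace and shortened keys add up quickly.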
I ended up with two zip files of about 300KB in total, a huge reduction compared to the 10MB. The client's machine has to do a little more work now, since the browser needs to unpack the data before processing it, but we save a lot of bandwidth, and client machines are powerful enough nowadays. See the link below for the visualization.