Recently I did a project for GrowthFocus, an Australian company that helps clients with acquisitions, growth and exit strategies. The client wanted to visualize the locations of their consultants on a map, so he gave me a CSV file with 70,000 data points and a rough idea of what the map should look like. But how do you visualize 70,000 data points on a map?
A marker is an object used to indicate a position, place, or route. It can have different shapes, but it normally represents a single data point. And 70,000 markers on a single map of Australia just doesn't work. Luckily I was not the first with this problem: there are a couple of libraries out there that can help. The Leaflet.markercluster plugin uses a clustering technique to visualize large amounts of data. It works like this: if there is more than one data point in a certain region, a single marker is created that represents all the data points in that region, with the number of data points shown on the marker. A really nice feature is that if you click on a cluster, the map zooms in on that region and the marker breaks apart into multiple smaller markers. These 'submarkers' can be single markers or, again, cluster markers (with fewer data points).
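The core idea can be sketched without the plugin itself: bucket the points into grid cells and count how many fall into each cell. This is a simplified illustration of the technique, not the plugin's actual implementation (which uses a more sophisticated, zoom-aware algorithm), and the coordinates below are made up for the example.

```javascript
// Minimal sketch of grid-based marker clustering: each grid cell that
// contains points becomes one marker, labeled with the point count.
// Shrinking the cell size (as you zoom in) makes clusters break apart.
function clusterPoints(points, cellSizeDeg) {
  const cells = new Map();
  for (const [lat, lng] of points) {
    // Key each point by the grid cell it falls into.
    const key = `${Math.floor(lat / cellSizeDeg)}:${Math.floor(lng / cellSizeDeg)}`;
    cells.set(key, (cells.get(key) || 0) + 1);
  }
  // Each entry is one marker: a count of 1 is a single marker,
  // anything larger is a cluster marker.
  return [...cells.values()];
}

// Four points near Sydney and one near Perth, with a 1-degree grid:
const counts = clusterPoints(
  [[-33.86, 151.2], [-33.87, 151.21], [-33.9, 151.19], [-33.88, 151.22], [-31.95, 115.86]],
  1
);
console.log(counts); // [4, 1] — one cluster of four, one single marker
```

With a smaller `cellSizeDeg` the Sydney cluster would split into several smaller markers, which is exactly the collapse behavior described above.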
D3.js, Leaflet and Marker Clusters
GeoJSON format and performance
The markercluster library expects the data in GeoJSON format. This format is an extension of the very popular JSON format and defines a standard way of storing the location information of data points. Since I ended up with two GeoJSON files of about 10MB in total, loading the map was pretty slow. There are a couple of things I did to reduce the file size.
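For reference, GeoJSON wraps each data point in a Feature with a geometry and a properties object, and the whole file is one FeatureCollection. A small sketch of converting one record (the field names `name`, `lat` and `lng` are hypothetical — the real CSV columns weren't shown):

```javascript
// Convert a single (hypothetical) consultant record to a GeoJSON Feature.
// Note: GeoJSON stores coordinates as [longitude, latitude] — easy to get backwards.
function toFeature(record) {
  return {
    type: "Feature",
    geometry: {
      type: "Point",
      coordinates: [record.lng, record.lat],
    },
    properties: { name: record.name },
  };
}

const feature = toFeature({ name: "Jane Doe", lat: -33.87, lng: 151.21 });

// A full file is a FeatureCollection holding all features:
const collection = { type: "FeatureCollection", features: [feature] };
console.log(JSON.stringify(collection));
```

The repeated `"type"`, `"geometry"` and `"properties"` keys in every one of the 70,000 features are exactly why the files grow so large, and why the size reductions below pay off.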
- Minimize the content: use abbreviated attribute names in every record. This saved me about 10%.
- Minify the file: remove superfluous whitespace, newlines, etc. This can be done online or in code. This saved me about 40%.
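Both steps can be illustrated in a few lines: abbreviate the property names and serialize without whitespace. The exact percentages will of course differ per dataset, and the property names here are invented for the example.

```javascript
// One record, pretty-printed with verbose property names...
const verbose = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [151.21, -33.87] },
  properties: { consultantName: "Jane Doe", specialisation: "exits" },
};
const pretty = JSON.stringify(verbose, null, 2); // indented, with newlines

// ...versus abbreviated names ("n", "s") and no whitespace at all.
const abbreviated = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [151.21, -33.87] },
  properties: { n: "Jane Doe", s: "exits" },
};
const minified = JSON.stringify(abbreviated); // single line, no spaces

console.log(pretty.length, minified.length); // minified is considerably smaller
```

Multiplied by 70,000 records, the saved whitespace and shortened keys add up quickly.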
I ended up with two zip files of about 300KB in total, a huge reduction compared to the 10MB. The client's machine has to do a little more work now, since the browser needs to unpack the data before processing it, but we save a lot of bandwidth, and client machines are powerful enough nowadays. See the link below for the visualization.