Converting Boundary Data from OpenStreetMap to GeoJSON
A recent project required boundary data. After reading through numerous articles, I decided I could probably pull it from openstreetmap.org. OpenStreetMap is a gigantic resource for global geoinformation, and its data is openly licensed, which is a plus. I just needed to figure out how to query the data I needed and convert it into a useful format to load into a database (Elasticsearch in my case).
The first problem is getting the data. Initially, I pulled the entire planet file (45GB) from planet.openstreetmap.org. This file has everything from locations to streets, including the boundary data I needed. After working with it for a while, I found much smaller region-specific files at geofabrik.de.
curl http://download.geofabrik.de/north-america-latest.osm.pbf -o na-latest.osm.pbf
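For repeated or interrupted downloads, a small wrapper can help. This is a minimal sketch, assuming Geofabrik keeps its `<region>-latest.osm.pbf` naming; `geofabrik_url` and `fetch_region` are hypothetical helper names of my own, not part of any tool.

```shell
# Build the download URL for a Geofabrik region extract.
# Assumption: extracts follow the <region>-latest.osm.pbf naming scheme.
geofabrik_url() {
    printf 'https://download.geofabrik.de/%s-latest.osm.pbf\n' "$1"
}

# Download a region to the given file.
# -f fails on HTTP errors, -L follows redirects, -C - resumes a partial download.
fetch_region() {
    curl -fL -C - "$(geofabrik_url "$1")" -o "$2"
}
```

Calling `fetch_region north-america na-latest.osm.pbf` fetches the same file as the command above.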
After the file downloaded, I used several tools to work on it. The first is osmium, the Swiss Army knife of geo. If you plan on doing extensive geo work, this tool is a must. The next is osmfilter, which allows querying of the osm file format (its companion osmconvert handles the conversions you'll see later). Finally, I used gdal to convert to the final GeoJSON data. I used Nominatim to reference the data available in OpenStreetMap. All these tools were available via Homebrew, so installation wasn't a problem.
brew install osmium-tool
brew install osmfilter
brew install gdal
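Before going further, it can be worth confirming that every command used in this post is actually on the PATH. A minimal sketch; `missing_tools` is a hypothetical helper name, and the tool list is simply the set of commands that appear in this post.

```shell
# Print the name of each tool that is not on PATH (prints nothing if all are found).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The commands used in this post.
missing_tools osmium osmfilter osmconvert ogr2ogr
```

No output means everything is installed.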
The pbf file contains everything from borders to streets. I'm looking for just administrative boundaries. That's where osmium comes in.
osmium tags-filter na-latest.osm.pbf nwr/admin_level --overwrite -o na-latest-admin.osm.pbf
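If only certain levels matter (say, countries and states), the filter expression can be narrowed. A minimal sketch, assuming osmium's comma-separated value list in filter expressions; `admin_filter` is a hypothetical helper name, and which `admin_level` values map to which kind of boundary varies by country.

```shell
# Build an osmium tags-filter expression for one or more admin levels,
# e.g. "admin_filter 2 4" prints nwr/admin_level=2,4
admin_filter() {
    levels=$(printf '%s,' "$@")   # "2,4,"
    printf 'nwr/admin_level=%s\n' "${levels%,}"   # strip the trailing comma
}
```

It would be used as `osmium tags-filter na-latest.osm.pbf "$(admin_filter 2 4)" --overwrite -o na-latest-admin.osm.pbf`.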
Now the pbf file only contains items related to administrative levels. Unfortunately, that also includes tags unrelated to the boundary polygons. Luckily, osmfilter sorts this out. As the name implies, this tool works with osm files. A quick conversion is required.
osmconvert na-latest-admin.osm.pbf -o=na-latest-admin.osm
Perfect. Now osmfilter can be used to remove everything I don't need.
osmfilter na-latest-admin.osm --drop-tags="barrier= building= highway= landuse= office= place= waterway=" -o=na-latest-admin-noplace.osm
With the clean file, it's time to convert to GeoJSON. gdal can do this, and I fed it the compact pbf format again (its OSM driver also reads osm XML), so another conversion is required.
osmconvert na-latest-admin-noplace.osm -o=na-latest-admin-noplace.osm.pbf
Finally, create the GeoJSON.
ogr2ogr -f GeoJSON na-latest-admin-noplace.geojson na-latest-admin-noplace.osm.pbf multipolygons
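The export can also be limited to a single administrative level with an attribute filter. This is a rough sketch, assuming gdal's default OSM configuration exposes `boundary` and `admin_level` as text fields on the multipolygons layer; `admin_where` and `export_level` are hypothetical helper names.

```shell
# Build a WHERE clause selecting administrative boundaries of one level.
# Assumption: admin_level is exposed as a string field on the multipolygons layer.
admin_where() {
    printf "boundary = 'administrative' AND admin_level = '%s'\n" "$1"
}

# export_level SRC LEVEL DST: write only the boundaries of LEVEL to a GeoJSON file.
export_level() {
    ogr2ogr -f GeoJSON -where "$(admin_where "$2")" "$3" "$1" multipolygons
}
```

For example, `export_level na-latest-admin-noplace.osm.pbf 4 na-states.geojson` would keep only level 4 boundaries (the output name is just an illustration).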
And with that, I have a pretty GeoJSON file I can use for my project. Finding the correct data and learning how to manage it took a surprising amount of time.
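The steps above can be chained into one function. A minimal sketch under a few assumptions: the input name ends in `.osm.pbf`, intermediate files are written next to it, and the pbf file is what gets passed to ogr2ogr. `run` and `extract_boundaries` are hypothetical names; setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# extract_boundaries SRC: SRC is an .osm.pbf extract, e.g. na-latest.osm.pbf
extract_boundaries() {
    src=$1
    stem=${src%.osm.pbf}   # na-latest.osm.pbf -> na-latest
    # Keep only objects carrying an admin_level tag.
    run osmium tags-filter "$src" nwr/admin_level --overwrite -o "$stem-admin.osm.pbf"
    # osmfilter works on osm files, so convert first.
    run osmconvert "$stem-admin.osm.pbf" -o="$stem-admin.osm"
    # Drop tags unrelated to boundary polygons.
    run osmfilter "$stem-admin.osm" --drop-tags="barrier= building= highway= landuse= office= place= waterway=" -o="$stem-admin-noplace.osm"
    # Back to pbf, then export the multipolygons layer as GeoJSON.
    run osmconvert "$stem-admin-noplace.osm" -o="$stem-admin-noplace.osm.pbf"
    run ogr2ogr -f GeoJSON "$stem-admin-noplace.geojson" "$stem-admin-noplace.osm.pbf" multipolygons
}
```

Running `DRY_RUN=1 extract_boundaries na-latest.osm.pbf` first is a cheap way to check the file names before committing to a long conversion.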
Alternative approach
When I first attacked the project, I used osmtogeojson to do the GeoJSON conversion. It was successful, but with some of the queries, I was getting odd results. This was probably due to my limited knowledge of the tools more than any specific issue with them. This is how I did it.
osmconvert planet.osm.pbf --out-o5m > planet.o5m
osmfilter --keep="admin_level=2 and boundary=administrative" planet.o5m -o=myfile.osm
osmtogeojson myfile.osm > myfile.geojson
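Given the odd results, a quick sanity check is to count the features in each output and compare the two approaches. This is only a rough sketch: it counts text matches rather than parsing JSON, and `feature_count` is a hypothetical helper name.

```shell
# Roughly count GeoJSON features by matching "type": "Feature" entries.
# The closing quote keeps "FeatureCollection" from being counted.
feature_count() {
    grep -o '"type": *"Feature"' "$1" | wc -l | tr -d ' '
}
```

For example, `feature_count myfile.geojson` prints the number of features found; a surprisingly low or high number is a hint that the filter did something unexpected.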
Honorable mentions
I've listed a couple of utilities and sites I found on my journey to get boundary data.
Conversion script
After I had spent many hours reading documentation, this script is the one that pointed me in the right direction. Special thanks to SomeoneElseOSM for putting it together.
OSM Admin Boundaries Map
This excellent little utility visualizes available OpenStreetMap data. If you're looking to verify the validity of a boundary or even download geoJSON for a specific place, this is the way to do it.
Please don't scrape this site. The author was kind enough to host this dataset, so let's be good samaritans and respect its intended use.
Pelias
These wildcats built a full geocoder on top of Elasticsearch. It's open-source and even has Docker images. If you're looking for a complete geo system, they seem to have done a great job.
Hopefully, this saves someone some time.