
Custom mapping (3) - rethinking the whole shebang


If you are using OpenStreetmap data for your own purposes with your own requirements, there may come a point where you are no longer well served by the common way of doing things. Which part of it fails you will depend very much on your requirements, so what is presented on this page can only be examples. But I hope I can demonstrate that creating your own data format and toolchain need not be prohibitively expensive in terms of time and effort, and that what follows may be at least partly useful to others.

Rethinking the data format

The idea that a custom file format may be beneficial is based on two observations: First, the distributed OpenStreetmap data files are "master copies" containing much information irrelevant to rendering, such as user ID and time of last change, original source of an object and references to other databases (both geospatial and others such as public transport). Second, the most time-consuming and very complex part of rendering maps is creating the database from the OpenStreetmap data files. This makes sense for the common use case of a web map server, where the database is accessed frequently and updated rather than recreated, but much less so for rendering complete maps of diverse regions. A suitable data format could be used for rendering directly from the file. To keep down its size and still allow quick random access, the file format would need to be binary.
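To illustrate why a binary format allows quick random access, here is a minimal sketch of a node file with fixed-size records, in which the n-th record can be reached with a single seek. The record layout (64-bit ID, latitude and longitude as 32-bit integers in units of 1e-7 degrees) and the names `write_nodes` and `read_node` are assumptions made up for this example, not a format defined on this page.

```python
import io
import struct

# Hypothetical record layout: node ID (int64), latitude and longitude
# (int32, in units of 1e-7 degrees), little-endian, no padding.
NODE = struct.Struct("<qii")
SCALE = 10_000_000


def write_nodes(f, nodes):
    """Write (id, lat, lon) tuples as fixed-size binary records."""
    for node_id, lat, lon in nodes:
        f.write(NODE.pack(node_id, round(lat * SCALE), round(lon * SCALE)))


def read_node(f, index):
    """Seek straight to record number `index` and decode it."""
    f.seek(index * NODE.size)
    node_id, lat, lon = NODE.unpack(f.read(NODE.size))
    return node_id, lat / SCALE, lon / SCALE


if __name__ == "__main__":
    buf = io.BytesIO()
    write_nodes(buf, [(1, 52.5200, 13.4050), (2, 48.1374, 11.5755)])
    print(NODE.size, read_node(buf, 1))
```

Each record takes 16 bytes here. A real format would also need ways, tags and some spatial index, none of which this sketch attempts.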

Optimisation opportunities in a new data format:

Analysing tags and their values

Tags are the part of the OpenStreetmap data that give geometric objects their meaning, but also where all the inventiveness and cunning of human stupidity come to bear. Some contributors make their contribution useless by assigning tag values in their native language (German and Russian are popular), including arbitrary information, strings of characters that would make decent passwords, numbers where text is expected or vice versa, and many other ways you never imagined in your wildest nightmares. While yes is a standard value of the building tag, the French (or Québécois séparatiste?) oui is probably ignored by most software; the values fixme and ? will no doubt be useful once software has become clairvoyant; building=manhole cover and building=greenhouse/half-cylindrical plastic sheet covred plant growning housey things [sic] at least have some entertainment value. Selecting tags and values for inclusion in a custom format offers the opportunity to discard unusable data but also to salvage what can be salvaged.

Here is an example program that reads an OpenStreetmap XML file from stdin and outputs statistics of the tags and their values. Its results reveal how much of a dump the OpenStreetmap database is. Does your mapping solution rely on the location of anthills, on which member of a group of people has visited a given peak in Ireland, on the model name of a power generator, on the diameter of fire hydrants, or on the blinking pattern and orientation of seamarks? If not, you can save a lot of space by filtering such data out. This program extracts objects with specific tags or values so that you can take a closer look. Inspection makes it plain that querying specific tag values ignores a lot of information. Don't rely on building=barn: some contributors have used building=yes; bulding:type=barn. Making full use of the OpenStreetmap data is more like screen-scraping web pages than using a database.
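For readers who want to try such an analysis themselves, the following is a minimal sketch of a tag statistics pass. It is not the program referred to above. It assumes the standard OSM XML layout, in which `tag` elements with `k` and `v` attributes are nested inside `node`, `way` and `relation` elements; the function name `tag_statistics` is made up for this example.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def tag_statistics(stream):
    """Count how often each (key, value) pair occurs in an OSM XML stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "tag":
            counts[(elem.get("k"), elem.get("v"))] += 1
        elif elem.tag in ("node", "way", "relation"):
            # Drop the children of a finished object to limit memory use.
            elem.clear()
    return counts


if __name__ == "__main__":
    stats = tag_statistics(sys.stdin.buffer)
    for (key, value), n in stats.most_common(50):
        print(f"{n:8d}  {key}={value}")
```

Sorting by frequency shows the well-established values first. The long tail of values that occur once or twice is where the oddities described above tend to turn up.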

Adding elevation information for large areas

As I prefer topographical maps, I often add contour lines based on the SRTM data to my maps, and would like to integrate them into any database from which I create maps. The usual tool for creating contour lines, gdal_contour, has ESRI shapefiles as its main output format and simply stops writing when the format's maximum file size of 4 GB would be exceeded. (As I see it, the only real restriction is the total size field in the file header, but gdal_contour prefers denial of service to violating the standard.) Depending on the region's size and topography, generating contours for a largish (country-sized) region can hit that limit.
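The 4 GB figure can be traced to the size field mentioned in parentheses above. The sketch below is based on my reading of the published shapefile layout: the main file starts with a 100-byte header whose bytes 24 to 27 hold the total file length as a big-endian signed 32-bit integer, counted in 16-bit words. The function name `declared_length` is made up for this example.

```python
import struct

# The main-file header of an ESRI shapefile is 100 bytes long. Bytes 0-3 hold
# the file code 9994 and bytes 24-27 the total file length, both as big-endian
# signed 32-bit integers; the length is counted in 16-bit words.
HEADER_SIZE = 100
FILE_CODE = 9994

# Largest length the header field can express, in bytes (just under 4 GiB).
MAX_SHP_BYTES = (2**31 - 1) * 2


def declared_length(header):
    """Return the file length in bytes as declared in a .shp header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a shapefile header is 100 bytes long")
    (code,) = struct.unpack_from(">i", header, 0)
    if code != FILE_CODE:
        raise ValueError("not a shapefile (bad file code)")
    (words,) = struct.unpack_from(">i", header, 24)
    return words * 2


if __name__ == "__main__":
    print(MAX_SHP_BYTES / 2**30)
```

Comparing `declared_length` with the actual size of a file on disk is one way to detect output that was cut off at the limit.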

Though gdal_contour supports a large number of data formats, not all of them can be created, and their documentation tends not to mention size restrictions or support for attributes (such as the elevation of the contour, which is nice to have). Many more formats provided by GPSBabel may also be supported (with a curious syntax), but they do not seem to support attributes either, and are proprietary and undocumented as well.

Since this page is about rolling your own data format, the problem to solve is merely getting the data out of gdal_contour without costing too much time for custom postprocessing. Potential options are:

Rethinking rendering

When viewing the OpenStreetmap web map, I regularly notice that at zoom levels around 12, the map looks very empty in many places, containing far fewer place names than would be possible. At other zoom levels, place names are sometimes omitted due to conflicts even though the map is not very busy; shifting the labels slightly would have allowed including them all. Going yet further, one could envisage trying to print all place names in order of descending size up to a maximum density. Unless I am mistaken, this is something that cannot be done with Mapnik.
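The label placement described here could be approached greedily, as in the following sketch. It is only an illustration under simplifying assumptions: labels are axis-aligned boxes in pixel coordinates with y growing downwards, importance is a single `rank` number, and the maximum density is crudely approximated by a maximum label count. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Label:
    name: str
    x: float       # anchor position in pixels
    y: float
    width: float   # size of the rendered text in pixels
    height: float
    rank: int      # larger means more important (hypothetical measure)


def overlaps(a, b):
    """True if two boxes (x0, y0, x1, y1) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_labels(labels, max_labels):
    """Greedily place labels by descending rank, shifting them on conflict."""
    placed = []
    for lab in sorted(labels, key=lambda l: l.rank, reverse=True):
        if len(placed) >= max_labels:
            break
        # Candidate positions: at the anchor, then shifted left, up and down.
        offsets = [(0, 0), (-lab.width, 0), (0, -lab.height), (0, lab.height)]
        for dx, dy in offsets:
            box = (lab.x + dx, lab.y + dy,
                   lab.x + dx + lab.width, lab.y + dy + lab.height)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((lab.name, box))
                break
    return placed
```

A label is dropped only if every candidate position collides with a label that is already placed, so important names win conflicts and lesser ones are shifted before being given up.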

As I have complained elsewhere in this guide, Mapnik is not very well documented. A related area where it fails is how the map style is set. As in many other programs that are harder to use than they need to be, the style interface is designed to make coding Mapnik easy rather than using it. Style settings of different zoom levels are independent by default even though this is far from the case in typical rendered maps. This is not to say that the choice should not be there, but that a sensible default should include having consistent colouration of roads and land use across all scales, and putting roads and buildings on top of the land use background, for example. Mapnik does not offer the right degrees of freedom for that. Map style languages on top of Mapnik can mitigate this to an extent, but introducing additional levels of abstraction is often itself development-centric, creates additional complexity and requires the user to become an expert at another underdocumented software package. Part of why reasonable defaults are lacking is that Mapnik is not actually aware of what the map data mean: a river could nonsensically be rendered as a road if the style prescribed that.
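One way to get consistent styles across scales by default is to define each feature class once and let zoom levels override only what differs. The sketch below shows that idea with a made-up style table; the feature names, colours and the `layer` field are illustrative assumptions and are not taken from Mapnik or any existing style language.

```python
# Hypothetical style table: one base style per feature class, plus optional
# overrides that apply from a given zoom level upwards.
BASE_STYLE = {
    "motorway": {"colour": "#e892a2", "width": 2.0, "layer": 2},
    "forest": {"colour": "#add19e", "layer": 0},
}

ZOOM_OVERRIDES = {
    "motorway": {10: {"width": 3.0}, 14: {"width": 6.0}},
}


def style_for(feature, zoom):
    """Return the base style with all overrides up to `zoom` applied in order."""
    style = dict(BASE_STYLE[feature])
    for min_zoom, override in sorted(ZOOM_OVERRIDES.get(feature, {}).items()):
        if zoom >= min_zoom:
            style.update(override)
    return style
```

With this table, `style_for("motorway", 12)` keeps the base colour and layer and changes only the width, so the colouration stays the same at every scale unless an override says otherwise.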

Rendering a map is quite a complex matter, but it can be separated into parts that require awareness that a map is being rendered (rather than, say, a raytracing picture or a graph) and those that merely turn geometric objects into pixel data. The first part is what we would like full control over, to allow specification of any map style we wish, but without stating the obvious and without redundancy. The second part can be performed well by a program unrelated to map rendering such as an SVG renderer. Looking at OpenStreetmap data, how one would map it to SVG is quite obvious: There are points (nodes), lines (ways), polygons and text labels. What remains is deciding what to include, styling and coordinate projection using the proj library.
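The mapping to SVG can be sketched in a few lines. The example below is a rough illustration, not an implementation of the design described here: it substitutes a hand-written spherical Mercator formula for a call into the proj library, uses fixed styling, and invents its own input structures (`nodes` as a dictionary of ID to longitude and latitude, `ways` as pairs of a closed flag and a list of node IDs, `labels` as a dictionary of node ID to text).

```python
import math
from xml.sax.saxutils import escape


def mercator(lon, lat):
    """Spherical Mercator on a unit sphere; a stand-in for a proj call."""
    x = math.radians(lon)
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def to_svg(nodes, ways, labels, size=400):
    """Render ways as lines or polygons and labelled nodes as points with text."""
    pts = [mercator(lon, lat) for lon, lat in nodes.values()]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    scale = size / span

    def px(node_id):
        x, y = mercator(*nodes[node_id])
        # SVG's y axis points down, so flip the northing.
        return (x - min(xs)) * scale, (max(ys) - y) * scale

    out = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for closed, refs in ways:
        coords = " ".join("%.1f,%.1f" % px(r) for r in refs)
        kind = "polygon" if closed else "polyline"
        out.append(f'<{kind} points="{coords}" fill="none" stroke="black"/>')
    for node_id, text in labels.items():
        x, y = px(node_id)
        out.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="2"/>')
        out.append(f'<text x="{x:.1f}" y="{y:.1f}">{escape(text)}</text>')
    out.append("</svg>")
    return "\n".join(out)
```

The resulting text can be handed to any SVG renderer. Deciding which objects to include and how to style them is exactly the map-aware part that this sketch leaves out.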

None of this is implemented yet, but here are some ideas that seem reasonable to me but are not reflected in the currently available tools:

Ideas on map style

Ideas on rendering

Licensed under the Creative Commons Attribution-Share Alike 3.0 Germany License