Usage of osm2pgsql
Syntax --
Functionality --
Operation --
Projections --
Options --
Examples --
Database --
Authors --
Development --
References
This documentation page was cobbled together from the various locations where
documentation for osm2pgsql has been hidden to make it hard to find: The
OpenStreetmap Wiki, source readme file, help message and manual page
(referenced below). I have also added my own experiences where
possible.
Command line syntax
osm2pgsql [options] planet.osm
osm2pgsql [options] planet.osm.{gz,bz2,pbf}
osm2pgsql [options] file1.osm file2.osm file3.osm
The Wiki suggests running decompression and database import in parallel on
multi-core machines, like this:
bunzip -c file.osm.bz2 | osm2pgsql [options] /dev/stdin
Functionality
osm2pgsql serves to import packed OpenStreetmap geodata into a PostgreSQL
/ PostGIS database. This live database can then be used by rendering programs
or the
Nominatim geocoder. OSM planet snapshots can be downloaded
from here
. Geodata files for various
countries and regions ("extracts") are available
here
in the especially small PBF
(ProtoBufBinary) format, and from other sources (see
reference list in the Wiki
).
osm2pgsql currently supports two different database schemas that can be
selected with its -o option:
- A database schema that is optimized for ease of rendering by Mapnik
.
- A database schema that is optimized for geocoding with
Nominatim
, emphasizing the
spatially hierarchical organizations of objects.
Both schemas were specifically optimized for the purpose they were
intended for and they may therefore be less suitable for other
general purpose processing. Nevertheless, the rendering schema
might be useful for other purposes as well, and has been used
for a variety of additionally purposes.
Feature list:
- Converts OSM files to a PostgreSQL database
- Conversion of OSM tags to database columns is configurable in the style file
- Able to read .gz, .bz2, .pbf and .o5m files directly
- Can apply diffs to keep the database up to data
- Supports the choice of output projection
- Configurable table names
- Gazetteer back-end for Nominatim
- Support for hstore field type to store the complete set of tags in one database field if desired
If you are completely new to osm2pgsql, you may prefer to follow a tutorial
rather than read the docs. Tile server setup tutorials are linked
here, a tutorial for creating map tiles for offline
use is here.
The most important options determining its mode of operation are:
- -s, --slim
- Store temporary tables in database. This allows incremental updates using diff
files also available at OSM data servers, and reduces memory usage at a cost in
disk space and import time. This mode of operation is recommended.
See this Wiki page
on how
to keep a database up to date.
- --flat-nodes file
- Store temporary tables for --slim outside database in file. Reduces
disk space considerably. It is still only recommended for whole-planet
databases because the organisation of the file is optimised for that case.
- -k, --hstore
- Hstore is a data
type in a PostgreSQL database
that can itself store multiple key-value pairs.
With this option, osm2pgsql stores non-standard tags in an additional
hstore-type column, while standard tags are still stored in a column of their
own as usual. This option requires that the hstore extension has been enabled
in the database (see below).
Steps of operation
osm2pgsql performs the following actions:
- osm2pgsql connects to database and creates the following 4 tables
when used with the default output back-end (pgsql): planet_osm_point,
planet_osm_line, planet_osm_roads and planet_osm_polygon. The default prefix
"planet_osm" can be changed with the --prefix option. If you are using
--slim mode, it will create the following additional 3 tables:
planet_osm_nodes, planet_osm_ways and planet_osm_rels.
- Runs a parser on the input file and processes the nodes, ways and relations.
- If a node has a tag declared in the style file then it is added to planet_osm_point. If it has no such tag then the position is noted, but not added to the database.
- Ways are read in and converted into
WKT
geometries by using the
positions of the nodes read in earlier. If the tags on the way are listed in
the style file then the way will be written into the line or roads tables.
- If the way has one or more tags marked as "polygon" and forms a closed ring then it will be added to the planet_osm_polygon table.
- The relations are parsed. Osm2pgsql has special handling for a limited number of types: multipolygon, route and boundary. The code will build the appropriate geometries by referencing the members and outputting these into the database.
- Indexes are added to speed up the queries by Mapnik.
Geographic projections
The downloadable OpenStreetmap data contain the geographical latitude and
longitude of nodes (which are points out of which all else is composed).
When importing the data into a database, osm2pgsql already performs a map
projection. Presumably this serves to prevent duplication of the projection
computation later on, as most node coordinates will be needed multiple times
when rendering different zoom levels of the same region. The projection used
by default is the Pseudo-Mercator projection, so called because it is slightly
inconsistent and therefore can distort angles very slightly. It is universally
used by web map services and also called
Web Mercator
and spherical
Mercator. It can also be explicitly selected with the -m option.
Two other projections can be selected with simple command-line switches. The
first, -l, is not an actual projection, but stores the latitude and
longitude themselves. This may be helpful if you want to query the database
yourself using the geocoordinates. The other projection with a dedicated
command-line option, WGS84 Mercator (-M), is considered obsolete. (The
WGS84
, in case you wondered, stands for
today's standard ellipsoid used to approximate the earth's surface.)
In addition, osm2pgsql can create the database using any projection known
to the proj program that converts plain-text coordinate files between
projections. (It is an indirect dependency of osm2pgsql, so it has to be
installed.) The proj package contains a list of projections standardised
by the EPSG
in /usr/share/proj/epsg.
The osm2pgsql option -E allows to choose any of them by number. The
projections with dedicated options can also be represented as EPSG projections:
Projection | osm2pgsql option | EPSG index | Spatial Reference System (SRS) string for proj (and Mapnik and others using it) |
Latitude / longitude | -l | 4326 | +proj=longlat +datum=WGS84 +no_defs |
Pseudo-Mercator | -m (default) | 3857 (900913, 3785) | +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over |
WGS84 Mercator | -M | 3395 | +proj=merc +datum=WGS84 +k=1.0 +units=m +over +no_defs |
In order to do manual conversions with proj, you need not enter the whole
SRS string. Because proj already has the list of EPSG projections, you can
use the option +init=epsg:index instead.
The default OpenStreetmap rendering style for Mapnik assumes the default
(Pseudo-Mercator) projection, so you probably want to stick with that unless
you have special needs.
All options
osm2pgsql follows the usual GNU command line syntax, with long
options starting with two dashes (`-').
Mode of operation options
- -a|--append
- Add the OSM file into the database without removing existing data. This can
reputedly
lead to problems if objects in the new data are already in the database, such
as when a region is extended by an adjacent one. The referenced post hints
that this can be fixed by removing a uniqueness constraint of the database.
- -c|--create
- Remove existing data from the database. This is the default if
--append is not specified.
- --flat-nodes /path/to/nodes.cache
- The flat-nodes mode is a separate method to store slim mode node
information on disk. Instead of storing this information in the
main PostgreSQL database, this mode creates its own separate
custom database to store the information. As this custom database has
application level knowledge about the data to store and is not general
purpose, it can store the data much more efficiently. Storing the node
information for the full planet
requires about 100 GB in PostgreSQL, the same data is stored in
only ~20 GB (Oct 2013) using the flat-nodes mode. This can also increase the
speed of applying diff files. This option activates the flat-nodes mode and
specifies the location of the database file. It is a single large > 16GB
file. This mode is only recommended for
full planet imports as it doesn't work well with small extracts.
The default is disabled.
- -s|--slim
- Store temporary data in the database. Without this mode, all
temporary data is stored in RAM and if you do not have enough
the import will not work successfully. With slim mode, you
should be able to import the data even on a system with limited
RAM, although if you do not have enough RAM to cache at least
all of the nodes, the time to import the data will likely be
greatly increased. You must use slim mode for planet imports on 32-bit
systems, since there are too many nodes to store in RAM otherwise.
This switch is also required for later updates with --append.
- --drop
- Drop the slim mode tables from the database once the import is
complete. This can greatly reduce the size of the database, as
the slim mode tables typically are the same size, if not
slightly bigger than the main tables. It does not, however,
reduce the maximum spike of disk usage during import. It can
furthermore increase the import speed, as no indices need to be
created for the slim mode tables, which (depending on hardware)
can nearly halve import time. Slim mode tables however have to
be persistent if you want to be able to update your database, as
these tables are needed for diff processing.
- -r|--input-reader format
- Select input format reader. Does not normally need to be set, as the type
of input file is recognised automatically (by extension or by content?).
Available choices are libxml2 and primitive for XML
files, o5m for o5m
formatted
files and pbf for protobuf encoded files
(OSM Wiki
,
library docs
).
Not all choices may be available on all systems, as they can be disabled at
compile time.
- -S|--style /path/to/style
- Location of the osm2pgsql style file. This specifies which tags
from the data get imported into database columns and which tags
are dropped. Defaults to /usr/share/osm2pgsql/default.style.
- -o|--output backend
- Specifies the output back-end or database schema to use. Currently osm2pgsql
supports pgsql, gazetteer and null. pgsql is the default
output back-end that creats a PostGIS database schema for rendering with
Mapnik
. gazetteer is creates a
PostGIS database schema optimized for geocoding by
Nominatim
. null does not
write any output and is only useful for testing.
- -u|--utf8-sanitize
- Obsolete option to repair bad UTF-8 input data from before August 2007. Adds
about 10% overhead.
Projection options
- -l|--latlong
- Store data in degrees of latitude and longitude. This is incompatible with the
standard Mapnik style. But it could allow creating maps in custom projections,
including at or near the poles, where the Mercator projection does not extend.
It might also be an advantage if you want to query the resulting database
manually or with custom programs based on latitude and longitude.
- -m|--merc
- Store data in Pseudo-Mercator
projection. This is a variant of the
Mercator
projection that is
slightly
inconsistent mathematically and as a consequence distorts angles slightly but
is in wide use on the web because it is easy to calculate. See
here
for a discussion. This is the default.
- -M|--oldmerc
- Store data in WGS84 World
Mercator
projection. This
is considered deprecated.
- -E|--proj num
- Store data in EPSG
projection EPSG:num.
See above.
Database access options
- -d|--database name
- The name of the PostgreSQL database to connect to (default: gis).
- -p|--prefix prefix_string
- Prefix for table names (default: planet_osm).
- -U|--username name
- PostgreSQL database user name.
- -W|--password
- Force password prompt. Otherwise specify password in PGPASS environment
variable.
- -H|--host hostname
- Database server hostname or UNIX-domain socket location. Seems to default to
local host.
- -P|--port num
- Database server port. Defaults to 5432, PostgreSQL's default port.
Options related to an additional hstore-type
column for tags
These option requires that the hstore extension has been enabled in the
database (see below).
- -k|--hstore
- Add tags without column to an additional hstore (key/value) column in
PostgreSQL tables.
- -j|--hstore-all
- Add all tags to an additional hstore (key/value) column in PostgreSQL tables.
- -z|--hstore-column key_name
- Add an additional hstore (key/value) column containing all tags
that start with the specified string, e.g. --hstore-column
"name:" will produce an extra hstore column that contains all
name:xx tags
- --hstore-match-only
- Only keep objects that have a value in one of the columns (normal action with
--hstore is to keep all objects).
- --hstore-add-index
- Create indices for the hstore column during import.
Performance options
- -C|--cache num
- Only for slim mode: Use up to num MB of RAM for caching nodes. Giving
osm2pgsql sufficient cache to store all imported nodes typically greatly
increases the speed of the import. Each cached node requires 8 bytes of cache,
plus about 10% to 30% overhead. For a current OSM full planet import with its ~
1.9 billion nodes, a good value would be 17000 if you have enough RAM. If you
don't have enough RAM, it is likely beneficial to give osm2pgsql close to the
full available amount of RAM. Without slim mode, there is also a "way" cache
which takes up about as much space as the node cache does, but is not charged
against -C. [Remark by VS: This may be code for osm2pgsql using more
than twice as much RAM as given in the -C option, which seems to be
confirmed by experience.] num defaults to 800.
- --cache-strategy strategy
- There are a number of different modes in which osm2pgsql can organize its node
cache in RAM. These are optimized for different assumptions of the data and the
hardware resources available. Currently available strategies are dense,
chunk, sparse and optimized. dense assumes that the node IDs
are densely packed, i.e. only a few IDs in a range are missing. For extracts
this is usually not the case, making the cache very inefficient and wasteful of
RAM. chunk is optimised for non-contiguous memory allocation and is the
default on 32-bit systems. sparse assumes node IDs in the data are not
densely packed, greatly increasing caching efficiency in these cases. If node
IDs are densely packed, like in the full planet, this strategy has a higher
overhead for indexing the cache. optimized uses both dense and sparse
strategies for different ranges of the ID space. On a block by block basis it
tries to determine if it is more effective to store the block of IDs in sparse
or dense mode. This may use twice as much virtual memory, but no more physical
memory. It is the default on 64-bit machines.
- --number-processes num
- Specifies the number of parallel processes used for certain
operations. If disks are fast enough e.g. if you have an SSD,
then this can greatly increase speed of the "going over pending
ways" and "going over pending relations" stages on a multi-core
server. [Remark by VS: It is not clear (and of course not documented) how much
using more processors affects RAM usage, so you may have to reduce the
parameter of the -C option.]
- -I|--disable-parallel-indexing
- By default osm2pgsql initiates the index building on all tables
in parallel to increase performance. This can be disadvantageous
on slow disks, or if you don't have enough RAM for PostgreSQL to
perform up to 7 parallel index building processes (e.g. because
maintenance_work_mem is set high).
- -i|--tablespace-index tablespacename
- Store all indices in a separate
PostgreSQL
tablespace
named by this parameter. This allows to e.g. store the indices
on faster storage like SSDs. The following options allow more fine-grained
control:
- --tablespace-main-data tablespacename
- Store the data tables (non slim) in the given tablespace.
- --tablespace-main-index tablespacename
- Store the indices of the main tables (non slim) in the given tablespace.
- --tablespace-slim-data tablespacename
- Store the slim mode tables in the given tablespace.
- --tablespace-slim-index tablespacename
- Store the indices of the slim mode tables in the given tablespace.
- --unlogged
- Use PostgreSQL's unlogged tables for storing data. This requires PostgreSQL 9.1
or above. Data written to unlogged tables is not written to PostgreSQL's
write-ahead log, which makes them considerably faster than ordinary tables.
However, they are not crash-safe: an unlogged table is automatically
truncated after a crash or unclean shutdown.
Informational options
- -h|--help
- Help information. With -v, displays complete option list.
- -v|--verbose
- Verbose output.
Advanced options
- -b|--bbox minlon,minlat,maxlon,maxlat
- Apply a bounding box filter on the imported data, e.g. --bbox
-0.5,51.25,0.5,51.75. The general-purpose converter
Osmosis
is said to be more
efficient at extracting partial datasets than this option. In my experience,
this option cuts off ways at the boundary and removes relations that are not
completely within the region, so it may be advantageous to choose a box a bit
larger than the map region you want to render.
- -e|--expire-tiles [min_zoom-]max-zoom
- Create a tile expiry list.
- -o|--expire-output /path/to/expire.list
- Output file name for expired tiles list.
- -G|--multi-geometry
- Generate multi-geometry features in PostgreSQL tables.
Normally osm2pgsql splits multi-part geometries into separate
database rows per part. A single OSM id can therefore have several rows. With
this option, PostgreSQL instead generates multi-geometry features in the
PostgreSQL tables. Multi-geometry objects are a PostGIS feature representing
collections of geometrical objects that can represent OSM relations combining
multiple boundaries to an area (e.g. with holes).
- --tag-transform-script /path/to/script
- Specify a lua script to handle tag filtering and normalisation. The script
contains callback functions for nodes, ways and relations, which each take a
set of tags and returns a transformed, filtered set of tags which are then
written to the database.
- -x|--extra-attributes
- Include attributes for each object in the database. This includes the
user name and user ID of the last contributor and the time stamp and version of
the latest revision. Note: this option also requires additional entries in
your style file.
- -K|--keep-coastlines
- Keep coastline data rather than filtering it out. By default
"natural=coastline" tagged data will be discarded based on the
assumption that post-processed Coastline Checker shape files
will be used.
- --exclude-invalid-polygon
- OpenStreetMap data is defined in terms of nodes,
ways and relations and not in terms of actual geometric features.
osm2pgsql therefore tries to build postgis geometries out of this data
representation. However not all ways and relations correspond to valid PostGIS
geometries (e.g. self intersecting polygons). By default osm2pgsql tries to
automatically fix these geometries using ST_Buffer(0) around the invalid
polygons. With this option, invalid polygons are instead simply dropped
from the database.
Usage examples
Basic usage:
osm2pgsql -s -U postgres -d nameofdatabase /file/path/toosm/fileorpbf/name.osm
In the example above, postgres is the user of the database, the user
enabled slim mode (generally recommended). The above sample is feasible for
someone looking to export a city's OSM data into a PostGIS database. For more
advanced and larger datasets, read the Optimization section.
Setting up the database
Before running osm2pgsql, you have to create a PostgreSQL user and a
database with the
PostGIS functions enabled. This requires access as the
database administrator, normally the postgres user.
PostgreSQL 9.1 and PostGIS 2.0 or later are strongly suggested
for databases in production. It is generally best to run the
latest released versions if possible. PostgreSQL 8.4 and PostGIS 1.5
will work but are substantially slower. Additionally, PostGIS 2.0
contains enhancements that increase reliability as well as add new
features that style sheet authors can use.
The default name for this database is gis but this may
be changed by using osm2pgsql's --database option.
If the <username> matches the unix user id running the import
and rendering then this allows the PostgreSQL 'ident sameuser'
authentication to be used which avoids the need to enter a
password when accessing the database. This is setup by default
on many Unix installs but does not work on Windows (due to the
lack of unix sockets).
Some example commands are given below but you may also want to look at
this wiki page
.
sudo -u postgres createuser <username>
sudo -u postgres createdb -E UTF8 -O <username> <dbname>
To add the PostGIS extensions, you will have to run an SQL script that comes
with PostGIS. The version numbers in the path will vary.
sudo -u postgres psql -d <dbname> -f /usr/share/postgresql/contrib/postgis-2.1/postgis.sql
Next we need to give the <username> access to update the postgis
meta-data tables:
sudo -u postgres psql -d <dbname> -c "ALTER TABLE spatial_ref_sys OWNER TO <username>;"
The 900913 spatial reference system is not normally included with PostGIS. To
add it you should run the SQL script that comes with osm2pgsql:
sudo -u postgres psql -d <dbname> -U <username> -f /usr/share/osm2pgsql/900913.sql
If you want to use hstore support then you will also need to enable the
PostgreSQL hstore extension:
sudo -u postgres psql -d <dbname> -c "CREATE EXTENSION hstore;"
On PostgreSQL versions before 9.1 you need to install the hstore-new extension
instead by running an SQL script:
sudo -u postgres psql -d <dbname> -f /usr/share/postgresql/8.4/contrib/hstore-new.sql
Optimising database configuration for osm2pgsql imports
For an efficient operation of PostgreSQL you will need to tune the config
parameters of PostgreSQL from its default values. These are set in the
config file, which is located in the directory storing the database data, by
default /var/lib/postgres/data/postgresql.conf.
The values you need to set will depend on the hardware you have available, but
you will likely need to increase the values for the following parameters:
shared_buffers, checkpoint_segments, work_mem,
maintenance_work_mem, and effective_cache_size. Suggestions for values
can be found in the section "Configure the PostGIS database" on
this
page
.
See also this
Talk by
Frederik Ramm: Optimising the Mapnik/osm2pgsql Rendering Toolchain 2.0
at SOTM
2012.
Authors
osm2pgsql was written by Jon Burgess, Artem Pavlenko, and other
OpenStreetMap project members.
The manual page was written by Andreas Putzo for the Debian project, and amended by OpenStreetMap authors.
Development
Please report bugs to the
github tracker
.
Any questions should be directed at the
OSM dev list
.
References
Documentation of related programs:
Licensed under the Creative Commons Attribution-Share Alike 3.0 Germany License
TOS / Impressum