Mapzen Vector Tile Service

Tilezen Vector Tiles

Installation Guide

1. Install

Install dependencies

# install misc tools
sudo apt-get install git unzip python-yaml
# install postgres / postgis
sudo apt-get install postgresql postgresql-contrib postgis postgresql-9.5-postgis-2.2
# Install jinja2
sudo apt-get install python-jinja2
# install tilezen fork of osm2pgsql
sudo apt-add-repository ppa:tilezen
sudo apt-get update
sudo apt-get install osm2pgsql

NOTE: PostgreSQL 9.5+ is required for some jsonb functions
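
To confirm that your installed server meets this requirement, you can ask PostgreSQL directly (a quick optional check):

# print the server version; it should report 9.5 or newer
sudo -u postgres psql -c 'SHOW server_version;'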

2. Install vector-datasource

Install dependencies

# dev packages for building
sudo apt-get install build-essential autoconf libtool pkg-config
# dev packages for python and dependencies
sudo apt-get install python-dev python-virtualenv libgeos-dev libpq-dev python-pip python-pil libxml2-dev libxslt-dev

Download mapzen/vector-datasource:

This repo contains the supplementary data to load and the queries that are issued to the database for each layer.

git clone https://github.com/mapzen/vector-datasource.git
cd vector-datasource
# now checkout the latest tagged release (see warning below), for example:
# git checkout v1.4.0

WARNING: If you are standing up your own instance of the Tilezen stack (rather than doing development), it's best practice to check out the latest tagged release rather than running off master. At the time of writing that is v1.4.0, so you'd run git checkout v1.4.0 to be on the same code base as the production Mapzen Vector Tile service. Similarly, you'd need to pin yourself to the related projects' versions mentioned in that release's notes, e.g. v1.4.0 requires tileserver v2.1.0, tilequeue v1.8.0, and mapbox-vector-tile v1.2.0.
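
If you're not sure which tag is the latest, one way to find it is to sort the repository's tags by version (a minimal sketch; v1.4.0 is just the example from the warning above):

# list the five most recent version tags
git fetch --tags
git tag --sort=-v:refname | head -n 5
# then check out the one you want, e.g.:
# git checkout v1.4.0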

Set up a virtualenv

There are numerous ways to deploy Python packages; virtualenv is used here, but other methods should work.

At the moment only Python 2.7.x is supported, so make sure you have a Python 2.7 interpreter installed.

# Create a virtualenv called 'env'. It can be named anything and live anywhere on your system.
virtualenv env --python python2.7
source env/bin/activate

Install tileserver and tilequeue

pip install -U -r requirements.txt
python setup.py develop
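
As an optional sanity check (assuming the virtualenv from the previous step is still active), confirm that the packages installed cleanly:

# both imports should succeed silently
python -c "import tilequeue; import tileserver"
# the tilequeue command-line tool should now be on your PATH
tilequeue --help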

3. Load data

Set up database

If you are setting up PostgreSQL for a single-user install, you may want to create a new database user (e.g. one matching your shell username, the output of whoami). You can skip this next step if you already have your database roles established.

sudo -u postgres psql
CREATE USER [your username] SUPERUSER PASSWORD 'your password here';
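
For example, for a hypothetical user named "foo" (substitute your own username and a real password):

CREATE USER foo SUPERUSER PASSWORD 'changeme';
\q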

First, create the database. We use the database name 'osm' here, but you can use any name, e.g. 'gis'.

createdb -E UTF-8 -T template0 osm
psql -d osm -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore;'
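
You can verify that both extensions were installed (optional):

# list installed extensions; postgis and hstore should both appear
psql -d osm -c '\dx'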

Next, download the OpenStreetMap source data. You can use any PBF, but we use a Mapzen metro extract here to get started.

wget https://s3.amazonaws.com/metro-extracts.mapzen.com/new-york_new-york.osm.pbf

Load PBF data

osm2pgsql --slim --hstore-all -C 1024 -S osm2pgsql.style -d osm path/to/osm.pbf

Vector-datasource uses the slim tables, so --slim is required and the --drop option cannot be used.

You may also need to pass in other options, like -U or -W, to ensure that you connect to the database with a user that has the appropriate permissions. For more details, visit the osm2pgsql wiki page and the postgresql docs for creating a user. You may need to check your connection permissions too, which can be found in the pg_hba.conf file.
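
For example, connecting as the hypothetical user "foo" created earlier (-W prompts for the password):

osm2pgsql --slim --hstore-all -C 1024 -S osm2pgsql.style -d osm -U foo -W new-york_new-york.osm.pbf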

Note that if you import the planet, the process can take several days and, at the time of writing, can consume over 1TB of disk space (2TB is preferable, since extra space is needed while preparing the database). The OSM planet gets bigger every week, so it might be necessary to do a few trial runs to find out what it takes today. Have a look at some of our own performance tuning docs, or those from Switch2OSM.org, for recommendations.
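
As a rough sketch of what a planet-scale import might look like (the cache size and flat-nodes path here are illustrative assumptions; tune them for your hardware and see the linked docs):

# a larger node cache (-C, in MB) and a flat-nodes file (which keeps node
# locations on disk instead of in the database) make planet imports feasible
osm2pgsql --slim --hstore-all -C 24000 --flat-nodes /path/to/flat-nodes.bin \
  -S osm2pgsql.style -d osm planet-latest.osm.pbf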

Load additional data and update database

The vector-datasource/data directory contains scripts to load additional data and update the database to match our expected schema.

The additional data included in shapefiles.tar.gz comes from a combination of sources; the full list is in data/assets.yaml, which also includes a pointer to the latest cached datestamp. Everything bundled is open data, although some of it is manually generated or curated. The data comes from:

  • openstreetmap.org for static land/water polygons, under the same ODbL as the primary OpenStreetMap data it derives from.
  • naturalearthdata.com is sourced for themes and layers used at low zooms, and is in the public domain.
  • admin_areas is based on OSM data and available under the ODbL. It is regenerated occasionally by manually running Valhalla's mjolnir tool.
  • buffered_land is based on Natural Earth data and is in the public domain. It is manually curated by Tilezen: a slightly buffered land polygon used to clip admin boundaries so that they don't run off into the sea.

To import the data:

# Go to data directory, assumes you already changed directories into vector-datasource (above)
cd data
# Build the Makefiles that we'll use in the next steps
python bootstrap.py
# Download external data
make -f Makefile-import-data
# Import shapefiles into postgis
./import-shapefiles.sh | psql -Xq -d osm
# Add indexes and any required database updates
./perform-sql-updates.sh -d osm
# Clean up local shape files
make -f Makefile-import-data clean

NOTE that you may have to pass a username/password to these scripts for them to connect to the database. Anywhere -d osm is specified, you may also need to pass -U <username>, and perhaps set a password too. For example, if your username is "foo" and your password is "bar":

export PGPASSWORD=bar
./import-shapefiles.sh | psql -d osm -U foo
./perform-sql-updates.sh -d osm -U foo

To prepare the data:

The shapefiles.tar.gz is generated by running:

cd data
python bootstrap.py
make -f Makefile-prepare-data

This can take a very long time, as it downloads all the individual pieces! To speed up basic database setup we cache the results on S3 and record the latest cached datestamp in assets.yaml.

4. Serve vector tiles

  • Use tileserver for serving single tiles with Postgres.
  • Use tilequeue for caching a local region with Postgres... and with RAWR tiles to cache the whole world.

Configure

cd ../tileserver
cp config.yaml.sample config.yaml
# update configuration as necessary
edit config.yaml
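
What needs updating depends on your setup, but at minimum the database connection must match the database created earlier. The actual key names and layout are defined by config.yaml.sample, so follow that file's structure; the snippet below is only a hypothetical illustration:

# hypothetical values -- point these at the database created above
postgresql:
  host: localhost
  port: 5432
  dbnames: [osm]
  user: foo
  password: bar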

Load Who's on First neighbourhood data

Finally, neighbourhood data needs to be loaded from Who's on First.

There are two ways of doing this. The newer way, recommended for any version of vector-datasource >= 1.8.0 (and pre-release versions of master after 2019-04-16), is to load the wof_snapshot.sql file distributed alongside the shapefiles.tar.gz assets downloaded in a previous step. If you have a data/wof_snapshot.sql, you should try this first:

psql -d osm -f data/wof_snapshot.sql

If you are running an older version of vector-datasource, use the older pgdump of WOF data instead. Note that this data is very old (from around August 2017), so we'd strongly recommend using a more recent version of vector-datasource and the method above if you can.

wget https://s3.amazonaws.com/nextzen-tile-assets/wof/wof-neighbourhoods.pgdump
pg_restore --clean -d osm -O wof-neighbourhoods.pgdump

This will load a snapshot of the neighbourhoods data.

You can periodically update the Who's On First neighbourhoods data by running the following:

wget https://raw.githubusercontent.com/mapzen/tilequeue/master/config.yaml.sample -O tilequeue-config.yaml
wget https://raw.githubusercontent.com/mapzen/tilequeue/master/logging.conf.sample
tilequeue wof-process-neighbourhoods --config tilequeue-config.yaml

Load contours data

Load contours data for contours layer:

wget -O contours.zip 'https://www.dropbox.com/s/xh4gjdox9lgmxzh/contours.zip?dl=1'
unzip contours.zip
psql -d osm -f contours.sql

Run

The tile server can be run in one of two ways:

  • Directly, as a single-threaded Python process. This is better if you want to debug or step through code, but will not be able to use all the cores of your computer.
  • As a WSGI application through a multi-threaded (or multi-process) WSGI server such as gunicorn. This is better if you want to make best use of your computer by handling requests concurrently. However, it can complicate debugging or stepping through code.

To run tileserver using gunicorn, we recommend using the same number of workers as CPU cores (the -w argument, here for example 4):

gunicorn -w 4 "tileserver:wsgi_server('config.yaml')"

To run tileserver stand-alone for debugging:

python tileserver/__init__.py
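
Once the server is running, you can request a tile to confirm everything is wired up. The URL layout and port depend on your config.yaml and on how you launched the server, so treat this as a hedged example (it assumes an 'all' layer and gunicorn's default port 8000):

# fetch the zoom-0 tile for all layers as GeoJSON
curl http://localhost:8000/all/0/0/0.json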

5. Global build

Need to build tiles for the whole world? There's a new way to do that using tilequeue and RAWR tiles instead of Postgres.

Contribute!

You're ready to help us improve the Tilezen project! Please read our CONTRIBUTING.md document to understand how to contribute code.

Tests

Need to confirm your configuration? A test suite is included which can be run against a tile server.

Sample test URLs

Keeping up to date with OSM data

OpenStreetMap data is constantly changing, and OpenStreetMap produces diffs for consumers to keep up to date. Mapzen uses osmosis and osm2pgsql to pull down the latest changes and apply them.
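
A minimal sketch of that update cycle (the working directory is an assumption; osmosis must first be initialised there with --read-replication-interval-init and pointed at a replication feed in its configuration.txt):

# fetch the latest changes from the configured replication feed
osmosis --read-replication-interval workingDirectory=/var/lib/osmosis \
  --simplify-change --write-xml-change file=changes.osc.gz
# apply them to the database in append mode
osm2pgsql --append --slim --hstore-all -S osm2pgsql.style -d osm changes.osc.gz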

Generally speaking, tile service providers make the trade-off of preferring to serve occasionally stale tiles over generating tiles more slowly on demand. Mapzen makes this trade-off too.

A lot of factors go into choosing how to support a system that remains up to date. For example, existing infrastructure, tolerance for request latency and stale tiles, expected number of users, and cost can all play roles in coming up with a strategy for remaining current with OpenStreetMap changes.

Tracking releases

If you are on a particular release and would like to migrate your database to a newer one, you'll want to run the appropriate migrations. Database migrations are required when the database queries and functions that select which map content is included in tiles change.

Note that the migration for each release in between will need to be run individually. For example, if you are on v0.5.0 and would like to upgrade to v0.7.0, you'll want to run the v0.6.0 and v0.7.0 migrations (we don't provide "combo" migrations).

# in this example, we're on v0.5.0 - checkout the migration to v0.6.0
git checkout v0.6.0
bash data/migrations/run_migrations.sh -d osm

# now our database reflects v0.6.0 - checkout the migration to v0.7.0
git checkout v0.7.0
bash data/migrations/run_migrations.sh -d osm

# now our database reflects v0.7.0