Importing OSM nodes into Spatialite

Second step of my SoTM10 pet project: creating a searchable database with the points. What a fantastic opportunity to learn Spatialite.

Learning Spatialite is easy. For example, you can use the two tutorials with catchy titles that assume your best wish in life is to create databases out of shapefiles using a pre-built, i386-only executable GUI binary downloaded over an insecure HTTP connection.

To be fair, the second of those tutorials is called "An almost Idiot's Guide", thus expliciting the requirement of being an almost idiot in order to happily acquire and run software in that way.

Alternatively, you can use A quick tutorial to SpatiaLite which is so quick it has examples that lead you to write SQL queries that trigger all sorts of vague exceptions at insert time. But at least it brought me a long way forward, at which point I could just cross reference things with PostGIS documentation to find out the right way of doing things.

So, here's the importer script, which will probably become my reference example for how to get started with Spatialite, and how to use Spatialite from Python:

#!/usr/bin/python

#
# poiimport - import nodes from OSM into a spatialite DB
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#

import xml.sax
import xml.sax.handler
from pysqlite2 import dbapi2 as sqlite
import simplejson
import sys
import os

class OSMPOIReader(xml.sax.handler.ContentHandler):
    '''
    Filter SAX events in a OSM XML file to keep only nodes with names
    '''
    def __init__(self, consumer):
        self.consumer = consumer

    def startElement(self, name, attrs):
        if name == "node":
            self.attrs = attrs
            self.tags = dict()
        elif name == "tag":
            self.tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node":
            lat = float(self.attrs["lat"])
            lon = float(self.attrs["lon"])
            id = int(self.attrs["id"])
            #dt = parse(self.attrs["timestamp"])
            uid = self.attrs.get("uid", None)
            uid = int(uid) if uid is not None else None
            user = self.attrs.get("user", None)

            self.consumer(lat, lon, id, self.tags, user=user, uid=uid)

class Importer(object):
    '''
    Create the spatialite database and populate it
    '''
    TAG_WHITELIST = set(["amenity", "shop", "tourism", "place"])

    def __init__(self, filename):
        self.db = sqlite.connect(filename)
        self.db.enable_load_extension(True)
        self.db.execute("SELECT load_extension('libspatialite.so')")
        self.db.execute("SELECT InitSpatialMetaData()")
        self.db.execute("INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid,"
                        " ref_sys_name, proj4text) VALUES (4326, 'epsg', 4326,"
                        " 'WGS 84', '+proj=longlat +ellps=WGS84 +datum=WGS84"
                        " +no_defs')")
        self.db.execute("CREATE TABLE poi (id int not null unique primary key,"
                        " name char, data text)")
        self.db.execute("SELECT AddGeometryColumn('poi', 'geom', 4326, 'POINT', 2)")
        self.db.execute("SELECT CreateSpatialIndex('poi', 'geom')")
        self.db.execute("CREATE TABLE tag (id integer primary key autoincrement,"
                        " name char, value char)")
        self.db.execute("CREATE UNIQUE INDEX tagidx ON tag (name, value)")
        self.db.execute("CREATE TABLE poitag (poi int not null, tag int not null)")
        self.db.execute("CREATE UNIQUE INDEX poitagidx ON poitag (poi, tag)")
        self.tagid_cache = dict()

    def tagid(self, k, v):
        key = (k, v)
        res = self.tagid_cache.get(key, None)
        if res is None:
            c = self.db.cursor()
            c.execute("SELECT id FROM tag WHERE name=? AND value=?", key)
            for row in c:
                self.tagid_cache[key] = row[0]
                return row[0]
            self.db.execute("INSERT INTO tag (id, name, value) VALUES (NULL, ?, ?)", key)
            c.execute("SELECT last_insert_rowid()")
            for row in c:
                res = row[0]
            self.tagid_cache[key] = res
        return res

    def __call__(self, lat, lon, id, tags, user=None, uid=None):
        # Acquire tag IDs
        tagids = []
        for k, v in tags.iteritems():
            if k not in self.TAG_WHITELIST: continue
            for val in v.split(";"):
                tagids.append(self.tagid(k, val))

        # Skip elements that don't have the tags we want
        if not tagids: return

        geom = "POINT(%f %f)" % (lon, lat)
        self.db.execute("INSERT INTO poi (id, geom, name, data)"
                        "     VALUES (?, GeomFromText(?, 4326), ?, ?)",
                (id, geom, tags["name"], simplejson.dumps(tags)))

        for tid in tagids:
            self.db.execute("INSERT INTO poitag (poi, tag) VALUES (?, ?)", (id, tid))


    def done(self):
        self.db.commit()

# Get the output file name
filename = sys.argv[1]

# Ensure we start from scratch
if os.path.exists(filename):
    print >>sys.stderr, filename, "already exists"
    sys.exit(1)

# Import
parser = xml.sax.make_parser()
importer = Importer(filename)
handler = OSMPOIReader(importer)
parser.setContentHandler(handler)
parser.parse(sys.stdin)
importer.done()

Let's run it:

$ ./poiimport pois.db < pois.osm
SpatiaLite version ..: 2.4.0    Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct Dbf access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 Rel. 4.7.1, 23 September 2009
GEOS version 3.2.0-CAPI-1.6.0
$ ls -l --si pois*
-rw-r--r-- 1 enrico enrico 17M Jul  9 23:44 pois.db
-rw-r--r-- 1 enrico enrico 37M Jul  9 16:20 pois.osm
$ spatialite pois.db
SpatiaLite version ..: 2.4.0    Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct DBF access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
SQLite version ......: 3.6.23.1
Enter ".help" for instructions
spatialite> select id from tag where name="amenity" and value="fountain";
24
spatialite> SELECT poi.name, poi.data, X(poi.geom), Y(poi.geom) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.56 AND xmax <= 2.90 AND ymin >= 41.84 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
Font Picant de la Cellera|{"amenity": "fountain", "name": "Font Picant de la Cellera"}|2.616045|41.952449
Font de Can Pla|{"amenity": "fountain", "name": "Font de Can Pla"}|2.622354|41.974724
Font de Can Ribes|{"amenity": "fountain", "name": "Font de Can Ribes"}|2.62311|41.979193

It's impressive: I've got all sort of useful information for the whole of Spain in just 17Mb!

Let's put it to practice: I'm thirsty, is there any water fountain nearby?

spatialite> SELECT count(1) FROM poi, poitag WHERE poi.rowid IN (SELECT pkid FROM idx_poi_geom WHERE (xmin >= 2.80 AND xmax <= 2.85 AND ymin >= 41.97 AND ymax <= 42.00) ) AND poitag.tag = 24 AND poitag.poi = poi.id;
0

Ouch! No water fountains mapped in Girona... yet.

Problem 2 solved: now on to the next step, [[trying to show the results in some usable way||osm-search-nodes]].