Spatial4j supports reading and writing Shapes to and from strings in the following formats:
- Well Known Text (WKT)
- GeoJSON
- Polyshape
The classes ShapeReader
and ShapeWriter
are the interfaces for encoding
and decoding shapes respectively. They can be obtained from a SpatialContext
instance.
SpatialContext ctx = ...;
ShapeReader shpReader = ctx.getFormats().getReader(ShapeIO.WKT);
ShapeWriter shpWriter = ctx.getFormats().getWriter(ShapeIO.WKT);
Reading and writing polygons requires a reader/writer obtained from JtsSpatialContext
.
Well-Known-Text (WKT) is a simple to read text based format for representing spatial objects. It is defined by the OGC Simple Feature Specification. Wikipedia's page on it is pretty decent.
The following table shows the various shape types encoded as WKT:
Shape | WKT |
---|---|
Point | POINT(1 2) |
Rectangle | ENVELOPE(1, 2, 4, 3) (minX, maxX, maxY, minY) |
Circle | BUFFER(POINT(-10 30), 5.2) |
LineString | LINESTRING(1 2, 3 4) |
Buffered L/S | BUFFER(LINESTRING(1 2, 3 4), 0.5) |
Polygon | POLYGON ((1 1, 2 1, 2 2, 1 2, 1 1)) |
ShapeCollection | GEOMETRYCOLLECTION(POINT(1 2),LINESTRING(1 2, 3 4)) |
The ENVELOPE
keyword/syntax was borrowed from OGC's Common Query Language (CQL) standard, a superset of WKT. Note the odd argument order. It's not widely used. Alternatively a rectangular polygon can be used, which will be recognized as-such when
it is read and turned into a Rectangle instance.
The BUFFER
keyword is a Spatial4j extension to the WKT spec. It's used to produce a circle by buffering a point, and to buffer shapes generally. The 2nd argument is the buffer distance in degrees.
GeoJSON is a format for representing geographic objects using JavaScript Object Notation. It is defined by the open standard available at http://geojson.org.
The following list shows the various shape types encoded as GeoJSON:
-
Point
{ "type": "Point", "coordinates": [1, 2] }
-
Rectangle
{ "type": "Polygon", "coordinates": [ [[1,3], [1,4], [2,4], [2,3], [1,3]] ] }
-
Circle
{ "type": "Circle", "coordinates": [1, 2], "radius": 111.19508, "properties": { "radius_units": "km" } }
-
LineString
{ "type": "LineString", "coordinates": [[1, 2], [3, 4]] }
-
Buffered LineString
{ "type": "LineString", "coordinates": [[1, 2], [3, 4]], "buffer": 10 }
-
Polygon
{ "type": "Polygon", "coordinates": [ [[1, 1], [2, 1], [2, 2], [1, 2], [1, 1]] ] }
-
ShapeCollection
{ "type": "GeometryCollection", "geometries": [ { "type": "Point", "coordinates": [1, 2] }, { "type": "LineString", "coordinates": [ [1, 2], [3, 4] ] } ] }
The Polyshape format is an extension to the Polyline format from Google to include a wider diversity of shapes. The standard "Encoded Polyline Algorithm Format" algorithm offers a compact way to encode a list of lat/lon pairs into a simple ASCII string.
The following table shows the various shape types encoded as Polyshape:
Shape | Polyshape |
---|---|
Point | 0_ibE_seK |
Rectangle | 5_ibE_}hQ_ibE_ibE |
Circle | 4(_ibE)_ibE_seK |
LineString | 1_ibE_seK_seK_seK |
Buffered L/S | ``1(_c` |
Polygon | 2_ibE_ibE_ibE??_ibE~hbE??~hbE |
ShapeCollection | 0_ibE_seK 1_ibE_seK_seK_seK |
The Polyshape format uses a prefix key to denote the shape type:
Shape | Key |
---|---|
Point | 0 |
LineString | 1 |
Polygon | 2 |
MultiPoint | 3 |
Circle | 4 |
Rectangle | 5 |
Shape collections are represented as each of the individual shape encodings concatenated with a space.
The Polyshape format supports optional arguments to handle things like a Circle (with a radius) and
a BufferedLineString (with a buffer size). If the character immediately after the prefix is a '('
,
everything up to the next ')'
is considered an argument to the Shape.
A Polygon with interior rings is represented by appending the encoding of each ring prefixed with
a ')'
:
'2' + encode(exteriorRing) + ['(' + encode(interiorRing)]*
- The format is optimized to store lat/lon points, very big or very small values may get lost in the rounding
- All values are rounded to: Math.round(value * 1e5)
- In the JTS version, a homogeneous ShapeCollection will be read as a MultPoint, MultiLineString, or MultiPolygon
The following table shows a comparison among the encoded formats in terms of number of bytes in the final encoding. Note that this comparison may be somewhat misleading since the other encoding formats do not implicitly limit precision as Polyshape does. Percentages are calculated relative to the largest encoding.
GeoJSON | WKT | Binary | Polyshape | Shape |
---|---|---|---|---|
100% (42) | 38% (16) | 40% (17) | 20% (9) | POINT(100.1 0.1) |
100% (61) | 52% (33) | 67% (41) | 28% (17) | LINESTRING (100.1 0.1, 101.1 1.1) |
100% (96) | 33% (32) | 34% (33) | 18% (17) | ENVELOPE(100.1, 101.1, 1.1, 0.1) |
90% (158) | 69% (122) | %100 (177) | 33% (59) | POLYGON ((100.1 0.1, 101.1 0.1, 101.1 1.1, 100.1 1.1, 100.1 0.1), (100.2 0.2, 100.8 0.2, 100.8 0.8, 100.2 0.8, 100.2 0.2)) |
100% (169) | 51% (87) | %52 (88) | 21% (36) | GEOMETRYCOLLECTION(LINESTRING (100.1 0.1, 101.1 1.1),LINESTRING (102.1 2.1, 103.1 3.1)) |
100% (131) | 40% (53) | %31 (40) | 15% (20) | GEOMETRYCOLLECTION(POINT(100.1 0.1),POINT(101.1 1.1)) |
With this limited dataset it looks like the Polyshape format is on average ~25% as big as GeoJSON, ~50% as big as WKT, and ~40% as big as the binary encoding. The data used for this benchmark contains data with low precision numbers (100.1 vs 100.123456) so that may be a low estimate.