Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for polygon/linestring in results #823

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

red-fenix
Copy link

@red-fenix red-fenix commented Jul 11, 2024

At the moment, photon only returns the point of a location, and not the polygon (see #259). This PR will add the option to add the polygon (i.e. geometry) to the Elasticsearch Index and a API parameter polygon to return said polygon. If no polygon exists, the point is returned.

WARNING: This will increase the Elasticsearch Index size! (~575GB for a Planet import).

To enable: add the command line argument -use-geometry-column whilst importing and add &polygon=true to the API call.

@red-fenix red-fenix changed the title Add option for polygons in results Add option for polygon/linestring in results Jul 11, 2024
@lonvia
Copy link
Collaborator

lonvia commented Jul 14, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

  • This really needs to be implemented for the OpenSearch version because the ES version of Photon is on its way out. Note that the OpenSearch variant does not use mappings.json. It defines its mapping in https://github.com/komoot/photon/blob/master/app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java.
  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.
  • We already have a field extent which contains the bounding box. The geometry should replace this field, i.e. once full geometries are enabled, do not save extent and derive the extent from the geometry field when returning a result. We need to keep the centroid because it is not necessarily the geometric centroid of the geometry. (Note that extent was missing from the mapping specification so far. So likely it was in fact saved as a text field, which really is an oversight but can't really be fixed right now without creating an incompatible database version.)
  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).
  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.
  • This needs some tests for the import.

Two other considerations come in mind but they are easily deferred to follow-up PRs:

  • Once we have full geometries, we'd want to use them for reverse lookup. See Discussion for PR to improve accuracy of reverse geocoding #357.
  • It might be worth to slightly simplify the geometries before importing them, or at least make that an option. Nominatim always keeps the original OSM geometries which sometimes can have a lot more support points than necessary. Simplification might help to further reduce database size.

@red-fenix
Copy link
Author

red-fenix commented Jul 17, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Thanks. I will update the PR with the changes in this file soon.

One question though about the 'Elasticsearch is on it's way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.

Will take this along as well.

  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).

Agreed

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behavior as on import then.

OK. I will make the default to return the polygon when it's available in the index. The option 'polygon=false' will return the centroid instead.

  • This needs some tests for the import.

Will do.

Two other considerations come in mind but they are easily deferred to follow-up PRs:

I have to look into this issue.

  • It might be worth to slightly simplify the geometries before importing them, or at least make that an option. Nominatim always keeps the original OSM geometries which sometimes can have a lot more support points than necessary. Simplification might help to further reduce database size.

I will look into this.
Another thing related to this: when someone is searching for a street Nominatim (and thus Photon) returns a street in parts because they are separate OSM id's. I'm still looking for a method to merge multiple ways (i.e. linestrings) into 1 linestring to make sure the whole street is displayed instead of a part (example)

@lonvia
Copy link
Collaborator

lonvia commented Jul 21, 2024

One question though about the 'Elasticsearch is on it's way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

We'll drop ES support completely and go with OpenSearch. Note that the OS version already supports an external OpenSearch cluster. The support is just somewhat rudimentary and HTTP-only.

@karussell
Copy link
Collaborator

Would this PR return a Multipolygon in case for queries like "hamburg"? See the nominatim response.

Currently photon returns a slightly confusing, but correct extent as only a single extent is allowed https://photon.komoot.io/api/?q=hamburg ... or maybe we add a new field extents in the response?

@red-fenix
Copy link
Author

Would this PR return a Multipolygon in case for queries like "hamburg"? See the nominatim response.

Currently photon returns a slightly confusing, but correct extent as only a single extent is allowed https://photon.komoot.io/api/?q=hamburg ... or maybe we add a new field extents in the response?

Yes, it would return a Polygon like Nominatim does.

@red-fenix
Copy link
Author

red-fenix commented Sep 5, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Done

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.

Done

  • This needs some tests for the import.

Done

I'm looking forward to your response :)

@lonvia
Copy link
Collaborator

lonvia commented Oct 29, 2024

Sorry for the long delay in responding.

I've decided to get a release out first before looking into this again. Because of this there is a minor merge conflict now regarding the dependencies. Also, the opensearch variant doesn't compile because setUpESWithPolygons() is only defined for the elasticsearch variant. Could you quickly fix both issues?

Copy link
Collaborator

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of more comments after trying this out. The OpenSearch version currently doesn't work at all. It already fails to import.

The output for the ElasticSearch version displays a wrong projection:

...
"geometry": {
  "coordinates":[9.5227103,47.1395576],
  "type":"Point",
  "crs":{"type":"name","properties":{"name":"EPSG:0"}}
}
...

Photon doesn't supply a CRS at all with the geometry. I'd leave it at that.

@red-fenix
Copy link
Author

Sorry for the long delay in responding.

I've decided to get a release out first before looking into this again. Because of this there is a minor merge conflict now regarding the dependencies. Also, the opensearch variant doesn't compile because setUpESWithPolygons() is only defined for the elasticsearch variant. Could you quickly fix both issues?

Will do this weekend.

@red-fenix
Copy link
Author

Couple of more comments after trying this out. The OpenSearch version currently doesn't work at all. It already fails to import.

I've made some adjustments and have been trying to get the OpenSearch version running however, it fails with an error:

Request failed: [mapper_parsing_exception] No handler for type [geo_shape] declared on field [geometry]

I've searching online, but can't seem to find the reason for OpenSearch clearly has geo_shape support.

In other news: I fixed the PR with your suggestions, but will perform some additional tests this weekend.

@lonvia
Copy link
Collaborator

lonvia commented Nov 18, 2024

Request failed: [mapper_parsing_exception] No handler for type [geo_shape] declared on field [geometry]

That's where I gave up my experiments with this branch as well. It's possible that the embedded version of OpenSearch that we are using is missing a dependency needed for geo_shape. Maybe try running against a stand-alone OpenSearch and see if it works then?

@otbutz
Copy link
Contributor

otbutz commented Nov 18, 2024

https://github.com/opensearch-project/geospatial ? Maybe it's enough to add the OpenSearch plugin as a dependency.

@red-fenix
Copy link
Author

red-fenix commented Dec 1, 2024

So, I've found out that there were multiple issues with my build:

  1. The Opensearch implementation is very different from the Elasticsearch implementation (ser/des vs parsing a Searchit), but I've made it work.

  2. The main problem is the opensearch-runner. It doesn't have all Opensearch plugins available and the geo plugin is disabled. I've been trying to add the Opensearch geospatial plugin as either a module or a plugin, but was unable to do so.

Therefore I switched to testcontainers and everything is working-ish. Unit tests for Opensearch work, but due to some small differences with Testcontainers, the Elasticsearch Unit tests don't work... yet! (It's mainly because testcontainer runs a Docker container with an Opensearch instance on random ports and the unit tests have to connect to said random port, although it's possible to expose the default ports).

Is there anyone who can help me add the Geo plugin to the opensearch-runner of would you be willing to switch to testcontainers and docker? If Photon switches to testcontainers, I can make some changes to the Elasticsearch unit tests.
Also, it's probably better to use the testcontainer docker with Photon when the local instance is used.

@otbutz
Copy link
Contributor

otbutz commented Dec 2, 2024

You could open an issue at https://github.com/codelibs/opensearch-runner. Either it's already working or it's a valid feature request.

@lonvia
Copy link
Collaborator

lonvia commented Dec 3, 2024

It needs to work with opensearch-runner because that is also what Photon uses in embedded mode.

From the code you cite it looks like you can supply your own module list by adding .moduleTypes(<comma-separated list of modules>) at

}).build(OpenSearchRunner.newConfigs().basePath(dataDirectory).clusterName(clusterName).numOfNode(1));

@lonvia
Copy link
Collaborator

lonvia commented Dec 4, 2024

With #855 merged you can now simply add the plugin to build.gradle and enable it by adding it to the list in

public static final String OPENSEARCH_MODULES =

@red-fenix
Copy link
Author

With #855 merged you can now simply add the plugin to build.gradle and enable it by adding it to the list in

public static final String OPENSEARCH_MODULES =

Thank you so far for your help. I've been trying to get it working, but failed so far. I've opened a issue with the Codelab Opensearch Runner.

@red-fenix
Copy link
Author

Opensearch has fixed it and I've updated the PR. Both Opensearch tests and Es Embedded are 100% working.

Copy link
Collaborator

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not completely finished with a detailed review but I added some comments.

I currently see two major issues that still need to be solved:

  • the OS variant is currently hard-coded towards polygons with the result that streets cannot be found anymore. The code should be able to deal with any geometries including invalid ones.
  • import in the ES variant with geometry enabled is very slow for me. Slower by a factor of 5. Any idea what is causing this?

The first issue does make me wonder if we shouldn't rather call the API parameter geometry and consistently go with that in the code. "polygon" is quite confusing for something that might also cause linestrings to appear. We could then have geometry=centroid/bbox/full.

app/es_embedded/src/main/java/de/komoot/photon/Server.java Outdated Show resolved Hide resolved
// gen.writeObjectFieldStart("properties");
// gen.writeStringField("name", "EPSG:0");
// gen.writeEndObject();
// gen.writeEndObject();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't just assume that the geometry is of type polygon. There may be pretty much anything in there: point, linestring, multilinestring, polygon, multipolygon.

Looks like we need tests for all the different geometry types to make sure this works.

Copy link
Author

@red-fenix red-fenix Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added support for LineString and Polygons for now. I have to examine the Nominatim database if they use other formats.

app/opensearch/src/main/java/de/komoot/photon/Server.java Outdated Show resolved Hide resolved
src/main/java/de/komoot/photon/SearchRequestHandler.java Outdated Show resolved Hide resolved
@red-fenix
Copy link
Author

Still not completely finished with a detailed review but I added some comments.

I currently see two major issues that still need to be solved:

  • the OS variant is currently hard-coded towards polygons with the result that streets cannot be found anymore. The code should be able to deal with any geometries including invalid ones.

I have build in support for LineString and Polygons and have to analyze the Nominatim database for other Geometries.

  • import in the ES variant with geometry enabled is very slow for me. Slower by a factor of 5. Any idea what is causing this?
    I will look into this, but since Elasticsearch support will be deprecated, how much time do I need to spend on this?

The first issue does make me wonder if we shouldn't rather call the API parameter geometry and consistently go with that in the code. "polygon" is quite confusing for something that might also cause linestrings to appear. We could then have geometry=centroid/bbox/full.

I will rename it to 'geometry' support in my next commit.

@red-fenix
Copy link
Author

Another thing: the Tests are working for Elastisearch as well as Opensearch. Though, whilst importing data, I get an 'Error during bulk import.'

Debugging this from the commandline is very hard, because I can't see the actual data that is being imported. I've been unable to run the import process in debugging from IntelliJ (I keep getting the error 'Jar hell').

Is there any way I can run the import process from Intellij so that I can better debug the error?

@lonvia
Copy link
Collaborator

lonvia commented Jan 12, 2025

I have build in support for LineString and Polygons and have to analyze the Nominatim database for other Geometries.

Nominatim has everything: Point, LineString, MultiLineString, Polygon, MultiPolygon. It definitely would be better to use a geojson builder instead of hand-coding the conversion. I see that you did that for ES. I do wonder though if that is the part that is making ES slow. That is why I was asking about the performance.

Is there any way I can run the import process from Intellij so that I can better debug the error?

Not sure where the jar hell comes from. I've only seen this in the ES variant. Maybe you could look into improving error reporting here? This code really could have a more helpful message and print the errors that the bulk import returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants