api: second order searches with elasticsearch #20

Open
kaplun opened this issue Sep 17, 2015 · 4 comments

Comments

@kaplun
Member

kaplun commented Sep 17, 2015

Invenio legacy supported queries such as citedby:author:ellis. We need invenio-search to support them too, as this is critical for services such as INSPIRE, whose users browse records and aggregate information by following citation links.

@kaplun
Member Author

kaplun commented Sep 17, 2015

Note: I am opening the issue in invenio-search because I believe the solution will require querying Elasticsearch in a loop, which will probably happen within invenio-search.

@jirikuncar jirikuncar added this to the future milestone Sep 17, 2015
@kaplun kaplun changed the title Support second order searches with elasticsearch api: second order searches with elasticsearch Sep 17, 2015
@tiborsimko
Member

A brief summary of past musings on this topic:

  • for core second-order queries, the necessary information could perhaps be included directly in the JSON-in-Elasticsearch, since it is expanded during the JSON-in-PostgreSQL -> enhancer -> JSON-in-Elasticsearch step, much as authorities do;
  • for non-core second-order queries, and for any higher-order queries such as refersto:refersto:refersto:higgs, we would most probably need to do a recursive expansion at runtime, just as in the Invenio legacy era. (Here, Solr 5 with its fast "intbitset" output would come in handy.)
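The two strategies in the list above can be sketched in Python with an in-memory stand-in for the index. This is a minimal illustration, not Invenio code: the record layout, the `references`/`referenced_by` fields, the `enhance`, `search_leaf` and `expand` helpers, and the reading of `refersto:X` as "records whose references include a record matching X" are all assumptions. `enhance` plays the enhancer's role at index time (first bullet), while `expand` peels one `refersto:` prefix per recursion level at query time (second bullet), reusing the precomputed field for each hop.

```python
from typing import Dict, Set

# Hypothetical toy records as they might sit in PostgreSQL:
# each record lists the record ids it references.
RECORDS: Dict[int, dict] = {
    1: {"title": "higgs boson discovery", "references": []},
    2: {"title": "higgs decay channels", "references": [1]},
    3: {"title": "review of decays", "references": [2]},
    4: {"title": "survey of reviews", "references": [3]},
    5: {"title": "unrelated topic", "references": []},
}


def enhance(records: Dict[int, dict]) -> Dict[int, dict]:
    """Index-time step: invert `references` into a hypothetical
    `referenced_by` field before the JSON is sent to the index."""
    enriched = {recid: dict(rec, referenced_by=[]) for recid, rec in records.items()}
    for recid, rec in records.items():
        for ref in rec["references"]:
            if ref in enriched:
                enriched[ref]["referenced_by"].append(recid)
    return enriched


def search_leaf(index: Dict[int, dict], word: str) -> Set[int]:
    """Stand-in for a plain first-order query: match a word in the title."""
    return {recid for recid, rec in index.items() if word in rec["title"].split()}


def expand(index: Dict[int, dict], query: str) -> Set[int]:
    """Runtime step: resolve e.g. refersto:refersto:higgs recursively.
    Assumption: refersto:X = records whose references hit a match of X."""
    prefix = "refersto:"
    if query.startswith(prefix):
        inner = expand(index, query[len(prefix):])
        hits: Set[int] = set()
        for recid in inner:
            # One hop through the precomputed field, no full scan needed.
            hits.update(index[recid]["referenced_by"])
        return hits
    return search_leaf(index, query)


index = enhance(RECORDS)
print(sorted(expand(index, "refersto:refersto:refersto:higgs")))  # [4]
```

Each extra `refersto:` costs one more expansion over the previous result set, which is why a fast set representation (such as the intbitset output mentioned above) matters for deep chains.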

this is critical for services such as INSPIRE [...] the future milestone

If this feels like a contradiction, then I think there are two phases involved:

A. Collect ideas early to let them mature.
B. See who might liberate himself/herself to work on implementing them.

(Again, a bit like authorities, which we started under phase A while the base stabilises enough to also start phase B.)

I guess @kaplun wanted to ticketise this topic early to get the ball rolling?

@kaplun
Member Author

kaplun commented Sep 17, 2015

for core second-order queries, the necessary information could perhaps be included directly in the JSON-in-Elasticsearch, since it is expanded during the JSON-in-PostgreSQL -> enhancer -> JSON-in-Elasticsearch step, much as authorities do;

Mmh: let's take the case of citedby:author:ellis (which is core for INSPIRE). I assume we could, e.g., extend the JSON sent to Elasticsearch with the list of citing records. Then I can well see us asking Elasticsearch for all the records matching author:ellis and then, on the Python side, taking the union of all the citation fields in the returned records. However, don't we need a second pass to Elasticsearch again in order to rank them?

Or do you think this is also part of your latter point?
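The two-pass idea from the comment above (first query author:ellis, then take a Python-side union of a link field) could look roughly like this. The response dict mimics the usual Elasticsearch search response layout, but the `citing_records` field, the `union_of_links` helper and the sample hits are hypothetical, and no Elasticsearch client is involved.

```python
from typing import Any, Dict, Set


def union_of_links(response: Dict[str, Any], field: str = "citing_records") -> Set[int]:
    """Collect the union of a (hypothetical) link field across all hits
    returned by the first-pass query, e.g. author:ellis."""
    linked: Set[int] = set()
    for hit in response["hits"]["hits"]:
        # Records without the field simply contribute nothing.
        linked.update(hit["_source"].get(field, []))
    return linked


# Hand-written stand-in for a first-pass response to author:ellis.
first_pass = {
    "hits": {
        "hits": [
            {"_id": "10", "_source": {"authors": ["Ellis, J."], "citing_records": [11, 12]}},
            {"_id": "20", "_source": {"authors": ["Ellis, J."], "citing_records": [12, 13]}},
            {"_id": "30", "_source": {"authors": ["Ellis, J."]}},
        ]
    }
}

print(sorted(union_of_links(first_pass)))  # [11, 12, 13]
```

The resulting set carries no ordering, which is exactly why ranking needs another request. Note also that a search returns only `size` hits per request (10 by default), so collecting every author:ellis match would in practice require paging, e.g. via the scroll API or `search_after`.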

A. Collect ideas early to let them mature.
B. See who might liberate himself/herself to work on implementing them.

(Again, a bit like authorities, which we started under phase A while the base stabilises enough to also start phase B.)

👍 In fact the RFC in authorities greatly helped us because, even though links are neither resolved nor maintained yet, we already have a clear idea of how to represent links and what kinds of record manipulation and queries we can do.

I guess @kaplun wanted to ticketise this topic early to get the ball rolling?

😄

@tiborsimko
Member

However, don't we need a second pass to Elasticsearch again in order to rank them?

Yeah, various sorting/ranking options may require a second pass, as was the case with Solr/Xapian in legacy.
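Such a second pass could hand the linked ids back to Elasticsearch and let it do the sorting. The sketch below only builds a request body: the `ids` query and the `sort` clause are standard Elasticsearch query DSL, whereas the `citation_count` field and the `second_pass_body` helper are hypothetical.

```python
from typing import Any, Dict, Iterable


def second_pass_body(
    linked_ids: Iterable[int],
    sort_field: str = "citation_count",  # hypothetical numeric field
    size: int = 25,
) -> Dict[str, Any]:
    """Build a search body restricted to the ids gathered in the first
    pass, asking Elasticsearch to rank them by `sort_field`."""
    ids = sorted(set(linked_ids))
    return {
        # The `ids` query matches documents by _id; ids are sent as strings.
        "query": {"ids": {"values": [str(i) for i in ids]}},
        # Highest values first.
        "sort": [{sort_field: {"order": "desc"}}],
        "size": size,
    }


body = second_pass_body({12, 11, 13})
print(body["query"])  # {'ids': {'values': ['11', '12', '13']}}
```

The body would then be sent as the request body of an ordinary search call; swapping `sort` for a different clause is how other sorting/ranking options would plug into the same second pass.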

jalavik pushed a commit to jalavik/invenio-search that referenced this issue Feb 3, 2016
3 participants