Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global: allow using an index prefix #145

Closed

Conversation

david-caro
Copy link
Member

Now using the setting SEARCH_INDEX_PREFIX you can define a prefix to
use for all your indices.

This partially addresses #129

Signed-off-by: David Caro [email protected]

@david-caro david-caro force-pushed the allow_prefixing_indices branch 2 times, most recently from 0e45f34 to 054a3e1 Compare June 26, 2018 18:45
@david-caro
Copy link
Member Author

This seems to be all that is needed. I'm changing the api of one of the utils methods, so this might break someone's code.

Copy link
Member

@slint slint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think utils.build_index_name is only used in the module, so changing the signature is probably ok.

One small issue I see is that the prefixing only applies to index names and not aliases... On the other hand, I think that ESonDemand, allowed creation of aliases even without using a prefix... (e.g. I managed to create esod_prefix-records-v1.0.0 and then the records alias which pointed to it worked fine... Maybe the next guy that would try to create a records alias would have issues ¯\_(ツ)_/¯)

kwargs.setdefault('index', getattr(self.Meta, 'index', None))
kwargs.setdefault(
'index',
current_app.config['SEARCH_INDEX_PREFIX'] + '-' +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better if the dash (-) was part of the SEARCH_INDEX_PREFIX? I know it's how it's done by ESoD, but it will be hard to change in the future if somebody wants to do it differently

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep :), I just did that change locally, will push soon.

@@ -83,7 +83,11 @@ class Meta:

def __init__(self, **kwargs):
"""Use Meta to set kwargs defaults."""
kwargs.setdefault('index', getattr(self.Meta, 'index', None))
kwargs.setdefault(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently two ways to perform queries to different indices using this class:

  1. Sub-classing and overriding Meta.index
  2. Passing index explicitly as a keyword argument

The first case is of course covered... Though the usage of kwargs.setdefault instead of setting the value directly is probably because of the fact that this constructor was written before the need for prefixing index names "magically" existed...

Maybe it would be better to make Meta.index some kind of property which relies on some _index field? @lnielsen WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally I solved it by checking if the kwargs['index'] is prefixed, and if not then adding it (at some point there's also a list passed to it instead a string...)

@david-caro
Copy link
Member Author

Btw. this current version does take care of the aliases too \o/

@david-caro david-caro force-pushed the allow_prefixing_indices branch 3 times, most recently from 74f62a9 to 2e8e8bb Compare June 26, 2018 21:59
@@ -79,3 +79,5 @@
<https://www.elastic.co/guide/en/elasticsearch/reference/current/
search-request-min-score.html>`_ for more information.
"""

SEARCH_INDEX_PREFIX = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have some documentation maybe here, as for the other config variables.

@ntarocco
Copy link
Contributor

@slint agree that build_index_name in inveniosoftware organisation is used only here, but maybe in any overlay/other modules this is used, since it is not marked as private.
So we will probably have to release it with a "major" version?

It would be also nice to update the doc and put there an example.

Now using the setting `SEARCH_INDEX_PREFIX` you can define a prefix to
use for all your indices.

Signed-off-by: David Caro <[email protected]>
Copy link
Member

@lnielsen lnielsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR. It's much appreciated and needed by everyone.

I'm ok with changing the signature of build_index_name as long as it's properly documented in the CHANGES.rst once the new version is released (i.e. 1.1.0).

Regarding aliases, would you mind adding a quick test that demonstrates the prefix for aliases work as well.

Also, what about the ES templates, do they need prefixes as well?

:returns: A string with the new index name prefixed in needed.
"""
index_prefix = app.config['SEARCH_INDEX_PREFIX']
if index.startswith(index_prefix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which cases is this covering? Is it only when SEARCH_INDEX_PREFIX == '' or are there other cases?

Example: If you decided to name your prefix records and your index starts with records I would still expect the final index name to be records-records (might not make much sense, but it's consistent 😄).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is there because the libs at some point start doing nested calls passing the index string around, so if you have to make sure to prepend the perfix only once. Unfortunately, all the info you get is the index name string, so you can't add random info (like a boolean property prefixed). But, if you want your indices to end up as record-record the prefix should be record-, with the last dash, and that will not really collide (unless you want record-record-something). But I don't know how to prevent that from happening without changing substantially the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Yeah, not much to do about it. Perhaps just add a quick comment in the code, that this line is there due to nested calls.

@david-caro
Copy link
Member Author

Regarding aliases, would you mind adding a quick test that demonstrates the prefix for aliases work as well.

I'll try. So far the aliases and indices use the same function to get the names, and that one is the one that does the prefixing (build_index_name).

Also, what about the ES templates, do they need prefixes as well?
I'll check.

@dinosk
Copy link
Member

dinosk commented Aug 13, 2018

After some testing I see that the aliases have the prefix:

alias                     index                                          filter routing.index routing.search
prefix-demo               prefix-demo-bibliographic-bibliographic-v1.0.0 -      -             -
prefix-demo-bibliographic prefix-demo-bibliographic-bibliographic-v1.0.0 -      -             -
prefix-demo               prefix-demo-default-v1.0.0                     -      -             -
prefix-demo               prefix-demo-authorities-authority-v1.0.0       -      -             -
prefix-demo-authorities   prefix-demo-authorities-authority-v1.0.0       -      -             -

But for the templates if I understand correctly we'll need to add it, as after registering a sample one, it's registered in the following way:

...
"prefix-file_download/v6-file-download-v1" : {
    "order" : 0,
    "index_patterns" : [
      "events-stats-file-download-*"
    ],
...

@dinosk dinosk removed their assignment Aug 14, 2018
@lnielsen
Copy link
Member

lnielsen commented Aug 14, 2018

We've been digging into the prefixing in the current Invenio sprint, and I'd like to discuss if it's worthwhile adding support for prefixing indexes.

Template problem
Currently, as @dinosk mentions, there are some issues with templates. To fix it, we would need to dynamically modify the JSON body sent to Elasticsearch. E.g something like:

{
  "template": "<prefix>-events-stats-record-view-*",
  ...
  "aliases": {
    "<prefix>-events-recordview": {}
  }
}

It's not clear, if it's only template or aliases keys that have to be prefixed or if there are more keys.

Workaround
Currently, it's possible to prefix your indexes by simply adding a further folder:

mappings/v6/records/record-v1.0.0.json -> records-record-v1.0.0.json
mappings/v6/<prefix>/records/record-v1.0.0.json -> <prefix>-records-record-v1.0.0.json

Similar for templates, you can add your prefix inside the template JSON body manually.

CERN Central Elasticsearch Service
One of the use cases for prefixing is to use the central Elasticsearch service at CERN that requires prefixing. That is possible using the workaround above (in use by CERN Search).

Shared Elasticsearch for multiple Invenio instances
In case we do not add support for prefixing, I think the key use case we'll not be able to support is using a shared Elasticsearch instance for several Invenio instances since they would all try to create and manage the indexes (e.g. records-record-v1.0.0.json).

The down-side by using a shared Elasticsearch instance for multiple Invenio instances, is that each instance will have read/write access to all instances unless you add the expensive X-Pack from Elastic with support for authorization and permissions. OTOH you only need to operate one ES instance and thereby save resources.

What to do?
I would like to see prefixing support in Invenio-Search, but I have some doubts about if we can make it robustness enough, and given that there's an existing robust workaround, I'd like to get some inputs on if it's worthwhile the effort. Essentially, is using a shared Elasticsearch instance for multiple Invenio instances such an important use case that it justifies the efforts needed to make the prefixing robust enough?

@david-caro
Copy link
Member Author

Something that this seems not to address is having multiple invenio instances on CERN Central Elasticsearch Service, as that would require duplicating all the mappings on two different subdirectories no? (for example, having a 'dev', a 'qa' and a 'prod' instances).

@lnielsen
Copy link
Member

Correct. The dev, qa and prod is a special case of "multiple Invenio instances on a shared Elasticsearch cluster" which make it useless to use the central Elasticsearch cluster for multiple environments (you could still use it for only one environment).

Another issue is that the support for templates in the central Elasticsearch service. Right now this is handled via a git repo instead of via the API. This would mean e.g. the statistics module won't be usable on the central service (or at least it will be rather cumbersome).

I'm worried that prefixing is not robust enough to properly sandbox two Invenio instances running on the same ES cluster and make sure they don't interfere with each other in any way.

@ntarocco
Copy link
Contributor

I am closing this PR because it is outdated, the new PR is here: #156

@ntarocco ntarocco closed this Feb 15, 2019
@ntarocco ntarocco mentioned this pull request Feb 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants