sync ldap into names #195

Open
wants to merge 5 commits into master from sync-ldap-names

jrcastro2
Contributor

@jrcastro2 jrcastro2 commented Sep 4, 2024

As a non-admin user, you see only the synced value:

image

As an admin, you see both values; the deprecated one appears greyed out:

image

@jrcastro2 jrcastro2 force-pushed the sync-ldap-names branch 5 times, most recently from 212d6dc to 6673d1a Compare September 20, 2024 08:48
@jrcastro2 jrcastro2 marked this pull request as ready for review October 22, 2024 06:45
@jrcastro2 jrcastro2 force-pushed the sync-ldap-names branch 2 times, most recently from 1087edc to 9d5c950 Compare October 22, 2024 12:55
@shared_task()
def merge_duplicate_names_vocabulary(since=None):
"""Merges duplicate names in the names vocabulary."""
service = current_service_registry.get("names")
Contributor

Just for readability:

Suggested change
- service = current_service_registry.get("names")
+ names_service = current_service_registry.get("names")

filters.append(dsl.Q("range", updated={"gte": since}))
combined_filter = dsl.Q("bool", filter=filters)

names = service.scan(system_identity, extra_filter=combined_filter)
Contributor

Shouldn't we rely on the DB table instead?

Contributor Author

We are applying a lot of filters, and depending on the time range, or right after an import of the ORCID and CERN authors, there may be a lot of results to process. Since this table holds millions of values, I would expect querying the DB to be quite slow.

I am aware that the DB is the source of truth, but for performance reasons I would go with OS. However, if there are strong reasons to prioritize the DB, I can change it and test with a bigger sample to assess the performance.
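
For reference, a minimal sketch of what the filter in the diff above amounts to as a raw OpenSearch query body. The helper name `build_names_filter` and the base filter on `props.example_flag` are hypothetical, used only for illustration; the two dict shapes are what `dsl.Q("range", ...)` and `dsl.Q("bool", filter=...)` serialize to.

```python
def build_names_filter(base_filters, since=None):
    """Return a bool/filter query body, optionally restricted by update time.

    Hypothetical helper: it mirrors the dsl.Q(...) calls in the diff using
    plain dicts so that the resulting query is easy to inspect.
    """
    filters = list(base_filters)  # copy so the caller's list is not mutated
    if since is not None:
        # Same shape as dsl.Q("range", updated={"gte": since})
        filters.append({"range": {"updated": {"gte": since}}})
    # Same shape as dsl.Q("bool", filter=filters)
    return {"bool": {"filter": filters}}


query = build_names_filter(
    [{"term": {"props.example_flag": True}}],  # hypothetical base filter
    since="2024-10-01T00:00:00",
)
```

Because every clause sits in `filter` context, OpenSearch does not score the hits, which fits a scan that only needs to enumerate the matching names.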

@ntarocco
Contributor

ntarocco commented Oct 24, 2024

For the Dep - Group label, shall we make it even smaller and put it inside the parentheses, just after the email? It kind of belongs to the CERN part:
(ORCID logo, ORCID ID, CERN logo, email, depgroup)

@jrcastro2 jrcastro2 force-pushed the sync-ldap-names branch 2 times, most recently from e69f358 to 9bbad13 Compare November 6, 2024 10:46
@jrcastro2 jrcastro2 changed the title from "WIP: sync ldap into names" to "sync ldap into names" Nov 11, 2024
* add person_id to names vocab
Contributor

@ntarocco ntarocco left a comment

Added comments, but mainly cosmetic.

return bool(re.match(pattern, val))


def cds():
Contributor

Is it needed even if it does not validate anything?

Contributor Author

Yes, we have to register it under the idutils.custom_schemes entry point so that the scheme can be detected when a record is submitted.
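
A rough sketch of what such a scheme definition could look like. It assumes the custom-scheme convention in which the entry point function returns a dict with `validator`, `normalizer`, `filter`, and `url_generator` keys; the digits-only pattern is a placeholder, since the real pattern is not shown in this thread.

```python
import re


def is_cds(val):
    """Return True if `val` looks like a CDS identifier (placeholder rule)."""
    pattern = r"^\d+$"  # assumption: digits only; the real pattern is not shown here
    return bool(re.match(pattern, val))


def cds():
    """Scheme definition to expose through the idutils.custom_schemes entry point.

    The key names follow the custom-scheme convention as assumed above.
    """
    return {
        "validator": is_cds,
        "normalizer": lambda value: value,  # identity: nothing to normalise
        "filter": [],  # no other schemes to filter out in this sketch
        "url_generator": None,  # no landing URL for this scheme
    }
```

Even when the validator is permissive, the function still has to exist: the entry point is what makes the scheme name known at submission time.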

Successfully merging this pull request may close these issues.

names vocab: allow names vocab to have 2 types of objects
3 participants