Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should node attribute key validation be recursive #3863

Open
sphuber opened this issue Mar 24, 2020 · 8 comments · May be fixed by #5971
Open

Should node attribute key validation be recursive #3863

sphuber opened this issue Mar 24, 2020 · 8 comments · May be fixed by #5971

Comments

@sphuber
Copy link
Contributor

sphuber commented Mar 24, 2020

Currently there is a validation for attribute keys to verify that they are strings. For example the following raises:

Dict(dict={0: 'test'})

However, nested keys are not validated and are simply converted to strings once the node is stored and the entire attributes dictionary is serialized. Compare the following two examples:

In [9]: Dict(dict={'test': {0: 'test'}}).attributes                                                                                                                                                                                                                                                                                                                                                                                
Out[9]: {'test': {0: 'test'}}

In [10]: Dict(dict={'test': {0: 'test'}}).store().attributes                                                                                                                                                                                                                                                                                                                                                                       
Out[10]: {'test': {'0': 'test'}}

The storing in the second case turned the integer key 0 into the string '0'.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 25, 2020

Pinging @giovannipizzi and @greschd for discussion

@greschd
Copy link
Member

greschd commented Mar 25, 2020

Yes, I think the validation should be recursive.

As a side note: the behavior is different again for cases where the JSON serializer doesn't automatically convert to string. The following code raises an error when storing the node:

from aiida.orm import Dict

class Test:
    def __hash__(self):
         return 0

d = Dict(dict={'a': {Test(): 2}}) # works
d.store() # "TypeError: keys must be a string" from json/encoder.py

@sphuber
Copy link
Contributor Author

sphuber commented Mar 25, 2020

Your example is the expected behavior though. I delayed the validation through serialization to when the node is stored. This is for efficiency reasons, because as long as the node is not stored, attributes can be freely mutated. Validating the whole set of attributes every time is not efficient and only validating the changed attribute may not be sufficient. So therefore we chose to push validation down to just before storing the node.

@greschd
Copy link
Member

greschd commented Mar 25, 2020

Ah, I see. Yeah, the problematic case is when it is just mutated silently, because that can go undetected.

@giovannipizzi
Copy link
Member

Indeed, we moved validation upon storing.
I agree that silent mutation should be avoided.
Beside the very first example, are there other cases? We can run the key validation recursively, but we need to check that it's fast enough.

I am pinging @ltalirz because we changed the validation code as it was very slow, so before we merge any change we should double check the performance impact.

@giovannipizzi
Copy link
Member

See also #733 for some reference on some decision choices

@sphuber
Copy link
Contributor Author

sphuber commented Apr 2, 2020

If I remember correctly the biggest slowdown was that the entire set of attributes was cleaned in its entirely every time a single attribute was changed. When building up StructureData nodes then, where you call append_atom often, which will call set_attr many times, we ended up validating everything multiple times. I think validating the entire nested structure once just before storing should be acceptable. Anyway, like you said, the alternative is silent mutation which is perhaps worse than a slightly worse performance

@ltalirz
Copy link
Member

ltalirz commented Apr 2, 2020

I agree that validation should be recursive upon store (I think there is still the flag to skip validation entirely if maximum performance is needed, right?).
And yes, the original issue was that validation happened whenever the node was modified before storing.

The original performance issue was noticed when looping over a few hundred structure files to import them into StructureData.
I suggest this as a performance test here as well.

@mbercx mbercx linked a pull request Apr 15, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants