JDFTx IO module #4078

Open: wants to merge 180 commits into master


@benrich37 commented Sep 24, 2024

Summary

Major changes:

  • Robust out file parsing for JDFTx
  • In file parsing, managing and writing for JDFTx
  • Testing (coverage previously at 97%, should still be >80%)

Changes to existing code:
core.periodic_table

  • For both the ElementBase and Species classes, the code in the "valence" property was copied into a new "valences" property, with two changes: the error raised for ambiguous valence (more than one valence subshell) was removed
    (if len(valence) > 1: raise ValueError(f"{self} has ambiguous valence"))
    and the return type was changed from tuple[int | np.nan, int] to list[tuple[int | np.nan, int]] (return valence instead of return valence[0])
  • For both the ElementBase and Species classes, the "valence" property was rewritten to reduce redundancy: it raises the ambiguous-valence ValueError if len(self.valences) > 1, and otherwise returns self.valences[0]
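A minimal sketch of the valence/valences split described above. ToyElement and its subshell data are hypothetical stand-ins for illustration, not pymatgen's actual ElementBase/Species code:

```python
from __future__ import annotations


class ToyElement:
    """Hypothetical stand-in for ElementBase/Species; not pymatgen's code."""

    def __init__(self, symbol: str, subshells: list[tuple[float, int]]) -> None:
        self.symbol = symbol
        # One (L, number of valence electrons) pair per valence subshell
        self._subshells = subshells

    @property
    def valences(self) -> list[tuple[float, int]]:
        """Every valence subshell; never raises on ambiguity."""
        return list(self._subshells)

    @property
    def valence(self) -> tuple[float, int]:
        """The single valence subshell; raises when there is more than one."""
        valences = self.valences
        if len(valences) > 1:
            raise ValueError(f"{self.symbol} has ambiguous valence")
        return valences[0]


# Hypothetical subshell data, for illustration only
single = ToyElement("A", [(1, 4)])
multi = ToyElement("B", [(2, 6), (0, 2)])
print(single.valence)   # (1, 4)
print(multi.valences)   # [(2, 6), (0, 2)]
# multi.valence would raise ValueError("B has ambiguous valence")
```

Routing valence through valences keeps one source of truth: the ambiguity check lives in a single place, and callers that can handle several subshells use valences directly.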

(I have an idea of how to remove the redundancy between Species and ElementBase for valence, but I will hold off on this until the next PR)

Todos

  • Write broader inputs and outputs classes to house JDFTXInfile and JDFTXOutfile, along with additional input/output information

… top of __getattr__ magic method so it is more clear to the user what variables are accessible from a jdftxoutfile object (and other downstream class objects)
…done 100% to work properly without circular imports, but the size of this repo causes refactor symbol movement to take literally minutes, so keeping the structure like so for now
…stuff + JDFTXInfile item setting safety nets, additional testing for infile, and more informative README
Problems on readme line breaks

Signed-off-by: benrich37 <[email protected]>
Problems on README

Updated README for JDFTX io branch

Added pyproject.toml to workflow dependencies

@benrich37 (Author)

@mkhorton - all modules are ready to review!

@@ -204,7 +205,8 @@ def __getattr__(self, item: str) -> Any:
if val is None or str(val).startswith("no data"):
warnings.warn(f"No data available for {item} for {self.symbol}")
val = None
elif isinstance(val, list | dict):
# elif isinstance(val, dict | list):
@DanielYang59 (Contributor), Nov 12, 2024

I hope you don't mind me commenting :)

I'm not seeing any error from pre-commit for this line. You might want to make sure you're running Python >= 3.10, as the | operator for union types (PEP 604) is only supported from Python 3.10 onward:

isinstance(var, list | dict)  # Python >= 3.10
isinstance(var, (list, dict))  # any Python version (the only option on <= 3.9)

pymatgen is moving to Python 3.10+ now:

requires-python = ">=3.10,<3.13"


Also do note that their behaviours differ (isinstance considers subclasses, while an exact type() check does not):

class Foo(dict):  # demo purpose only; real code should subclass collections.UserDict instead
    pass

user_dict = Foo(a=1)

print(isinstance(user_dict, dict))  # True
print(type(user_dict) in {dict})  # False
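The same comparison extended to the PEP 604 form, as a small sketch (Sub is a hypothetical demo class): the tuple form and the union form agree, including on subclasses, while the exact type check differs. The union check is guarded so the snippet also runs on Python 3.9, where list | dict itself raises TypeError:

```python
import sys


class Sub(dict):  # hypothetical dict subclass, for demonstration only
    pass


val = Sub(a=1)

# Tuple form: valid on every Python version; accepts subclasses
tuple_check = isinstance(val, (list, dict))

# PEP 604 union form: only evaluated on Python >= 3.10
union_check = isinstance(val, list | dict) if sys.version_info >= (3, 10) else None

# Exact type check: ignores subclasses
exact_check = type(val) in {list, dict}

print(tuple_check, union_check, exact_check)
# On Python >= 3.10 this prints: True True False
```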

@benrich37 (Author)

Hey Daniel! Thank you for your review!

My environment: I am running Python 3.12 in a virtual environment set up using uv, following the instructions on the contributing page (except that uv create venv is replaced with uv venv, as uv's syntax has been updated).

My issue: running pre-commit with the line as elif isinstance(val, list | dict) makes the mypy hook raise error: Unsupported left operand type for | ("type[dict[Any, Any]]") [operator]. However, this seems to be a problem on my end: I recently reformatted my work computer to Linux, and the problem no longer occurs there (previously I was developing on my personal Mac laptop with an environment set up the same way). On both computers, the pre-commit hooks were configured using the same pyproject.toml and .pre-commit-config.yaml.

Other issues: all the other changes to lines checking object types were due to similar or identical errors raised by mypy. I will check whether these can also be fixed by just running pre-commit on Linux.

@@ -523,7 +532,8 @@ def term_symbols(self) -> list[list[str]]:
# Total ML = sum(ml1, ml2), Total MS = sum(ms1, ms2)
TL = [sum(ml_ms[comb[e]][0] for e in range(v_e)) for comb in e_config_combs]
TS = [sum(ml_ms[comb[e]][1] for e in range(v_e)) for comb in e_config_combs]
comb_counter = Counter(zip(TL, TS, strict=True))
# comb_counter: Counter = Counter(zip(TL, TS, strict=True))
@DanielYang59 (Contributor), Nov 12, 2024

I'm not seeing any error here either. It looks like a Python version issue again, as the strict argument of zip was added in Python 3.10:

Changed in version 3.10: Added the strict argument.

I would personally prefer the zip implementation:

  • it's more readable
  • it's ~1.5x faster:
    Time taken by Counter(zip(TL, TS, strict=True)): 1.061628 seconds
    Time taken by Counter(manual indexing):          1.646624 seconds
    
benchmark script (by GPT)
import timeit
import random
from collections import Counter

# Generate test data
def generate_data(size):
    TL = [random.randint(1, 100) for _ in range(size)]
    TS = [random.randint(1, 100) for _ in range(size)]
    return TL, TS

# Define the functions to compare
def counter_zip(TL, TS):
    return Counter(zip(TL, TS, strict=True))

def counter_manual(TL, TS):
    return Counter([(TL[i], TS[i]) for i in range(len(TL))])

# Set the size of the test data
size = 1000000
TL, TS = generate_data(size)

# Measure performance of Counter(zip(TL, TS, strict=True)),
# evaluating the statement in this module's globals instead of "from __main__ import ..." setup strings
zip_time = timeit.timeit("counter_zip(TL, TS)", globals=globals(), number=10)
print(f"Time taken by Counter(zip(TL, TS, strict=True)): {zip_time:.6f} seconds")

# Measure performance of Counter([(TL[i], TS[i]) for i in range(len(TL))])
manual_time = timeit.timeit("counter_manual(TL, TS)", globals=globals(), number=10)
print(f"Time taken by Counter(manual indexing): {manual_time:.6f} seconds")

@benrich37 (Author)

Also fixed by running pre-commit on Linux instead of macOS, but for record-keeping, the mypy errors on my Mac for this line were error: Unsupported left operand type for | ("type[dict[Any, Any]]") [operator] and error: No overload variant of "zip" matches argument types "list[int]", "list[Union[float, Literal[0]]]", "bool" [call-overload]

@DanielYang59 (Contributor), Nov 13, 2024

Yep, that's again a Python version issue: strict is not available in Python <= 3.9, and therefore the type overload doesn't exist.

@@ -1639,9 +1656,18 @@ def get_el_sp(obj: int | SpeciesLike) -> Element | Species | DummySpecies:
of properties that can be determined.
"""
# If obj is already an Element or Species, return as is
if isinstance(obj, Element | Species | DummySpecies):
@DanielYang59 (Contributor), Nov 12, 2024

I don't really understand this change. Perhaps you're not running Python 3.10+?

@benrich37 (Author)

Also fixed by running on Linux, but for record-keeping, the mypy error on my Mac for this was Unsupported left operand type for | ("type[Element]") [operator]

@benrich37 (Author)

@DanielYang59 the odd edits I previously made to existing code in periodic_table.py have been reverted; now I just need to figure out what's wrong with my laptop's execution of pre-commit! Thank you again for taking the time to review!

@DanielYang59 (Contributor) commented Nov 13, 2024

You wrote: "Hey Daniel! Thank you for your review!"

No problem. I'm not a maintainer, but I hope my comments are helpful.

You wrote: "Running pre-commit with the line as elif isinstance(val, list | dict) raises the error within the mypy hook error: Unsupported left operand type for | ("type[dict[Any, Any]]") [operator]"

This strongly suggests that pre-commit is installed on a Python <= 3.9 interpreter (where | between types is not supported).

I fully understand that this Python environment business can be very confusing; here is how to resolve it.


First of all, why this happened (my best guess):

As I don't have access to your environment, my best guess is that you have multiple Python interpreters installed on your system (say, one Python from the system package manager, for example apt on Ubuntu), and you installed pre-commit on your base interpreter. Then you created a virtual environment but didn't install pre-commit in it. If you run pre-commit at this point, it comes from your base Python interpreter, something like:

$ which python3
/usr/bin/python3  # the system-installed base interpreter

# assume you haven't activated the venv so far
$ python3 -m pip install pre-commit  # pre-commit is installed to your base interpreter

$ source venv/bin/activate

$ which pre-commit
~/.local/bin/pre-commit  # pre-commit, but from your base interpreter

How to resolve this?

  1. Install pre-commit in your activated venv, as it takes priority over the system one (in general you might want to keep your base env clean, BTW):
# Continue previous demo
$ python3.12 -m pip install pre-commit  # install pre-commit for your venv

$ which pre-commit
~/test/venv/bin/pre-commit  # now from venv
  2. Run it as a module with an explicit interpreter, like python3 -m pre_commit run --all-files. Do note that the package name on PyPI uses kebab case, pre-commit (dash), while the module name uses snake case, pre_commit (underscore)
  3. Use an explicit path, something like venv/bin/pre-commit run --all-files, to make sure pre-commit comes from that particular venv
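If it's unclear which environment a given pre-commit comes from, a small Python sketch like the one below can help. Run it with the interpreter you expect pre-commit to use; find_pre_commit and is_in_current_env are hypothetical helper names, not part of pre-commit. It assumes the venv is activated (so sys.prefix points at it). Note that pip install --user installs (e.g. into ~/.local/bin) land outside sys.prefix, so they are reported as a different environment, which in the scenario above is exactly the warning sign:

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path


def find_pre_commit() -> Path | None:
    """Return the pre-commit executable that comes first on PATH, or None."""
    exe = shutil.which("pre-commit")
    return Path(exe) if exe else None


def is_in_current_env(exe: Path) -> bool:
    """True if `exe` sits under sys.prefix, i.e. the running interpreter's environment.

    Paths are made absolute but symlinks are NOT resolved, because venv
    interpreters are often symlinks back to the base installation.
    """
    return exe.absolute().is_relative_to(Path(sys.prefix).absolute())


if __name__ == "__main__":
    print("interpreter:", sys.executable, "| version:", sys.version.split()[0])
    print("PEP 604 unions (list | dict) available:", sys.version_info >= (3, 10))
    exe = find_pre_commit()
    if exe is None:
        print("pre-commit: not found on PATH")
    elif is_in_current_env(exe):
        print(f"pre-commit: {exe} (this environment)")
    else:
        print(f"pre-commit: {exe} (a DIFFERENT environment)")
```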

Hope this helps :)

.gitignore (outdated diff)
@@ -27,6 +27,10 @@ setuptools*
.*_cache
# VS Code
.vscode/*
codecov*
coverage.xml
jdftx*.xml
@DanielYang59 (Contributor), Nov 13, 2024

In general we might want to keep the project-level .gitignore clean (admittedly it needs some cleanup) and only add global/general patterns, as it has a project-wide impact. codecov/coverage should fit into this, but I don't think jdftx* or actions-runner do (I don't think many people try to trigger actions locally).

Perhaps you could consider the following options:

  • Use a local "global" .gitignore file for some very personal settings:
    1. Write settings into a separate file somewhere in your local machine, say ~/.gitignore_global
    2. Run git config --global core.excludesFile '~/.gitignore_global'
  • Add a .gitignore file to a child directory if you really need to ignore some files there for whatever reason (I cannot think of many use cases; for unit tests we would prefer to write/modify files in a temp dir to avoid changing the state of your current project). Do note that a .gitignore can exist anywhere in the project (not just the top level), and it applies to its own directory and, recursively, to its subdirectories
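To check which of these ignore files is responsible for a particular path, git check-ignore -v reports the matching source file, line number and pattern. Below is a sketch of hypothetical Python helpers that call it and parse its <source>:<linenum>:<pattern><TAB><pathname> output; it assumes git is on PATH and that the source path contains no colon (a Windows drive letter would break the split):

```python
from __future__ import annotations

import subprocess


def parse_check_ignore(line: str) -> tuple[str, int, str]:
    """Split one `git check-ignore -v` line into (source, line number, pattern).

    Format: <source>:<linenum>:<pattern><TAB><pathname>. Assumes the source
    path contains no ':' (e.g. no Windows drive letter).
    """
    info, _, _pathname = line.rstrip("\n").partition("\t")
    source, linenum, pattern = info.split(":", 2)
    return source, int(linenum), pattern


def which_ignore_rule(path: str) -> tuple[str, int, str] | None:
    """Ask git which rule matches `path`; None if nothing is ignored or git fails."""
    result = subprocess.run(
        ["git", "check-ignore", "-v", path],
        capture_output=True,
        text=True,
        check=False,
    )
    # Exit status: 0 = ignored, 1 = not ignored, 128 = fatal error (e.g. not a repo)
    if result.returncode != 0 or not result.stdout:
        return None
    # A pattern starting with "!" is a negation: the path is re-included, not ignored
    return parse_check_ignore(result.stdout.splitlines()[0])
```

If the reported source is the global file configured via core.excludesFile, the pattern is a personal setting rather than a project-level one.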

@benrich37 (Author)

This is very helpful; I will soon move those ignore lines to my global configuration. Also, you were dead-on about the pre-commit problem! My laptop's default python3 is version 3.9, and I didn't realize the pre-commit I was using was installed there instead of in the uv virtual environment. Installing pip into the uv virtual environment and then running python -m pip install pre-commit fixed everything! Thank you again for reviewing and helping me out with these issues; this has been very helpful for me.

benrich37 and others added 8 commits November 13, 2024 13:57
Moving project .gitignore exceptions added for files generated by cod…
…cture metadata to be passed to subsequent JOutStructure objects when initializing a JOutStructures object
Added "initial_structure" keyword arguments to allow for default Stru…

5 participants