
Add --also-remote option to verdi process dump #6578

Open · wants to merge 1 commit into main
Conversation

@GeigerJ2 (Contributor) commented Oct 3, 2024

Tries to open a connection to the remote Computer using the transport of the CalcJobNode, and `get`s the files and folders that were not originally dumped, because they were neither part of the CalcJobNode's repository nor among its retrieved outputs.
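The approach described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual implementation: the helper name `dump_remote_files`, the `exclude` parameter, and the assumption that the node exposes `get_transport()` directly are all assumptions (in practice one may need to go through the computer's `AuthInfo`).

```python
from pathlib import Path


def dump_remote_files(calcjob_node, output_path, exclude=()):
    """Fetch files/folders from the CalcJob's remote working directory that
    were neither stored in the node repository nor retrieved (sketch)."""
    remote_workdir = calcjob_node.get_remote_workdir()
    transport = calcjob_node.get_transport()
    fetched = []
    with transport:
        for name in transport.listdir(remote_workdir):
            if name in exclude:
                continue
            # Transport.get() handles both files and folders, so no
            # distinction between the two is needed here.
            transport.get(
                str(Path(remote_workdir) / name),
                str(Path(output_path) / name),
                ignore_nonexisting=True,
            )
            fetched.append(name)
    return fetched
```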

Will add the tests once the implementation is somewhat stable.

Don't want to assign you officially for review, but still pinging @khsrali and @agoscinski here if you guys want to have a look 👀

Note:
With PR #6584, it will be possible to warn the user about the total size of the files that would be written to disk. We might eventually decide that obtaining the total file size just to warn the user takes too long and requires opening too many transports, in which case one could issue a generic warning instead. The problem is that these files and directories live on the remote computer (if they haven't been deleted already), so one has to connect to get any information about them.

Also, the reported file size will differ, as some of the files in the directory the RemoteData points to will have already been retrieved. This can be checked based on e.g. `retrieve_list`, `local_copy_list`, etc.
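A minimal sketch of the size pre-check discussed above, skipping entries that were already retrieved. The function name and the assumption that the transport exposes `listdir()`, `isfile()`, and a `get_attribute()` returning an object with `st_size` are assumptions for illustration, not the actual interface used by the PR.

```python
def estimated_remote_dump_size(transport, remote_workdir, already_retrieved):
    """Sum the sizes of remote files that would actually be fetched,
    skipping entries already present locally (hypothetical helper)."""
    total = 0
    for name in transport.listdir(remote_workdir):
        if name in already_retrieved:
            continue
        path = f'{remote_workdir}/{name}'
        # Only plain files contribute; directories would need recursion.
        if transport.isfile(path):
            total += transport.get_attribute(path).st_size
    return total
```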

@GeigerJ2 force-pushed the feature/process-dump-remote-files branch from 469ecfa to 5693a63 on October 3, 2024, 11:04

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 27.58621% with 21 lines in your changes missing coverage. Please review.

Project coverage is 77.82%. Comparing base (ef60b66) to head (5693a63).
Report is 115 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/tools/dumping/processes.py 24.00% 19 Missing ⚠️
src/aiida/cmdline/commands/cmd_process.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6578      +/-   ##
==========================================
+ Coverage   77.51%   77.82%   +0.31%     
==========================================
  Files         560      566       +6     
  Lines       41444    42077     +633     
==========================================
+ Hits        32120    32741     +621     
- Misses       9324     9336      +12     



# Transport.get() works for files and folders, so we don't need to make a distinction
for rem, loc in zip(remote_paths, local_paths):
transport.get(str(rem), str(loc), ignore_nonexisting=True)
@khsrali (Contributor) commented Oct 3, 2024

I think transport.get() has overwrite=True by default.
Maybe in this case it makes sense to issue a warning (that local files are being overwritten).

One other point, also mentioned by others:

  • maybe one can pass glob patterns (e.g. all *.log files)
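The overwrite concern above could be addressed with a small wrapper that warns before clobbering an existing local path. This is a hypothetical helper, not part of the PR; it only assumes the transport object provides the `get()` call shown in the diff.

```python
import warnings
from pathlib import Path


def get_with_overwrite_warning(transport, remote_path, local_path):
    """Warn before Transport.get() silently overwrites an existing local
    path, since get() overwrites by default (hypothetical wrapper)."""
    if Path(local_path).exists():
        warnings.warn(f'Overwriting existing local path: {local_path}')
    transport.get(str(remote_path), str(local_path), ignore_nonexisting=True)
```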

@GeigerJ2 (Contributor, Author) commented Oct 3, 2024

Yeah, good point! In principle, dumping has an overwrite argument as well. Though, I'm getting a bit tired of propagating this setting down the hierarchy to the different sub-WorkChains and CalcJobs. So I've been thinking of either making it global, applying just to the parent directory of the dump, or making it more fine-grained. The remote files are a case where the latter could make sense.

Currently working on the possibility to provide explicit list of files or glob patterns to include/exclude! 🚀
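The include/exclude glob filtering mentioned above could look something like the following. This is a sketch under assumptions: the function name and signature are invented for illustration, and `fnmatch` is used for shell-style patterns such as `*.log`.

```python
import fnmatch


def filter_entries(entries, include=None, exclude=None):
    """Apply user-provided include/exclude glob patterns to a list of
    remote entries (hypothetical helper, not the PR's implementation)."""
    selected = list(entries)
    if include:
        # Keep only entries matching at least one include pattern.
        selected = [e for e in selected
                    if any(fnmatch.fnmatch(e, p) for p in include)]
    if exclude:
        # Then drop entries matching any exclude pattern.
        selected = [e for e in selected
                    if not any(fnmatch.fnmatch(e, p) for p in exclude)]
    return selected
```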

@khsrali (Contributor) commented Oct 3, 2024

Thanks @GeigerJ2, I like this feature.
We already had some users from Empa who probably would have liked this.

is_flag=True,
default=False,
show_default=True,
help='If true, try to obtain also intermediate files on the remote computer that were not initially retrieved.',
Contributor:

Perhaps give a bit more detail here, e.g., mention that this requires opening a connection to the remote computer, and that the files possibly cannot be retrieved if the remote folder has been altered or deleted entirely in the meantime.
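The expanded help text suggested above could be wired into the option roughly like this. A hedged sketch: the actual decorator stack in `cmd_process.py` differs (it sits on `verdi process dump` with other options), and the command name here is invented for illustration.

```python
import click


@click.command()
@click.option(
    '--also-remote',
    is_flag=True,
    default=False,
    show_default=True,
    help=(
        'Also try to obtain intermediate files from the remote computer that '
        'were not initially retrieved. This opens a connection to the remote '
        'computer and may fail to retrieve files if the remote folder was '
        'altered or deleted in the meantime.'
    ),
)
def dump(also_remote):
    # Stand-in body just to show how the flag arrives in the callback.
    click.echo(f'also_remote={also_remote}')
```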

)

if also_remote:
echo.echo_report(
'`--also-remote` set to True. Will try to retrieve additional files from the `workdir` of the remote Computer.' # noqa: E501
Contributor:

Not really the workdir of the Computer (which is the base directory of the scratch), but rather of the calcjob itself.

'`--also-remote` set to True. Will try to retrieve additional files from the `workdir` of the remote Computer.' # noqa: E501
)
echo.echo_report(
'If files are non-existent, they will be skipped silently. Check if the output files are what you expect.'
Contributor:

Is this a good idea? It will be difficult to tell whether files weren't copied because they no longer exist or because there was a problem.
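One way to avoid the silent skipping raised above is to check existence explicitly and report what was missing. This is a sketch under assumptions: the helper name is invented, and it assumes the transport exposes a `path_exists()` check alongside `get()`.

```python
def get_and_report_missing(transport, pairs):
    """Fetch (remote, local) pairs and return the remote paths that did
    not exist, so the user can be told what was skipped instead of the
    files disappearing silently (hypothetical helper)."""
    missing = []
    for remote, local in pairs:
        if not transport.path_exists(str(remote)):
            missing.append(str(remote))
            continue
        transport.get(str(remote), str(local), ignore_nonexisting=True)
    return missing
```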


remote_workdir = calcjob_node.get_remote_workdir()
if remote_workdir is None:
raise NotExistentAttributeError(f"CalcJobNode <{calcjob_node.pk}> doesn't have a `remote_workdir`.")
Contributor:

This is not caught; do you really want to let this make the entire dump fail? Perhaps just log a warning and move on.
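The warn-and-continue alternative suggested above might look like this. A sketch only: the helper name is invented, and it assumes the node has a `pk` attribute and the `get_remote_workdir()` method shown in the diff.

```python
import logging

logger = logging.getLogger(__name__)


def remote_workdir_or_none(calcjob_node):
    """Alternative to raising NotExistentAttributeError: log a warning and
    return None so the caller can skip this node, rather than letting one
    missing `remote_workdir` abort the whole dump (hypothetical helper)."""
    remote_workdir = calcjob_node.get_remote_workdir()
    if remote_workdir is None:
        logger.warning(
            'CalcJobNode <%s> has no `remote_workdir`, skipping remote file dump.',
            calcjob_node.pk,
        )
    return remote_workdir
```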

Comment on lines +309 to +311
retrieve_list = list(calcjob_node.get_retrieve_list()) # type: ignore[arg-type]
repository_list = calcjob_node.base.repository.list_object_names()
exclude_list = retrieve_list + repository_list
Contributor:

A few things here:

  1. Shouldn't everything that is in the retrieve list also be in the repository list? Maybe there is more in the retrieve list, but that would then correspond to a file that doesn't exist, so it will presumably still not be there on the remote, so it can be ignored.
  2. A bigger problem, though, I think, is that the retrieve_list can contain wildcards, and I don't think you currently handle them in the code below, right? If you just rely on the contents of the repository, this problem disappears.
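Handling the wildcard case from point 2 could be done with `fnmatch` when building the exclusion set. This is a sketch, not the PR's code: the function name is invented, and it treats every `retrieve_list` entry as a shell-style pattern (plain names match themselves under `fnmatch`).

```python
import fnmatch


def remote_only_entries(remote_entries, repository_list, retrieve_list):
    """Return remote entries that still need fetching: not already in the
    node repository, and not matched by any retrieve_list pattern, which
    may contain wildcards such as '*.log' (hypothetical helper)."""
    remaining = []
    for entry in remote_entries:
        if entry in repository_list:
            continue
        if any(fnmatch.fnmatch(entry, pattern) for pattern in retrieve_list):
            continue
        remaining.append(entry)
    return remaining
```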


# Getting the transport fails, propagate exception outwards
# (could just remove the except, but being explicit might make it clearer here)
except ConfigurationError:
Contributor:

wait, a ConfigurationError? Where is that being thrown? Looks like a different problem.
