Understanding a triggered assertion #576

Open
etiennemlb opened this issue Jul 15, 2024 · 1 comment

Comments

etiennemlb commented Jul 15, 2024

What could possibly lead to errno being set to 7 at the following line:

MFU_ABORT(-1, "Failed to write file %s errno=%d (%s)",

This is triggered when making a TiB-sized copy of many small files on a recent Lustre FS.
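For readers decoding the number: on Linux, errno 7 is `E2BIG` ("Argument list too long"), which is not an error one would usually expect from a write. The snippet below is a minimal sketch of how the value can be decoded; `errno_name` and `describe_errno` are hypothetical helpers written for illustration and are not part of mpifileutils.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Return the symbolic name for a few errno values relevant to file I/O.
 * Hypothetical helper for illustration; not part of mpifileutils. */
static const char *errno_name(int err)
{
    switch (err) {
    case E2BIG:  return "E2BIG";
    case EIO:    return "EIO";
    case ENOSPC: return "ENOSPC";
    case EFBIG:  return "EFBIG";
    default:     return "(other)";
    }
}

/* Print an errno value in a form similar to the abort message above,
 * with the symbolic name appended in brackets. */
static void describe_errno(int err)
{
    printf("errno=%d (%s) [%s]\n", err, strerror(err), errno_name(err));
}
```

Calling `describe_errno(7)` on a Linux system with glibc should report "Argument list too long" and `E2BIG`; the numeric value of `E2BIG` is platform-defined, so this mapping is an assumption on other systems.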

@ofaaland
Collaborator

@etiennemlb we have observed a similar issue in production but have not been able to reproduce it in testing so far. When we saw it, the conditions were:

  • lustre 2.12.9_3llnl clients
  • lustre-2.14.0_17.llnl servers, backed by zfs-2.1
  • on file creation (not write)
  • using a PFL default layout, with a DoM component
  • while copying from an older Lustre FS to a newer one via dsync

On those same systems, the sysadmins removed the DoM component from the default PFL layout, making no other changes to either clients or servers, and file creates started working again.

We observed the same failure (before removing DoM) and success (after removing DoM) with the "touch" utility, so it was not an mpifileutils issue.
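To separate a create failure from a write failure outside of mpifileutils, a small probe along these lines might help. It is only a sketch: `try_create` is a hypothetical helper, and it approximates what `touch` does for a missing file rather than reproducing it exactly.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create (or open) `path` without writing any data, roughly what
 * `touch` does for a missing file. Returns 0 on success, or the errno
 * value reported by open(2). Hypothetical helper for illustration. */
static int try_create(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}
```

If `try_create` returns 7 for a path in a directory that uses the DoM layout, the failure happens at create time, before any data is written, which would match what we saw.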

We haven't created an upstream Whamcloud issue against Lustre because we can't reproduce or understand what's going on, but we believe it was a Lustre issue in the end.

I don't know if that's related to your issue or not, but thought it was worth mentioning.
