Installation fails when following lustre_server_rocky8.10 #1

Open
xia-MM opened this issue Nov 21, 2024 · 32 comments

@xia-MM

xia-MM commented Nov 21, 2024

Hi,
I am interested in Lustre, and this is my first time installing it. I have tried many times following your document, but I keep running into the following error. Could you help me fix it?

if test -n "" ; then
cp linux-stage/include/linux;
fi
if test -n "" ; then
cp linux-stage/include/uapi/linux;
fi
if test -n "/root/kernel/rpmbuild/BUILD/kernel-4.18.0-553.27.1.el8_10/linux-4.18.0-553.27.1.el8_lustre.x86_64/include/trace/events/ext4.h" ; then
cp /root/kernel/rpmbuild/BUILD/kernel-4.18.0-553.27.1.el8_10/linux-4.18.0-553.27.1.el8_lustre.x86_64/include/trace/events/ext4.h linux-stage/include/trace/events;
fi
Applying ext4 patches: rhel8.1/ext4-inode-version.patch suse15/ext4-lookup-dotdot.patch suse15/ext4-print-inum-in-htree-warning.patch rhel8/ext4-prealloc.patch ubuntu18/ext4-osd-iop-common.patch rhel8.7/ext4-misc.patch rhel8.7/ext4-mballoc-extra-checks.patch rhel8.7/ext4-hash-indexed-dir-dotdot-update.patch rhel8.1/ext4-kill-dx-root.patch rhel8.7/ext4-mballoc-pa-free-mismatch.patch rhel8.4/ext4-data-in-dirent.patch rhel8/ext4-nocmtime.patch base/ext4-htree-lock.patch rhel8.7/ext4-pdirop.patch rhel8/ext4-deep-tree.patch rhel8/ext4-max-dir-size.patch rhel8.7/ext4-corrupted-inode-block-bitmaps-handling-patches.patch ubuntu18/ext4-give-warning-with-dir-htree-growing.patch ubuntu18/ext4-jcb-optimization.patch rhel8.2/ext4-attach-jinode-in-writepages.patch rhel8/ext4-dont-check-before-replay.patch rhel7.6/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch rhel7.6/ext4-export-orphan-add.patch rhel8/ext4-export-mb-stream-allocator-variables.patch rhel8/ext4-simple-blockalloc.patch rhel8.7/ext4-mballoc-skip-uninit-groups-cr0.patch rhel8.7/ext4-mballoc-prefetch.patch rhel8.3/ext4-xattr-disable-credits-check.patch base/ext4-no-max-dir-size-limit-for-iam-objects.patch rhel7.6/ext4-dquot-commit-speedup.patch rhel8.7/ext4-introduce-EXT4_BG_TRIMMED-to-optimize-fstrim.patch rhel8/ext4-ialloc-uid-gid-and-pass-owner-down.patch base/ext4-projid-xattrs.patch rhel8.5/ext4-enc-flag.patch rhel8/ext4-ext-merge.patch base/ext4-delayed-iput.patch2 out of 4 hunks FAILED -- saving rejects to file fs/ext4/xattr.c.rej
make[2]: *** [autoMakefile:679: sources] Error 1
make[2]: Leaving directory '/root/lustre-release/ldiskfs'
make[1]: *** [autoMakefile:731: all-recursive] Error 1
make[1]: Leaving directory '/root/lustre-release'
make: *** [autoMakefile:585: all] Error 2

@DetlevCM
Owner

I'm not too sure what can be done from afar to offer help (I might need to test if Rocky changed at some point...), but there are some pointers.

If you check your output, you will see the following:
... rhel8.7/ext4-introduce-EXT4_BG_TRIMMED-to-optimize-fstrim.patch rhel8/ext4-ialloc-uid-gid-and-pass-owner-down.patch base/ext4-projid-xattrs.patch rhel8.5/ext4-enc-flag.patch rhel8/ext4-ext-merge.patch base/ext4-delayed-iput.patch2 out of 4 hunks FAILED -- saving rejects to file fs/ext4/xattr.c.rej

Basically, application of the patches failed.

There are at least two scenarios I can imagine:

  1. When I first set up a Lustre VM as a test, I found that the ext4 code was not compatible with the patch files and I had to tweak the patch file. Then the kernel was updated and everything worked... It may be necessary to tweak it again.
    Please check the error in detail, specifically the file that triggers it.
  2. Did you try applying the patches more than once? If yes, this will create a mess. In this case, I would recommend unpacking the kernel again and copying over files as appropriate, followed by patching.
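
One way to tell the two scenarios apart is to dry-run the failing patch against a freshly unpacked tree before changing anything. The sketch below assumes GNU patch is installed; `check_patch` is a hypothetical helper name, not part of the Lustre build, and the paths in the usage note are placeholders.

```shell
# check_patch is a hypothetical helper, not part of the Lustre build.
# It reports whether a patch would apply to a source tree, without modifying it.
# Usage: check_patch <source-tree> <patch-file>
check_patch() {
    # --dry-run: report only, change nothing on disk.
    # -N (--forward): treat an already-applied patch as a failure instead of
    #                 offering to reverse it.
    # -p1: strip the leading path component of the file names in the patch.
    if patch --dry-run -N -p1 -d "$1" < "$2" > /dev/null 2>&1; then
        echo "applies cleanly"
    else
        echo "does not apply"
    fi
}
```

For example, `check_patch /path/to/linux-stage /path/to/ext4-delayed-iput.patch` (both paths are placeholders). If a patch fails on a pristine tree, the patch itself needs adjusting (scenario 1); if it only fails on the tree you have been working in, a repeated application is the more likely cause (scenario 2).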

Could you write a step-by-step log of what you did to get to this point?

@xia-MM
Author

xia-MM commented Nov 22, 2024 via email

@xia-MM
Author

xia-MM commented Nov 24, 2024

Hi,

I solved it. I edited the ext4-delayed-iput.patch file and deleted the following lines:

Index: b2_15_linux-4.18.0-425.3.1.el8/fs/ext4/xattr.c
--- b2_15_linux-4.18.0-425.3.1.el8.orig/fs/ext4/xattr.c
+++ b2_15_linux-4.18.0-425.3.1.el8/fs/ext4/xattr.c
@@ -1579,6 +1579,36 @@ static int ext4_xattr_inode_lookup_creat
 	return 0;
 }
 
+struct delayed_iput_work {
+	struct work_struct work;
+	struct inode *inode;
+};
+
+static void delayed_iput_fn(struct work_struct *work)
+{
+	struct delayed_iput_work *diwork;
+
+	diwork = container_of(work, struct delayed_iput_work, work);
+	iput(diwork->inode);
+	kfree(diwork);
+}
+
+static void delayed_iput(struct inode *inode, struct delayed_iput_work *work)
+{
+	if (!inode) {
+		kfree(work);
+		return;
+	}
+
+	if (!work) {
+		iput(inode);
+	} else {
+		INIT_WORK(&work->work, delayed_iput_fn);
+		work->inode = inode;
+		queue_work(EXT4_SB(inode->i_sb)->s_misc_wq, &work->work);
+	}
+}
+
 /*
  * Reserve min(block_size/8, 1024) bytes for xattr entries/names if ea_inode
  * feature is enabled.
@@ -1596,6 +1626,7 @@ static int ext4_xattr_set_entry(struct e
 	int in_inode = i->in_inode;
 	struct inode *old_ea_inode = NULL;
 	struct inode *new_ea_inode = NULL;
+	struct delayed_iput_work *diwork = NULL;
 	size_t old_size, new_size;
 	int ret;
 
@@ -1672,7 +1703,11 @@ static int ext4_xattr_set_entry(struct e
 	 * Finish that work before doing any modifications to the xattr data.
 	 */
 	if (!s->not_found && here->e_value_inum) {
-		ret = ext4_xattr_inode_iget(inode,
+		diwork = kmalloc(sizeof(*diwork), GFP_NOFS);
+		if (!diwork)
+			ret = -ENOMEM;
+		else
+			ret = ext4_xattr_inode_iget(inode,
 					    le32_to_cpu(here->e_value_inum),
 					    le32_to_cpu(here->e_hash),
 					    &old_ea_inode);
@@ -1825,7 +1860,7 @@ update_hash:
 
 	ret = 0;
 out:
-	iput(old_ea_inode);
+	delayed_iput(old_ea_inode, diwork);
 	iput(new_ea_inode);
 	return ret;
 }

Then I ran the following commands:
make
make install
depmod -a
/usr/lib64/lustre/tests/llmount.sh
[root@Lustre lustre-release]# /usr/lib64/lustre/tests/llmount.sh
Stopping clients: Lustre /mnt/lustre (opts:-f)
Stopping clients: Lustre /mnt/lustre2 (opts:-f)
Lustre: executing set_hostid
Loading modules from /usr/lib64/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients Lustre environments
Loading modules from /usr/lib64/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
gss/krb5 is not supported
Setup mgs, mdt, osts
Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
Commit the device label on /tmp/lustre-mdt1
Started lustre-MDT0000
Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1
Commit the device label on /tmp/lustre-ost1
Started lustre-OST0000
Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2
Commit the device label on /tmp/lustre-ost2
Started lustre-OST0001
Starting client: Lustre: -o user_xattr,flock Lustre@tcp:/lustre /mnt/lustre
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff941493575800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff941493575800.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 3s: want 'procname_uid' got 'procname_uid'
disable quota as required

@DetlevCM
Owner

It looks like the formatting broke a bit...

Good to hear that you found the solution. Honestly, it is a bit concerning that the build procedure does not seem to be entirely stable and has issues from time to time...

@xia-MM
Author

xia-MM commented Nov 26, 2024

Hi,
sorry to disturb you again. I am new to Lustre and interested in it, so I would like to ask you a question:

When configuring Lustre, can I choose to skip applying the ext4 patches?
I would appreciate your reply.

@DetlevCM
Owner

I'm just a user of the code in the end 😅

As to the patch: I do not think it can be ignored, because Lustre uses a modified ext4 file system. (Not sure if it can also use ZFS for the OSTs.)
So the patched e2fsprogs and the patched ext4 code are most definitely necessary.

May I suggest the lustre mailing list? - It would be a better place for questions that really address the inner workings of lustre: https://www.lustre.org/mailing-lists/

@xia-MM
Author

xia-MM commented Nov 26, 2024

thx

@xia-MM
Author

xia-MM commented Dec 17, 2024

I have another problem:
during installation the following error appears. Do you have any ideas?
[image: 2549b8780396af8b4567c76e27b545d]

@DetlevCM
Owner

@xia-MM Just like this, no, sorry...

Are you building this on Red Hat or Rocky? I recently ran through the Rocky install for a test system and 8.10 worked without issues (except for a slightly different build number in the kernel).
(And 9.5 doesn't work, which is another topic...)

When googling the error message, one finds that the same issue occurred in the past with Lustre 2.9.x... I'm not sure what the preferred solution is.
Again, I'd point to the mailing list.

@xia-MM
Author

xia-MM commented Dec 20, 2024

Hi, I am building Lustre on Rocky 8.10. Which kernel version did you install? This is my kernel version: kernel-4.18.0-553.32.1.el8_10

@DetlevCM
Owner

Checking the test system (VM actually), this is the kernel used:
Linux localhost.localdomain 4.18.0-553.32.1.el8_lustre.x86_64 #1 SMP Tue Dec 17 13:09:14 CET 2024 x86_64 x86_64 x86_64 GNU/Linux

And as mentioned before, I only had to tweak the odd build number here or there.
-> This is obviously the patched kernel after building and installing it (hence the _lustre in the name).
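
To confirm that a machine has actually booted into the patched kernel, the release string can be checked for the `_lustre` tag. This is only a minimal sketch: `is_lustre_kernel` is a hypothetical helper, and it assumes the patched kernel was built with `_lustre` in its name as in the output above.

```shell
# is_lustre_kernel is a hypothetical helper: it succeeds if the given kernel
# release string contains the "_lustre" tag, and fails otherwise.
is_lustre_kernel() {
    case "$1" in
        *_lustre*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Check the kernel that is currently running.
if is_lustre_kernel "$(uname -r)"; then
    echo "running a kernel with _lustre in its release string"
else
    echo "not running a kernel with _lustre in its release string"
fi
```

If the second message appears after installing the patched kernel rpms, the machine has most likely booted the stock kernel and the boot entry needs to be selected.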

@xia-MM
Author

xia-MM commented Dec 20, 2024

Hi, may I ask you something privately? Could you give me your e-mail address? I would like to ask you about problems I run into while learning Lustre, and I would like to learn Lustre from you.

@xia-MM
Author

xia-MM commented Jan 2, 2025

Hi, have you ever installed Lustre + ZFS on Ubuntu? I am trying to compile ZFS + Lustre, and when I run the command " ./configure --with-linux=/root/linux-source-5.4.0/ --enable-server --enable-modules --with-zfs=/root/zfs/zfs-2.1.2/ --enable-ldiskfs "
it always displays the following output:
[image: 1a16633e3491ccf003c120faf94315d]

@DetlevCM
Owner

DetlevCM commented Jan 2, 2025

Short answer, no, I never tried.
If you are now building on Ubuntu (which I also haven't tried), I'd recommend treating ldiskfs and zfs separately.
Unless I am mistaken, Lustre on ZFS can work without ldiskfs. So rather than building both at the same time, build each individually, see where the error comes from, and continue from there.
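
As a rough sketch of what "build each individually" could look like, the snippet below only prints two configure command lines, one per backend, and does not run configure. The paths are the ones from the command quoted in the question. `--disable-ldiskfs` and `--without-zfs` are assumed here to be the usual autoconf negations of `--enable-ldiskfs` and `--with-zfs`; I have not verified them against this Lustre version, and `lustre_configure_args` is a hypothetical helper.

```shell
# lustre_configure_args is a hypothetical helper. It only prints configure
# arguments for one backend; it does not run configure.
# Usage: lustre_configure_args ldiskfs
#        lustre_configure_args zfs <path-to-zfs-source>
lustre_configure_args() {
    case "$1" in
        # Assumption: --without-zfs / --disable-ldiskfs switch the other backend off.
        ldiskfs) echo "--enable-server --enable-modules --enable-ldiskfs --without-zfs" ;;
        zfs)     echo "--enable-server --enable-modules --disable-ldiskfs --with-zfs=$2" ;;
        *)       echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Print one command line per backend, to be run in separate, clean build trees.
echo "./configure --with-linux=/root/linux-source-5.4.0/ $(lustre_configure_args ldiskfs)"
echo "./configure --with-linux=/root/linux-source-5.4.0/ $(lustre_configure_args zfs /root/zfs/zfs-2.1.2/)"
```

Whichever of the two builds fails then shows which backend the error belongs to.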

@xia-MM
Author

xia-MM commented Jan 5, 2025

Hi, have you installed Pacemaker for MGS/MDS? How do I install the agent ocf:llnl:lustre?

@DetlevCM
Owner

DetlevCM commented Jan 6, 2025

Sorry, no.

I can only point you at the official lustre documentation: https://wiki.whamcloud.com/display/PUB/Using+Pacemaker+1.1+with+a+Lustre+File+System

@xia-MM
Author

xia-MM commented Jan 7, 2025

Hi, which high-availability software do you use when building a Lustre environment for MGS/MDS/OSS?
Another question: how do I install a patchset for Lustre, for example the patchset below?

image

@xia-MM
Author

xia-MM commented Jan 9, 2025

Hi, have you ever encountered the following error while building zfs-osd? Lustre version: 2.15.5. ZFS version: 2.11.

LD [M] /root/lustre-release/lustre/osc/osc.o
CC [M] /root/lustre-release/lustre/osd-zfs/osd_handler.o
CC [M] /root/lustre-release/lustre/obdclass/lprocfs_status_server.o
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h: In function ‘osd_find_dnsize’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:787:9: error: ‘DN_MAX_BONUSLEN’ undeclared (first use in this function); did you mean ‘DN_MAX_BONUS_LEN’?
787 | return DN_MAX_BONUSLEN;
| ^~~~~~~~~~~~~~~
| DN_MAX_BONUS_LEN
/root/lustre-release/lustre/osd-zfs/osd_internal.h:787:9: note: each undeclared identifier is reported only once for each function it appears in
/root/lustre-release/lustre/osd-zfs/osd_internal.h: At top level:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:848:20: error: static declaration of ‘dsl_pool_config_enter’ follows non-static declaration
848 | static inline void dsl_pool_config_enter(dsl_pool_t *dp, void *name)
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/dsl_deleg.h:30,
from /root/zfs/include/sys/zfs_ioctl.h:34,
from /root/zfs/include/os/linux/zfs/sys/zfs_vfsops_os.h:38,
from /root/zfs/include/sys/zfs_vfsops.h:30,
from /root/zfs/include/sys/zfs_fuid.h:32,
from /root/zfs/include/sys/zfs_acl.h:35,
from /root/zfs/include/sys/zfs_znode.h:30,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:56,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dsl_pool.h:181:6: note: previous declaration of ‘dsl_pool_config_enter’ was here
181 | void dsl_pool_config_enter(dsl_pool_t *dp, void *tag);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:852:20: error: static declaration of ‘dsl_pool_config_exit’ follows non-static declaration
852 | static inline void dsl_pool_config_exit(dsl_pool_t *dp, void *name)
| ^~~~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/dsl_deleg.h:30,
from /root/zfs/include/sys/zfs_ioctl.h:34,
from /root/zfs/include/os/linux/zfs/sys/zfs_vfsops_os.h:38,
from /root/zfs/include/sys/zfs_vfsops.h:30,
from /root/zfs/include/sys/zfs_fuid.h:32,
from /root/zfs/include/sys/zfs_acl.h:35,
from /root/zfs/include/sys/zfs_znode.h:30,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:56,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dsl_pool.h:183:6: note: previous declaration of ‘dsl_pool_config_exit’ was here
183 | void dsl_pool_config_exit(dsl_pool_t *dp, void *tag);
| ^~~~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:863: error: "SPA_OLD_MAXBLOCKSIZE" redefined [-Werror]
863 | #define SPA_OLD_MAXBLOCKSIZE SPA_MAXBLOCKSIZE
|
In file included from /root/zfs/include/sys/spa.h:44,
from /root/zfs/include/sys/zio.h:39,
from /root/zfs/include/sys/arc.h:38,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:51,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/fs/zfs.h:1642: note: this is the location of the previous definition
1642 | #define SPA_OLD_MAXBLOCKSIZE (1ULL << SPA_OLD_MAXBLOCKSHIFT)
|
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h: In function ‘osd_dmu_object_alloc’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:937:5: error: ‘DN_MAX_BONUSLEN’ undeclared (first use in this function); did you mean ‘DN_MAX_BONUS_LEN’?
937 | DN_MAX_BONUSLEN, tx);
| ^~~~~~~~~~~~~~~
| DN_MAX_BONUS_LEN
/root/lustre-release/lustre/osd-zfs/osd_internal.h: In function ‘osd_zap_create_flags’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:947:5: error: ‘DN_MAX_BONUSLEN’ undeclared (first use in this function); did you mean ‘DN_MAX_BONUS_LEN’?
947 | DN_MAX_BONUSLEN, tx);
| ^~~~~~~~~~~~~~~
| DN_MAX_BONUS_LEN
/root/lustre-release/lustre/osd-zfs/osd_internal.h: In function ‘osd_obj_bonuslen’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:953:9: error: ‘DN_MAX_BONUSLEN’ undeclared (first use in this function); did you mean ‘DN_MAX_BONUS_LEN’?
953 | return DN_MAX_BONUSLEN;
| ^~~~~~~~~~~~~~~
| DN_MAX_BONUS_LEN
/root/lustre-release/lustre/osd-zfs/osd_handler.c: In function ‘osd_objset_open’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1125:42: error: incompatible type for argument 4 of ‘dmu_objset_own’
1125 | dmu_objset_own((name), (type), (ronly), (tag), (os))
| ^~~~~
| |
| struct osd_device *
/root/lustre-release/lustre/osd-zfs/osd_handler.c:937:8: note: in expansion of macro ‘osd_dmu_objset_own’
937 | rc = -osd_dmu_objset_own(o->od_mntdev, DMU_OST_ZFS,
| ^~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/spa.h:46,
from /root/zfs/include/sys/zio.h:39,
from /root/zfs/include/sys/arc.h:38,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:51,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu.h:331:35: note: expected ‘boolean_t’ {aka ‘enum ’} but argument is of type ‘struct osd_device *’
331 | boolean_t readonly, boolean_t key_required, void *tag, objset_t **osp);
| ~~~~~~~~~~^~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1125:2: error: too few arguments to function ‘dmu_objset_own’
1125 | dmu_objset_own((name), (type), (ronly), (tag), (os))
| ^~~~~~~~~~~~~~
/root/lustre-release/lustre/osd-zfs/osd_handler.c:937:8: note: in expansion of macro ‘osd_dmu_objset_own’
937 | rc = -osd_dmu_objset_own(o->od_mntdev, DMU_OST_ZFS,
| ^~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_internal.h:59,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu_objset.h:214:5: note: declared here
214 | int dmu_objset_own(const char *name, dmu_objset_type_t type,
| ^~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:26: error: incompatible type for argument 2 of ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~
| |
| struct osd_device *
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1004:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1004 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/spa.h:46,
from /root/zfs/include/sys/zio.h:39,
from /root/zfs/include/sys/arc.h:38,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:51,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu.h:333:48: note: expected ‘boolean_t’ {aka ‘enum ’} but argument is of type ‘struct osd_device *’
333 | void dmu_objset_disown(objset_t *os, boolean_t key_required, void *tag);
| ~~~~~~~~~~^~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:2: error: too few arguments to function ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~~~~~~~~~~~~~
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1004:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1004 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_internal.h:59,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu_objset.h:223:6: note: declared here
223 | void dmu_objset_disown(objset_t *os, boolean_t decrypt, void *tag);
| ^~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_handler.c: In function ‘osd_mount’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:26: error: incompatible type for argument 2 of ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~
| |
| struct osd_device *
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1248:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1248 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/spa.h:46,
from /root/zfs/include/sys/zio.h:39,
from /root/zfs/include/sys/arc.h:38,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:51,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu.h:333:48: note: expected ‘boolean_t’ {aka ‘enum ’} but argument is of type ‘struct osd_device *’
333 | void dmu_objset_disown(objset_t *os, boolean_t key_required, void *tag);
| ~~~~~~~~~~^~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:2: error: too few arguments to function ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~~~~~~~~~~~~~
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1248:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1248 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_internal.h:59,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu_objset.h:223:6: note: declared here
223 | void dmu_objset_disown(objset_t *os, boolean_t decrypt, void *tag);
| ^~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_handler.c: In function ‘osd_umount’:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:26: error: incompatible type for argument 2 of ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~
| |
| struct osd_device *
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1295:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1295 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/zfs/include/sys/spa.h:46,
from /root/zfs/include/sys/zio.h:39,
from /root/zfs/include/sys/arc.h:38,
from /root/lustre-release/lustre/osd-zfs/osd_internal.h:51,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu.h:333:48: note: expected ‘boolean_t’ {aka ‘enum ’} but argument is of type ‘struct osd_device *’
333 | void dmu_objset_disown(objset_t *os, boolean_t key_required, void *tag);
| ~~~~~~~~~~^~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/lustre-release/lustre/osd-zfs/osd_internal.h:1133:2: error: too few arguments to function ‘dmu_objset_disown’
1133 | dmu_objset_disown((os), (tag))
| ^~~~~~~~~~~~~~~~~
/root/lustre-release/lustre/osd-zfs/osd_handler.c:1295:3: note: in expansion of macro ‘osd_dmu_objset_disown’
1295 | osd_dmu_objset_disown(o->od_os, B_TRUE, o);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/lustre-release/lustre/osd-zfs/osd_internal.h:59,
from /root/lustre-release/lustre/osd-zfs/osd_handler.c:51:
/root/zfs/include/sys/dmu_objset.h:223:6: note: declared here
223 | void dmu_objset_disown(objset_t *os, boolean_t decrypt, void *tag);
| ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[8]: *** [scripts/Makefile.build:275: /root/lustre-release/lustre/osd-zfs/osd_handler.o] Error 1
make[7]: *** [scripts/Makefile.build:522: /root/lustre-release/lustre/osd-zfs] Error 2
make[7]: *** Waiting for unfinished jobs....
CC [M] /root/lustre-release/lustre/obdclass/lu_ucred.o
CC [M] /root/lustre-release/lustre/obdclass/md_attrs.o
CC [M] /root/lustre-release/lustre/obdclass/obd_mount_server.o
CC [M] /root/lustre-release/lustre/obdclass/obdo_server.o
CC [M] /root/lustre-release/lustre/obdclass/scrub.o
CC [M] /root/lustre-release/lustre/obdclass/llog_test.o
LD [M] /root/lustre-release/lustre/obdclass/obdclass.o
make[6]: *** [scripts/Makefile.build:522: /root/lustre-release/lustre] Error 2
make[5]: *** [Makefile:1757: /root/lustre-release] Error 2
make[5]: Leaving directory '/usr/src/linux-headers-5.4.0-62-generic'
make[4]: *** [autoMakefile:1151: modules] Error 2
make[4]: Leaving directory '/root/lustre-release'
make[3]: *** [autoMakefile:689: all-recursive] Error 1
make[3]: Leaving directory '/root/lustre-release'
make[2]: *** [autoMakefile:551: all] Error 2
make[2]: Leaving directory '/root/lustre-release'
make[1]: *** [debian/rules:229: build-stamp] Error 2
make[1]: Leaving directory '/root/lustre-release'
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
cp: cannot stat '/control.bkp': No such file or directory
cp: cannot stat '/control.main.bkp': No such file or directory
cp: cannot stat '/control.modules.in.bkp': No such file or directory
make: *** [autoMakefile:1301: debs] Error 2

@DetlevCM
Owner

I haven't personally built lustre with zfs, so I cannot comment.

@xia-MM
Author

xia-MM commented Jan 13, 2025

Hi, which high-availability software do you use when building a Lustre environment for MGS/MDS/OSS?
Another question: how do I install a patchset for Lustre, for example the patchset below?
image

@DetlevCM
Owner

As to patches: when patches are released, to the best of my knowledge, they are applied to the source code, and you would then need to recompile the software and replace the binaries/libraries.

@xia-MM
Author

xia-MM commented Jan 13, 2025

Hi, sorry to disturb you again. Have you seen the following error message when installing e2fsprogs on Ubuntu?

[image: 71b089714e7d5982e5a1725351c4183]

@RonnieSahlberg

I think the original issue here is the same as for the issue I opened right now for the 9.4 instructions.

The problem is that the instructions reference an older version of the 9.4 or 8.10 kernel src.rpm
but lustre-release has moved on and synced up to a more modern version of these kernel rpms.

In the case of 9.4 this is the ldiskfs patch that no longer applies and causes the build to break: ./ldiskfs/kernel_patches/patches/rhel9.4/ext4-delayed-iput.patch

Same error: 2 out of 4 hunks FAILED -- saving rejects to file fs/ext4/xattr.c.rej

Here as well you will need to update the instructions to point to the most recent 8.10 kernel src rpm.
Search and replace 553.22 with 553.27 and I think it will work.
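
The search and replace could be sketched with `sed` as below. `rewrite_kernel_build` is a hypothetical helper, and the `.src.rpm` file name in the example is only an illustration of the naming pattern, not a line copied from the instructions.

```shell
# rewrite_kernel_build is a hypothetical helper: it reads text on stdin and
# replaces every occurrence of one kernel build number with another.
# Usage: rewrite_kernel_build <old> <new> < input > output
rewrite_kernel_build() {
    # Escape the dots so that "553.22" is matched literally and not as
    # "553<any character>22".
    rkb_old=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/${rkb_old}/$2/g"
}

# Example on a single (illustrative) file name:
echo "kernel-4.18.0-553.22.1.el8_10.src.rpm" | rewrite_kernel_build 553.22 553.27
```

Running a copy of the instructions through the same function, and reviewing the result before using it, avoids missing one of the occurrences.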

@xia-MM
Author

xia-MM commented Jan 19, 2025

Hi RonnieSahlberg, OK, I will try it and reply with the result.

@DetlevCM
Owner

I think the original issue here is the same as for the issue I opened right now for the 9.4 instructions.

The problem is that the instructions reference an older version of the 9.4 or 8.10 kernel src.rpm but lustre-release has moved on and synced up to a more modern version of these kernel rpms.

In the case of 9.4 this is the ldiskfs patch that no longer applies and causes the build to break: ./ldiskfs/kernel_patches/patches/rhel9.4/ext4-delayed-iput.patch

Same error: 2 out of 4 hunks FAILED -- saving rejects to file fs/ext4/xattr.c.rej

Here as well you will need to update the instructions to point to the most recent 8.10 kernel src rpm. Search and replace 553.22 with 553.27 and I think it will work.

Indeed, there is a problem with the kernel moving on:
Basically, when the kernel updates, the Lustre patches need to be updated too: superfluous patches removed, new ones added.
Sometimes the kernel changes are minor and the patches need no changes; when the kernel is already supported, it is enough to tweak a couple of paths and everything works.

I recently encountered an issue with Rocky Linux where installing 9.4 now leads to up-to-date sources for a 9.5 kernel being downloaded... The rpms no longer build until the patches are available (and I hope they are now), as something changed in the kernel.
Extremely frustrating and annoying, and it makes me wonder about Rocky's reliability... Why does the OS version on an "enterprise linux" change without the user explicitly upgrading?
Of course we'd like our Lustre (and other software) to be always up to date or ahead, but in reality this is not always possible...
What it does is add another complication: keeping track of kernel versions as well as certain breaking changes in the kernel...

@xia-MM
Author

xia-MM commented Jan 23, 2025

Always staying on the latest Lustre version brings new features, but it also brings new bugs and new problems. Whenever I deploy a production environment at a customer site, I strongly advise against using the latest version.

@DetlevCM
Owner

This is where the prebuilt binaries from Whamcloud come into play.
Alternatively, you could also build your own rpms and use those. I have done this with Rocky 9.4 in test configurations... I noticed that I had build issues as the kernel advanced and then just used the rpms I had already built...

As it stands, I don't think there is a good short term answer due to the number of patches and adjustments made on the lustre side.
However I believe the lustre developers are looking into reducing the number of patches and upstreaming some of the work into the kernel.

@RonnieSahlberg

The Lustre developers do not have any influence over what changes Red Hat makes between minor releases, which can cause the Lustre out-of-tree patches to break.

Maybe you need to change the instructions to lock the Lustre version to a specific sha1 commit that is known to work.
I.e. when you build Lustre against 9.4, do not build the current master of Lustre, as it might have moved on to match a more modern version of the 9.4 kernel source rpm.
Instead, check out and build a known compatible sha1 commit of Lustre.

@xia-MM
Author

xia-MM commented Jan 23, 2025

Hi, what I mean is: even if I strictly follow the ChangeLog for the kernel version and the Lustre version and compile from source, the build may still fail.

@RonnieSahlberg

What I mean is: if you want to use kernel-5.14.0-427.37.1.el9_4 and build lustre-release,
you can NOT use the lustre-release master branch, as it has moved on to work with a later kernel, so the patches in lustre-release conflict with this kernel.

But if you want it to work, you can check out an older version of lustre-release from before the incompatible changes for post 5.14.0-37.1 were made.

do this:
cd lustre-release
git checkout 8a7703eec9bb77a0dd85047a04910d30eb8843aa

and that will allow you to build against a 5.14.0-427.37.1 kernel.
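
If the pinning goes into written instructions, it may help to confirm that the checkout really landed on the intended commit. A minimal sketch, assuming `git` is available; `checkout_pinned` is a hypothetical helper name.

```shell
# checkout_pinned is a hypothetical helper: it checks out a known-good commit
# in the given repository and succeeds only if HEAD then points at that commit.
# Usage: checkout_pinned <repo-dir> <commit>
checkout_pinned() {
    # The checkout fails (and so does the helper) if, for example, local
    # modifications would be overwritten.
    git -C "$1" checkout --quiet "$2" || return 1
    # Compare full commit ids so abbreviated hashes or tags work as input too.
    [ "$(git -C "$1" rev-parse HEAD)" = "$(git -C "$1" rev-parse "$2^{commit}")" ]
}
```

With the commit from above this would be called as `checkout_pinned lustre-release 8a7703eec9bb77a0dd85047a04910d30eb8843aa && echo "pinned"`.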

@xia-MM
Author

xia-MM commented Feb 5, 2025

Hi, I have another question: when I try to install on Rocky Linux 9.4, the following message is displayed:

error: unable to find a match: kernel-abi-whitelists

How do I get kernel-abi-whitelists?

@RonnieSahlberg

The KABI lists are in the SOURCES directory when you unpack the srpm; they are the kabi files.

Maybe you can disable the kabi checks? The kabi checks are just for Red Hat to make sure that they do not change any of the APIs that important out-of-tree modules (that Red Hat cares about) depend on.

It is a list of ~700 out of ~30,000 kernel symbols that must not change because some important third-party vendor modules depend on them.
