How do I use LucaProt to train a multi-class protein model instead of a binary one? #72

Open
liaohu1231 opened this issue Oct 25, 2024 · 1 comment


@liaohu1231

No description provided.

@LucaOne

LucaOne commented Oct 26, 2024

See: https://github.com/LucaOne/LucaOneTasks/blob/master/src/training/lucaone/run_ProtLoc_lucaone_linear.sh
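This script uses only one input channel: the task feeds in only the LucaOne embedding, because its purpose was to compare embedding quality. To use two channels, set INPUT_TYPE="seq_matrix" in the script, which adds a channel for the raw sequence. Adjust the other parameters as needed, for example the maximum allowed sequence length (SEQ_MAX_LENGTH and matrix_max_length), BEST_METRIC_TYPE (the metric used to select the final checkpoint), and the training learning rate and batch size.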
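Format your dataset CSV the same way as this task. This task is a 6-class classification. The last field of each CSV row is the label index, and the mapping from label index to label name is stored in label.txt.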
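As a rough illustration of that layout, here is a minimal Python sketch for preparing and sanity-checking a multi-class dataset. It assumes (not confirmed by the reply) that label.txt holds one label name per line, with line i corresponding to label index i, and it uses a hypothetical three-column CSV header (seq_id, seq, label). Compare both against the actual ProtLoc dataset files in the LucaOneTasks repo before relying on it. The only part taken from the reply is that the label index is the last CSV field. The check confirms that every index falls within the range defined by label.txt and reports per-class counts. For a multi-class model, the number of classes is then simply the number of entries in label.txt.

```python
import csv
import os
import tempfile

# Hypothetical 6-class label set (illustration only); replace with your own class names.
LABEL_NAMES = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]


def write_label_file(path, label_names):
    """Write one label name per line (assumption: line i is label index i)."""
    with open(path, "w", encoding="utf-8") as f:
        for name in label_names:
            f.write(name + "\n")


def write_dataset_csv(path, rows):
    """Write rows of (seq_id, seq, label_index); the label index is the LAST column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["seq_id", "seq", "label"])  # hypothetical header
        for seq_id, seq, label_index in rows:
            writer.writerow([seq_id, seq, label_index])


def check_dataset(csv_path, label_path):
    """Return (num_labels, per-class counts), raising if a label index is out of range."""
    with open(label_path, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    counts = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            idx = int(row[-1])  # label index is the last field
            if not 0 <= idx < len(names):
                raise ValueError(f"label index {idx} out of range for {len(names)} labels")
            counts[names[idx]] = counts.get(names[idx], 0) + 1
    return len(names), counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        label_path = os.path.join(d, "label.txt")
        csv_path = os.path.join(d, "train.csv")
        write_label_file(label_path, LABEL_NAMES)
        write_dataset_csv(csv_path, [("p1", "MKTAYIAK", 0), ("p2", "MSEQNNTE", 5)])
        print(check_dataset(csv_path, label_path))
```

Running the same check on the train, dev and test splits helps catch a split that is missing a class or uses an index that label.txt does not define.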
We recommend using LucaOne embeddings. We have tested them on many tasks, and on many of those tasks LucaOne embeddings outperformed ESM embeddings.
