Issues: NVIDIA/NeMo-Curator
Re-add test_read_data_different_columns_blocksize
bug
Something isn't working
#557
opened Feb 18, 2025 by
sarahyurick
Extend support to non-English languages for PII Deidentifier
enhancement
New feature or request
#554
opened Feb 18, 2025 by
hamsarajan
[FEA] Add Sampling-Based Clustering in SemDedup
enhancement
New feature or request
#538
opened Feb 11, 2025 by
VibhuJawa
[FEA] Remove GPU-related messages on CPU-only servers
enhancement
New feature or request
#535
opened Feb 10, 2025 by
miguelusque
The output list length of open question pipeline is wrong ?
bug
Something isn't working
#533
opened Feb 10, 2025 by
leadtekleadtek
Refactor separate_by_metadata and Partition On to use the same code paths.
enhancement
New feature or request
#524
opened Feb 5, 2025 by
VibhuJawa
torch.OutOfMemoryError: CUDA out of memory. while performing peft curation with sdg on default configs
bug
Something isn't working
#520
opened Feb 5, 2025 by
mohit5tech
Unifying Deduplication API Modules
enhancement
New feature or request
#516
opened Feb 4, 2025 by
praateekmahajan
Inconsistent filter modules behavior
enhancement
New feature or request
#515
opened Feb 4, 2025 by
zxnie
Consecutive execution of fuzzy deduplication on different columns fails with errors
bug
Something isn't working
#501
opened Jan 29, 2025 by
sarahyurick
[FEA] Enable Best Fit Packing
enhancement
New feature or request
#492
opened Jan 21, 2025 by
VibhuJawa
Post to internal slack if nightly tests fail
enhancement
New feature or request
#488
opened Jan 17, 2025 by
praateekmahajan
nemo_curator.utils.distributed_utils.read_data doesn't work for my own parquet dataset unless cleaning text by myself
bug
Something isn't working
#482
opened Jan 16, 2025 by
RickyShi46
false_positive_check=True need to add in ThWiki tutorial
bug
Something isn't working
#481
opened Jan 15, 2025 by
yangjingyi
default FTFY setting may induce undesirable results in some languages
enhancement
New feature or request
#476
opened Jan 9, 2025 by
AtsunoriFujita
Fuzzy Duplicates Identification fails on batched_merge_and_write when document dataset is read with blocksize
bug
Something isn't working
#462
opened Jan 2, 2025 by
praateekmahajan
jusText not work with Chinese webpage
bug
Something isn't working
#459
opened Dec 31, 2024 by
yangjingyi
Enable test_aegis_classifier and test_instruction_data_guard_classifier
#456
opened Dec 23, 2024 by
sarahyurick
Remove dependency on convert_str_id_to_int in FuzzyDedup Scripts
enhancement
New feature or request
#447
opened Dec 20, 2024 by
praateekmahajan
Fuzzy dedup - minhash buckets and jaccard_map_buckets
bug
Something isn't working
#430
opened Dec 13, 2024 by
ms-leemina
Update minhash API after 25.02
enhancement
New feature or request
#426
opened Dec 11, 2024 by
ayushdg