Releases: RUCAIBox/RecBole
RecBole v1.2.0
RecBole v1.2.0 Release Notes
After a long period of hard work, we have completed the upgrade of RecBole and released a new version: RecBole v1.2.0!
In this release, we fully consider users' feedback and demands to improve the user-friendliness of RecBole. First, we include more benchmark models and datasets to meet the latest needs of users. Second, we improve the benchmark framework by including commonly used data processing methods and efficient training and evaluation APIs, and we also provide more support for result analysis and use. Third, to improve the user experience, we provide more comprehensive project pages and documentation. Based on the issues and discussions, we also fix a number of bugs and update the documentation to make it more user-friendly.
In a word, RecBole v1.2.0 is more efficient, convenient and flexible than previous versions. More details are introduced in the following parts:
- Highlights
- New Features
- Bug Fixes
- Code Refactor
- Docs
Highlights
The RecBole v1.2.0 release includes a number of new features, some bug fixes and code refactoring. A few of the highlights include:
- We add 7 new models and 2 new datasets.
- More flexible data processing. We rebuild the overall data flow on PyTorch into a compatible data module and add more task-oriented data processing methods.
- More user-friendly documentation. We update the website and documentation with detailed descriptions, including visualizations of benchmark configurations and more practical examples of customized training strategies, multi-GPU training cases and detailed running steps. In addition, we develop a FAQ page based on the existing GitHub issues of RecBole.
New Features
- Add 7 new models:
- Add 2 new datasets: Music4All-Onion (#1668), Amazon-M2 (#1828).
- Add the pretrain method to ConvNCF (#1651).
- Support converting results to LaTeX code (#1645).
- Support different eval dataloaders for valid and test phases (#1666).
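To illustrate the last item above, here is a minimal sketch of running a model with different evaluation modes for the validation and test phases. The per-phase `mode` dictionary is our reading of the v1.2.0 `eval_args` syntax and the values are illustrative, so treat it as an assumption rather than the authoritative API.

```python
# Sketch only: assumes eval_args.mode accepts a per-phase dict in RecBole v1.2.0.
from recbole.quick_start import run_recbole

config_dict = {
    "eval_args": {
        "split": {"RS": [0.8, 0.1, 0.1]},  # random split into train/valid/test
        "group_by": "user",
        "order": "RO",
        # full ranking on the valid set, 100 sampled negatives per positive on the test set
        "mode": {"valid": "full", "test": "uni100"},
    },
}

if __name__ == "__main__":
    run_recbole(model="BPR", dataset="ml-100k", config_dict=config_dict)
```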
Bug Fixes
- Model:
- Fix a bug in DIEN: mask the padding value in aux loss and add softmax to attention values (#1485).
- Fix a deepcopy bug in DCNV2, xDeepFM, SpectralCF, FOSSIL, HGN, SHAN and SINE (#1488).
- Fix a bug in EASE: change the data format (#1497).
- Fix a bug in NeuMF: fix the `load_pretrain` function (#1502).
- Fix a bug in LINE: add a log function when computing the loss (#1507).
- Fix the field counts for float-like features in `abstract_recommender.py` (#1603).
- Fix a bug in GCMC: change the last dense layer to `dense_layer_v` for item hidden representations (#1635).
- Fix a bug in KD_DAGFM: use `xavier_normal_initialization` to initialize embeddings (#1641).
- Fix a bug in KSR: add an extra parameter `kg_embedding_size` (#1647).
- Fix a bug in S3Rec: move `item_seq` from GPU to CPU for indexing (#1651).
- Fix a bug in `AutoEncoderMixin`: move tensors to the correct device (#1749).
- Fix a bug in DGCF: correct the L2 distance computation (#1845).
- Dataset:
- Trainer:
- Util:
- Evaluator:
- Fix `data.count_users` in `collector.py` (#1526).
- Config:
- Main:
- Fix bugs when collecting results from `mp.spawn` in multi-GPU training (#1875).
- Typo:
- Fix typos in `dataset_list.json` (#1756).
Code Refactor
- Refactor all autoencoder models: add the `AutoEncoderMixin` class and only set the rating matrix to CUDA when `get_rating_matrix` is called (#1491).
- Refactor BERT4Rec: align with the original paper (#1522, #1639, #1859).
Docs
- Mask the IP information (#1479).
- Update docs of the `train_neg_sample_args` parameter (#1513).
- Add hypertune config docs (#1524).
- Add `model_list` and `dataset_list` (#1525).
- Add FiGNN to the `model_list` (#1548).
- Add `numerical_feature` to the docs (#1560).
- Replace `neg_sampling` with `train_neg_sample_args` in the docs (#1569, #1570).
- Add docs of KD_DAGFM (#1642).
- Add significance test (#1644).
- Add the rst files of FiGNN, KD_DAGFM and RecVAE (#1650).
- Add an update for SIGIR 2023 in `README.md` (#1662).
- Update `requirement.txt` (#1870).
RecBole v1.1.1
RecBole v1.1.1 Release Notes
After more than half a year of hard work, we have completed the upgrade of RecBole and released a new version: RecBole v1.1.1!
In this release, we fully consider users' feedback and demands to improve the user-friendliness of RecBole. Specifically, we update several commonly used mainstream data processing methods and reconstruct our data module to be compatible with a series of efficient data processing APIs. Meanwhile, we implement distributed training and parallel tuning modules to accelerate model training on large-scale data. Based on the issues and discussions, we also fix a number of bugs and update the documentation to make it more user-friendly.
In a word, RecBole v1.1.1 is more efficient, convenient and flexible than previous versions. More details are introduced in the following parts:
- Highlights
- New Features
- Bug Fixes
- Code Refactor
- Docs
Highlights
The RecBole v1.1.1 release includes a number of new features, some bug fixes and code refactoring. A few of the highlights include:
- We add 5 new models into RecBole.
- More flexible data processing. We add data transformation for sequential models, discretization of continuous features for context-aware models and knowledge graph filtering for knowledge-aware models.
- More efficient training and tuning. We add three components to RecBole: multi-GPU training, mixed precision training and intelligent hyper-parameter tuning, which make it more efficient to deal with large-scale data in different recommendation scenarios.
- More reproducible configurations. To further facilitate the search for hyper-parameters, we provide the hyper-parameter selection range and recommended configurations for each model on three datasets, covering four types of recommendation tasks.
- More user-friendly documentation. We add detailed running examples and run-time configurations for all kinds of recommendation tasks.
New Features
- Add 5 new models:
- Add ipynb tutorials for prediction in `run_example` (#1229).
- Support mixed precision training (#1337).
- Add the implementation of distributed recommendation (#1338).
- Support data filtering of knowledge graph (#1342).
- Support counting of FLOPs (#1345).
- Add Python code formatting in GitHub Actions according to PEP 8 (#1349).
- Add non-ergodic hyper-parameter search strategy (#1350).
- Add float feature field discretization (#1352).
- Support hyper-parameter search using Ray (#1360, #1411); a tuning sketch follows this list.
- Add data transform (#1380).
- Add benchmark into RecBole (#1416).
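As referenced in the list above, hyper-parameter search can also be driven programmatically. The following sketch mirrors the pattern from the RecBole tuning documentation; the `.hyper` file name, its contents and the fixed config file are placeholders, so adapt them to your setup.

```python
# Sketch of programmatic hyper-parameter tuning with RecBole's HyperTuning helper.
# The params file and fixed config file names below are placeholders.
from recbole.quick_start import objective_function
from recbole.trainer import HyperTuning

# model.hyper lists one search dimension per line, e.g.:
#   learning_rate loguniform -8,0
#   embedding_size choice [64,96,128]
hp = HyperTuning(
    objective_function=objective_function,
    algo="exhaustive",
    params_file="model.hyper",
    fixed_config_file_list=["fixed_config.yaml"],
)
hp.run()
hp.export_result(output_file="hyper_example.result")
print("best params:", hp.best_params)
```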
Bug Fixes
- Model:
- Fix a bug in `abstract_recommender.py`: update the `embed_input_fields` function (#1177).
- Fix a bug in SGL: remove the device in the embedding layer (#1180).
- Fix a bug in NeuMF: update the copy method of model parameters (#1186).
- Fix the code in SRGNN: code optimization of SRGNN (#1217).
- Fix UserWarning in LightGCN, NGCF, NCL, SGL and SimpleX: add `np.array()` in `get_norm_adj_mat` and `csr2tensor` (#1225, #1397).
- Fix a bug in CORE: remove `item_seq_len` in `forward` (#1379).
- Fix a bug in FwFMs: update `float_embeddings` and `fwfm_layer` (#1414).
- Dataset:
- Fix the bug in `Interaction` when the input tensor is a 0-d tensor (#1188).
- Fix the bug that `unused_col` is not used when using `benchmark_file` (#1301).
- Fix dataloader random factors (#1340).
- Delete the transform log (#1385).
- Fix a serialization bug when saving/loading dataloaders (#1386).
- Fix the function of `history_item_matrix` (#1405).
- Fix compatibility issues for `dataloader.dataset` (#1475).
- Trainer:
- Util:
- Config:
- Typo:
- Fix typo of `ValueError` in `dataset._get_download_url` (#1190).
Code Refactor
- Refactor the negative sampling: use `train_neg_sample_args` (dict) instead of `neg_sampling` (dict) (#1343); a configuration sketch follows this list.
- Refactor the log: (1) add a config hash and rename the log file (#1341); (2) add the model and dataset names to the log file (#1381).
- Refactor the test process: add tests for hyper-tuning (#1361).
- Refactor the `configurator`: add a warning for the old parameter (#1367).
- Refactor the popularity sampling: add an alpha parameter for the popularity sampling distribution (#1382).
- Refactor `FPMC`: add CE and BPR losses to FPMC (#1383).
- Refactor `run_hyper`: add a display parameter for `run_hyper` (#1385).
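As noted in the first refactor item above, training-time negative sampling now lives under `train_neg_sample_args`. Below is a minimal configuration sketch; the keys shown (`distribution`, `sample_num`, `alpha`, `dynamic`) follow our reading of the documented defaults and should be checked against the docs of your installed version.

```python
# Sketch only: the train_neg_sample_args keys below are assumptions based on the docs.
from recbole.quick_start import run_recbole

config_dict = {
    # pre-refactor equivalent: "neg_sampling": {"uniform": 1}
    "train_neg_sample_args": {
        "distribution": "popularity",  # 'uniform' or 'popularity'
        "sample_num": 5,               # negatives per positive interaction
        "alpha": 0.75,                 # exponent of the popularity distribution (see #1382)
        "dynamic": False,              # enable dynamic negative sampling
    },
}

if __name__ == "__main__":
    run_recbole(model="BPR", dataset="ml-100k", config_dict=config_dict)
```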
Docs
- Add a supplementary description of DMF (#1194).
- Fix the authors of SR-GNN in the docs (#1204).
- Add sequential, context and knowledge quick starts (#1351).
- Fix docstring warnings when building the HTML files (#1353).
- Fix some documentation typos (#1359).
- Add docs for Distributed DataParallel (#1362).
- Insert `eval_collector.data_collect` when evaluating from a checkpoint (#1364).
- Modify `neg_sampling` into `train_neg_sample_args` in the sequential docs (#1365).
- Update open-source contributions and the model list, and add constraints for purpose in the README (#1371, #1457).
- Fix warnings in the docs and modify the configuration (#1373).
- Rename `neg_sampling` to `train_neg_sample_args` (#1383).
- Fix the description of mixed precision training and Ray (#1407).
RecBole v1.1.0
RecBole v1.1.0 Release Notes
After more than half a year of hard work, we have completed the upgrade of RecBole and released a new version: RecBole v1.1.0!
In this release, we fully consider users' feedback and demands to improve the user-friendliness of RecBole. Specifically, we update several commonly used mainstream data processing methods and reconstruct our data module to be compatible with a series of efficient data processing APIs. Meanwhile, we implement distributed training and parallel tuning modules to accelerate model training on large-scale data. Based on the issues and discussions, we also fix a number of bugs and update the documentation to make it more user-friendly.
In a word, RecBole v1.1.0 is more efficient, convenient and flexible than previous versions. More details are introduced in the following parts:
- Highlights
- New Features
- Bug Fixes
- Code Refactor
- Docs
Highlights
The RecBole v1.1.0 release includes a number of new features, some bug fixes and code refactoring. A few of the highlights include:
- We add 5 new models into RecBole.
- More flexible data processing. We add data transformation for sequential models, discretization of continuous features for context-aware models and knowledge graph filtering for knowledge-aware models.
- More efficient training and tuning. We add three components to RecBole: multi-GPU training, mixed precision training and intelligent hyper-parameter tuning, which make it more efficient to deal with large-scale data in different recommendation scenarios.
- More reproducible configurations. To further facilitate the search for hyper-parameters, we provide the hyper-parameter selection range and recommended configurations for each model on three datasets, covering four types of recommendation tasks.
- More user-friendly documentation. We add detailed running examples and run-time configurations for all kinds of recommendation tasks.
New Features
- Add 5 new models:
- Add ipynb tutorials for prediction in `run_example` (#1229).
- Support mixed precision training (#1337).
- Add the implementation of distributed recommendation (#1338).
- Support data filtering of knowledge graph (#1342).
- Support counting of FLOPs (#1345).
- Add Python code formatting in GitHub Actions according to PEP 8 (#1349).
- Add non-ergodic hyper-parameter search strategy (#1350).
- Add float feature field discretization (#1352).
- Support hyper-parameter search using Ray (#1360, #1411).
- Add data transform (#1380).
- Add benchmark into RecBole (#1416).
Bug Fixes
- Model:
- Fix a bug in `abstract_recommender.py`: update the `embed_input_fields` function (#1177).
- Fix a bug in SGL: remove the device in the embedding layer (#1180).
- Fix a bug in NeuMF: update the copy method of model parameters (#1186).
- Fix the code in SRGNN: code optimization of SRGNN (#1217).
- Fix UserWarning in LightGCN, NGCF, NCL, SGL and SimpleX: add `np.array()` in `get_norm_adj_mat` and `csr2tensor` (#1225, #1397).
- Fix a bug in CORE: remove `item_seq_len` in `forward` (#1379).
- Fix a bug in FwFMs: update `float_embeddings` and `fwfm_layer` (#1414).
- Dataset:
- Fix the bug in `Interaction` when the input tensor is a 0-d tensor (#1188).
- Fix the bug that `unused_col` is not used when using `benchmark_file` (#1301).
- Fix dataloader random factors (#1340).
- Delete the transform log (#1385).
- Fix a serialization bug when saving/loading dataloaders (#1386).
- Fix the function of `history_item_matrix` (#1405).
- Trainer:
- Util:
- Config:
- Typo:
- Fix typo of `ValueError` in `dataset._get_download_url` (#1190).
Code Refactor
- Refactor the negative sampling: use `train_neg_sample_args` (dict) instead of `neg_sampling` (dict) (#1343).
- Refactor the log: (1) add a config hash and rename the log file (#1341); (2) add the model and dataset names to the log file (#1381).
- Refactor the test process: add tests for hyper-tuning (#1361).
- Refactor the `configurator`: add a warning for the old parameter (#1367).
- Refactor the popularity sampling: add an alpha parameter for the popularity sampling distribution (#1382).
- Refactor `FPMC`: add CE and BPR losses to FPMC (#1383).
- Refactor `run_hyper`: add a display parameter for `run_hyper` (#1385).
Docs
- Add a supplementary description of DMF (#1194).
- Fix the authors of SR-GNN in the docs (#1204).
- Add sequential, context and knowledge quick starts (#1351).
- Fix docstring warnings when building the HTML files (#1353).
- Fix some documentation typos (#1359).
- Add docs for Distributed DataParallel (#1362).
- Insert `eval_collector.data_collect` when evaluating from a checkpoint (#1364).
- Modify `neg_sampling` into `train_neg_sample_args` in the sequential docs (#1365).
- Update open-source contributions and the model list, and add constraints for purpose in the README (#1371, #1457).
- Fix warnings in the docs and modify the configuration (#1373).
- Rename `neg_sampling` to `train_neg_sample_args` (#1383).
- Fix the description of mixed precision training and Ray (#1407).
RecBole v1.0.1
RecBole v1.0.1 Release Notes
After nearly half a year, we summarized the recent updates and released a new version: RecBole v1.0.1!
In this version, we widely listen to users' feedback and suggestions to enhance the usability and stability of RecBole. With the collaboration of our team and open-source contributors, several wonderful features are added. Based on the issues and discussions, we also fix some bugs and update the documentation for a better user experience.
In a nutshell, RecBole v1.0.1 is more reliable, powerful and user-friendly than previous versions. More details are introduced in the following parts:
- Highlights
- New Features
- Bug Fixes
- Code Refactor
- Docs
Highlights
The RecBole v1.0.1 release includes a number of new features, some bug fixes and code refactoring. A few of the highlights include:
- We add 5 new models into RecBole.
- We simplify the evaluator, and now it is easier to implement a new metric as a class.
- We refactor the sampling module and support both static and dynamic negative sampling.
- Now you can log metrics to Weights and Biases for better visualization and comparison.
New Features
- Add 5 new models:
- We add Dynamic Negative Sampling (DNS) in `neg_sampling` (#1006).
- We add `require_pow` in `EmbLoss` (#1091).
- We update the path of the logging files to categorize experimental results by model name (#1102).
- We support visualization with Weights & Biases (#1138).
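A minimal sketch of how two of the user-facing additions above might be enabled together. The `log_wandb` and `wandb_project` option names for the Weights & Biases integration are assumptions on our part (check the config docs of your version), and the `neg_sampling` dict uses the pre-v1.1 setting name.

```python
# Sketch only: log_wandb / wandb_project are assumed option names for the W&B integration.
from recbole.quick_start import run_recbole

config_dict = {
    "neg_sampling": {"uniform": 1},  # one uniformly sampled negative per positive
    "log_wandb": True,               # stream training/evaluation metrics to Weights & Biases
    "wandb_project": "recbole",      # assumed name of the target W&B project option
}

if __name__ == "__main__":
    run_recbole(model="LightGCN", dataset="ml-100k", config_dict=config_dict)
```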
Bug Fixes
- Model:
- Fix a bug in the LightGBM model: `load_model` is unsupported by `Booster` and is replaced by `self.lgb.Booster` (#943).
- Fix bugs in the LightGBM and XGBoost models: update the saved model file (#973).
- Fix a bug in ENMF: use the batch users instead of all users for the loss calculation (#1002).
- Fix a bug in LightGCN: update the regularization loss with `require_pow=True` (#1091).
- Fix a bug in BERT4Rec: fix `gather_indexes` of `seq_output` (#1115).
- Fix a bug in `layer.py`: use `nn.ModuleDict` in the embedding layer of `ContextSeqEmbAbstractLayer` (#1129).
- Dataset:
- Trainer:
- Sampler:
- Evaluator:
- Fix `used_info`: return `topk_idx` instead of `rec_mat` (#1019).
- Case Study:
- Typo:
Code Refactor
- Refactor the `Trainer`: add `PretrainTrainer` and simplify `RecVAETrainer` (#944).
- Refactor the `Dataloader`: remove `DataloaderType` (#944).
- Refactor the negative sampling: use `neg_sampling` (dict) instead of `training_neg_sample` (#944).
- Refactor the evaluator and config: make it easier to implement a new metric as a class (#947).
- Refactor the `Dataset`: add `Dataset.fields` to source (#953).
- Refactor the squeeze function: fix all bugs in the squeeze function (#1025).
Docs
- Fix the format of docs in training and evaluation settings (#942).
- Update `customize_dataloaders.rst` and change `train_eval_intro` into `evaluation_support` (#944).
- Update `customize_metrics.rst` to customize a new metric easily (#947).
- Update `parameter_dict` with `neg_sampling` in the docs of sequential models (#956).
- Add `encoding (str)` in `environment_settings.rst` (#966).
- Fix the required lowest Python version from 3.6 to 3.7 (#1030).
- Fix the model initialization in `use_modules.rst` (#1039, #1072, #1101).
- Update the description of `normalize_all` in `data_settings.rst` (#1063, #1064).
RecBole v1.0.0
RecBole v1.0.0 Release Notes
After a long period of development, we finally finished the refactor of RecBole and released a new version: RecBole v1.0.0!
In this version, we widely listen to users' suggestions and carefully consider their needs. After many discussions with our teams, we re-designed the framework of RecBole.
This time, we pay more attention to the user experience and customized development. We simplify the config module, data module and evaluation module to improve the convenience of usage and the code readability, and some fascinating features are also added. Based on the issues, we also fix some bugs, and now RecBole is more reliable. You may find that RecBole v1.0.0 changes a lot compared with the previous versions when you use it, but don't worry, we also update the docs for the new version, which include more usage examples and are more user-friendly.
All in all, RecBole v1.0.0 must be the most powerful, wonderful and reliable version by far. We hope you will like it!
More details will be introduced in the following part:
- Highlights
- New Features
- Bug Fixes
- Code Refactor
- Docs
Highlights
The RecBole v1.0.0 release includes a number of new features, some bug fixes and code refactoring. A few of the highlights include:
- We simplify the config settings, and now it is much easier to set up the configuration.
- We add the automatic dataset downloading module. Initializing a Dataset object will automatically download its processed atomic files (for the datasets we have collected).
- We support TensorBoard in model training, and now you can get more visualization information.
- We add 5 new evaluation metrics: `ItemCoverage`, `AveragePopularity`, `GiniIndex`, `ShannonEntropy` and `TailPercentage`.
- The API docs have been reworked in this release to make them more consistent and up to date with the latest code base, and this time we add more usage examples to make them more user-friendly.
- Now you can save the `Dataset` and `Dataloader` automatically.
New Features
- Add the automatic dataset downloading module (#851).
- Add 5 new evaluation metrics: `ItemCoverage`, `AveragePopularity`, `GiniIndex`, `ShannonEntropy` and `TailPercentage` (#865, #867, #869).
- Support `split_by_ratio` for sequential recommendation models (#873).
- Support TensorBoard in RecBole and remove `draw_loss_pic` (#875).
- Add an example for running session-based recommendation benchmarks (#885).
- Support behavior sequence benchmark loading (#885).
- Add more methods for the leave-one-out split. Now you can split the dataset into [train, test] or [train, valid] with the leave-one-out split (#890).
- Add GPU usage to the logger (#906).
- Support the repeatable recommendation scenario for non-sequential recommendation models (#909).
- Add early stopping in XGBoost and LightGBM (#928).
- We improve the save and load functions of `Dataset` and `Dataloader`, and now you can save the `Dataset` and `Dataloader` automatically (#939); a configuration sketch follows this list.
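A minimal sketch combining two of the additions above: the new coverage/popularity metrics and automatic saving of the processed `Dataset` and `Dataloader`. The metric names come from the release notes; the `save_dataset` / `save_dataloaders` flag names reflect the current config docs and are assumptions for readers on other versions.

```python
# Sketch only: save_dataset / save_dataloaders are assumed config flag names.
from recbole.quick_start import run_recbole

config_dict = {
    "metrics": ["Recall", "NDCG", "ItemCoverage", "AveragePopularity",
                "GiniIndex", "ShannonEntropy", "TailPercentage"],
    "topk": [10],
    "valid_metric": "NDCG@10",  # early-stop on a metric that is actually computed
    "save_dataset": True,       # serialize the filtered and remapped Dataset
    "save_dataloaders": True,   # serialize the train/valid/test dataloaders
}

if __name__ == "__main__":
    run_recbole(model="BPR", dataset="ml-100k", config_dict=config_dict)
```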
Bug Fixes
- Fix the bugs in Auto-Encoder models (#798).
- Fix the bugs in XGBoost and LightGBM (#922).
- Fix the bugs in xDeepFM (#937).
Code Refactor
- Refactor the evaluation config module (#862).
- Refactor `remap_ID` in `Dataset` (#868).
- Refactor the `SequentialDataset` (#873).
- Refactor the `Dataloader` (#873, #876).
- Refactor the evaluation module (#894).
- Refactor the sampler module (#903).
Docs
RecBole v0.2.1
RecBole v0.2.1 Release Notes
- Highlights
- New Features
- Bug Fixes
- Code Refactor
Highlights
The RecBole v0.2.1 release includes a number of new features, some bug fixes and code refactoring. In this version, we pay more attention to improving the user experience. A few of the highlights include:
- We add 7 new models into RecBole.
- We add colors to the logger and now RecBole is "colorful".
- `Dataset` and `Dataloader` can be saved now, which makes RecBole much more flexible.
- Now you can get the training loss line graph of a model by setting `draw_loss_pic`.
New Features
- Add 7 new models:
- We add color to the logger info, which makes the logger much clearer (#761).
- We add `plot_train_loss()` in the trainer, and now users can get the training loss line graph of a model (#724).
- We add `dataset.save()` and `save_split_dataloader()`, and now users can save the pre-processed dataset or pre-processed dataloaders and reload them for training other models (#760).
- We add the output of other parameters (including model parameters) in the logger (#725).
- We add example code for the case study and save/load in `run_example/` (#774).
- We add `docs/` to RecBole (#735).
Bug Fixes
- Fix a datatype bug on Windows, which may cause a runtime error when running sequential models on the Windows platform (#710).
- Fix a bug in `general_dataloader`, which may cause a runtime error when `ContextFullDataLoader` is empty (#723).
Code Refactor
RecBole v0.2.0
RecBole v0.2.0 Release Notes
- Highlights
- New Features
- Improvements
- Bug Fixes
Highlights
The RecBole v0.2.0 release includes a number of new features, model efficiency improvements and bug fixes. A few of the highlights include:
- We add 12 new models into RecBole, including several non-sampling models and an external algorithm lib model: XGBoost.
- Case study is added to RecBole, which helps users analyze model results (e.g., given a user ID and an item ID, get the score and ranking position of the item); see the sketch after this list.
- We improve the efficiency of data loading and negative sampling.
- We now support the full ranking evaluation for context-aware recommendation models.
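As referenced in the case-study item above, here is a minimal sketch of scoring and ranking items for a single user with a trained model. It follows the pattern of the case-study example shipped in `run_example/`; the `load_data_and_model` helper, the checkpoint path and the user/item tokens are assumptions (some of these helpers were only added in later versions).

```python
# Case-study sketch: the checkpoint path and the user/item tokens are placeholders.
from recbole.quick_start import load_data_and_model
from recbole.utils.case_study import full_sort_scores, full_sort_topk

config, model, dataset, train_data, valid_data, test_data = load_data_and_model(
    model_file="saved/BPR-example.pth"
)

# external user token -> internal id
uid_series = dataset.token2id(dataset.uid_field, ["196"])

# top-10 items for this user under full ranking
topk_scores, topk_iid_list = full_sort_topk(
    uid_series, model, test_data, k=10, device=config["device"]
)
print(dataset.id2token(dataset.iid_field, topk_iid_list.cpu()))

# score of one particular item for this user
scores = full_sort_scores(uid_series, model, test_data, device=config["device"])
print(scores[0, dataset.token2id(dataset.iid_field, ["242"])])
```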
New Features
- Add 12 new models:
- `training_neg_sample_num` of pairwise-loss models can now be greater than 1 (#533).
- We add `training_neg_sample_distribution` in the config settings to choose the negative sampling strategy during training (#534).
- We add `benchmark_filename` in the config settings to load a pre-split dataset (#596).
- A progress bar is added for training and evaluation (#618).
- We add `loss_decimal_place` and `metric_decimal_place` in the config settings to control the decimal places of loss and metric results separately (#625).
- We add the `GAUC` metric to evaluation (reference: Deep Interest Network for Click-Through Rate Prediction, KDD 2018) (#572).
- We add `unused_col` in the config settings to drop columns that are only used in data preparation but not in the model (#559); a configuration sketch follows this list.
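As mentioned at the end of the list above, the new config options can be combined in a single run. The values below are illustrative only; the option names are the v0.2.0-era ones (several were renamed in later releases), so check the docs of your installed version.

```python
# Sketch only: option names follow the v0.2.0-era config; values are illustrative.
from recbole.quick_start import run_recbole

config_dict = {
    "training_neg_sample_num": 5,                      # >1 negatives now allowed for pairwise losses
    "training_neg_sample_distribution": "popularity",  # or 'uniform'
    "benchmark_filename": ["train", "valid", "test"],  # load a pre-split dataset from atomic files
    "loss_decimal_place": 4,
    "metric_decimal_place": 4,
    "metrics": ["GAUC", "Recall", "NDCG"],
    "topk": [10],
    "valid_metric": "Recall@10",
    "unused_col": {"inter": ["timestamp"]},            # column used only during data preparation
}

if __name__ == "__main__":
    run_recbole(model="BPR", dataset="ml-100k", config_dict=config_dict)
```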
Improvements
- We support the ranking evaluation for context-aware recommendation models (#503).
- We improve the efficiency of data loading and negative sampling (#559).
- We remove `pre_neg_sampling` from the Dataloader, which does not help model training (#559).
- We improve the underlying data structure of RecBole, which promotes the efficiency of data processing (#559).
- We refactor the evaluation code (#572) and reformat the model code (#647).
Bug Fixes
Model
- Fix a bug in the NeuMF model: this bug may disable `dropout_ratio` (#629).
- Fix a bug in the NGCF and GCMC models: the sparse dropout is now disabled during evaluation in NGCF and GCMC (#601).
- Fix a bug in the DCN model: this bug may cause a crash when running on CPU (#633).
- Fix a bug in the BERT4Rec model: this bug may cause a crash when running on CPU (#556).
Trainer
- Fix a bug in `Trainer._generate_train_loss_output()`: this bug may cause the training log to be missing (#559).
Data
- Fix a bug in the sampler: this bug may cause a runtime error (#559).
Basic RecBole v0.1.2
RecBole v0.1.2 Release Notes
- Highlights
- New Features
- Improvements
- Bug Fixes
Highlights
The RecBole v0.1.2 release includes a number of new features, model efficiency improvements and bug fixes. A few of the highlights include:
- We add CI (Continuous Integration) to RecBole, which improves the efficiency and quality of our development.
- We improve the efficiency of GNN-based general recommendation models (NGCF, GCMC, LightGCN, SpectralCF) by refactoring the construction of the sparse interaction tensor. In this way, GPU RAM usage can be greatly reduced.
- Some bugs in models, trainer and data are fixed.
New Features
- Add gradient clipping (#533).
- Add `Dataset`'s attributes `token2id` and `id2token` (#511); see the sketch after this list.
- Add continuous integration (#496).
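As referenced above, here is a short sketch of the two additions: gradient clipping via the config and the token/id lookups on a built `Dataset`. The `clip_grad_norm` keys and the call form of `token2id`/`id2token` are our assumptions about the interface, so verify them against the docs of your installed version.

```python
# Sketch only: clip_grad_norm keys and the token2id/id2token call form are assumptions.
from recbole.config import Config
from recbole.data import create_dataset

config = Config(
    model="BPR",
    dataset="ml-100k",
    config_dict={"clip_grad_norm": {"max_norm": 5.0, "norm_type": 2}},  # gradient clipping args
)
dataset = create_dataset(config)

internal_ids = dataset.token2id(dataset.uid_field, ["196", "186"])  # external tokens -> internal ids
print(internal_ids)
print(dataset.id2token(dataset.uid_field, internal_ids))            # internal ids -> external tokens
```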
Improvements
- Improve the efficiency of GNN-based general recommendation models (NGCF, GCMC, LightGCN, SpectralCF) (#525, #526).
- `SequentialDataloader`: add support for the historical sequence of non-SEQ features (#547).
Bug Fixes
Model:
- Fix a bug in the DMF model: crash when the training set contains an item without interaction records (#505).
- Fix a bug in the TransRec model: this bug may cause a runtime error (#502).
Trainer:
- Fix a bug in the trainer: this bug may cause the NeuMF model to report an error if `eval_batch_size=1` (#537).
- Fix the bug that some models don't work when `epochs = 0` (#499).
Data:
Basic RecBole v0.1.1
Basic RecBole