Code Generation as a Dual Task of Code Summarization

An ad-hoc implementation of the code summarization (CS) / code generation (CG) model proposed by Wei et al. (2019).

Getting started

Each dataset must be defined as a sub-class of torch.utils.data.Dataset, with methods for
- preprocessing and building the vocabulary (text -> vocab look-up indices)
- __getitem__, which must return a single training example
- __len__, which must return the number of examples
- generating train/test/valid splits
- computing language model probabilities (i.e. P(x), where x is an anno/code tensor)
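
The requirements above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual dataset.py: the class name ToyAnnoCodeDataset, the whitespace tokenizer, the <pad>/<unk> tokens, and the split_indices helper are all assumptions made for the example. It returns plain lists of indices, where a real implementation would presumably return tensors, and it leaves out the LM-probability method. The try/except import is only there so the sketch also runs without PyTorch installed.

```python
import random

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch installed
    Dataset = object


class ToyAnnoCodeDataset(Dataset):
    """Hypothetical dataset of (annotation, code) pairs; names are illustrative."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, pairs):
        # Preprocessing: naive whitespace tokenization (an assumption).
        self.pairs = [(anno.split(), code.split()) for anno, code in pairs]
        # One vocabulary per kind (anno/code).
        self.anno_vocab = self._build_vocab(a for a, _ in self.pairs)
        self.code_vocab = self._build_vocab(c for _, c in self.pairs)

    @classmethod
    def _build_vocab(cls, token_lists):
        # Map each distinct token to the next free index.
        vocab = {cls.PAD: 0, cls.UNK: 1}
        for tokens in token_lists:
            for tok in tokens:
                vocab.setdefault(tok, len(vocab))
        return vocab

    def encode(self, tokens, vocab):
        # text -> vocab look-up indices; unknown tokens map to <unk>.
        unk = vocab[self.UNK]
        return [vocab.get(tok, unk) for tok in tokens]

    def __getitem__(self, idx):
        # One training example: (anno indices, code indices).
        anno, code = self.pairs[idx]
        return self.encode(anno, self.anno_vocab), self.encode(code, self.code_vocab)

    def __len__(self):
        return len(self.pairs)

    def split_indices(self, train=0.8, valid=0.1, seed=0):
        # Shuffle example indices reproducibly, then cut into three parts.
        idx = list(range(len(self)))
        random.Random(seed).shuffle(idx)
        n_train = int(train * len(idx))
        n_valid = int(valid * len(idx))
        return {
            "train": idx[:n_train],
            "valid": idx[n_train:n_train + n_valid],
            "test": idx[n_train + n_valid:],
        }
```

For example, ToyAnnoCodeDataset([("add one to x", "x += 1")])[0] yields one pair of index lists, and split_indices() returns disjoint index lists that can be wrapped with torch.utils.data.Subset.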

Computing LM probabilities

Get train/test/valid splits for a dataset.
Construct a configuration for the LM.
For each kind (anno/code), train an LM and dump the model as lm-{dataset_name}-{kind}.pt (e.g. lm-django-anno.pt).
Finally, using these models, compute P(x) for each x (anno/code tensor).
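
To make the last step concrete, the sketch below shows how P(x) can be scored for a sequence of vocab indices via the chain rule, log P(x) = sum over t of log P(x_t | x_<t). It is not the repository's language model: an add-one smoothed bigram model is swapped in for the trained neural LM so the example stays self-contained, and the names BigramLM and lm_checkpoint_name are hypothetical. It also assumes each sequence begins with a start-of-sequence index, so the first token is not scored.

```python
import math
from collections import Counter


def lm_checkpoint_name(dataset_name, kind):
    """File name following the lm-{dataset_name}-{kind}.pt pattern."""
    if kind not in ("anno", "code"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"lm-{dataset_name}-{kind}.pt"


class BigramLM:
    """Add-one smoothed bigram model; a stand-in for the trained neural LM."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = Counter()   # counts of (previous, current) index pairs
        self.contexts = Counter()  # counts of previous indices

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        # Chain rule: sum the log conditional probability of each token
        # given its predecessor (the first token is taken as given).
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


# One LM per kind; here only a toy "anno" model over 4 vocab indices.
anno_lm = BigramLM(vocab_size=4)
anno_lm.fit([[0, 1, 2], [0, 1, 3]])
log_p = anno_lm.log_prob([0, 1, 2])  # log P(x) for one anno sequence
```

Working in log space avoids underflow on long sequences; math.exp(log_p) recovers P(x) when the raw probability is needed.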

Reference

@article{wei2019code,
  title={Code Generation as a Dual Task of Code Summarization},
  author={Wei, Bolin and Li, Ge and Xia, Xin and Fu, Zhiyi and Jin, Zhi},
  journal={arXiv preprint arXiv:1910.05923},
  year={2019}
}
