code-gen/cscg

Code Generation as a Dual Task of Code Summarization

An ad-hoc implementation of the CS/CG (code summarization / code generation) model proposed by Wei et al. (2019).

Getting started

  • Each dataset must be defined as a subclass of torch.utils.data.Dataset that provides:
    • preprocessing and vocabulary building (text -> vocabulary look-up indices)
    • __getitem__, which must return a single training example
    • __len__
    • a method for generating train/test/valid splits
    • a method for computing language model (LM) probabilities, i.e. P(x), where x is an anno or code tensor
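
The requirements above can be illustrated with a minimal sketch. It is not the repository's own dataset class: the name `ToyAnnoCodeDataset`, the whitespace tokenisation, the `<pad>`/`<unk>` tokens, the `splits` method and its 80/10/10 fractions are all assumptions made for illustration, and the LM-probability method is left out to keep the sketch short.

```python
from collections import Counter

import torch
from torch.utils.data import Dataset, random_split


class ToyAnnoCodeDataset(Dataset):
    """Hypothetical dataset of (annotation, code) pairs (illustration only)."""

    PAD, UNK = "<pad>", "<unk>"  # special tokens are an assumption

    def __init__(self, pairs):
        # Preprocessing: plain whitespace tokenisation (assumption).
        self.pairs = [(anno.split(), code.split()) for anno, code in pairs]
        # One vocabulary per kind: token -> look-up index.
        self.anno_vocab = self._build_vocab(a for a, _ in self.pairs)
        self.code_vocab = self._build_vocab(c for _, c in self.pairs)

    def _build_vocab(self, token_lists):
        counts = Counter(tok for toks in token_lists for tok in toks)
        itos = [self.PAD, self.UNK] + sorted(counts)
        return {tok: idx for idx, tok in enumerate(itos)}

    def _encode(self, tokens, vocab):
        unk = vocab[self.UNK]
        return torch.tensor([vocab.get(t, unk) for t in tokens], dtype=torch.long)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        # A training example is an (anno tensor, code tensor) pair.
        anno, code = self.pairs[idx]
        return self._encode(anno, self.anno_vocab), self._encode(code, self.code_vocab)

    def splits(self, fractions=(0.8, 0.1, 0.1), seed=0):
        # Returns (train, test, valid) subsets; the remainder goes to valid.
        n = len(self)
        n_train = int(fractions[0] * n)
        n_test = int(fractions[1] * n)
        n_valid = n - n_train - n_test
        gen = torch.Generator().manual_seed(seed)
        return random_split(self, [n_train, n_test, n_valid], generator=gen)


# Usage with made-up data.
pairs = [("return the sum of a and b", "return a + b")] * 10
ds = ToyAnnoCodeDataset(pairs)
train, test, valid = ds.splits()
anno, code = ds[0]
```

Keeping a separate vocabulary for each kind means annotation and code tokens never share indices, which fits the later step of training one LM per kind.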

Computing LM probabilities

  • Get train/test/valid splits for a dataset.
  • Construct a configuration for the LM.
  • For each kind (anno/code), train an LM and dump the model as lm-{dataset_name}-{kind}.pt (e.g. lm-django-anno.pt).
  • Finally, using these models, compute P(x) for each x (anno/code tensor).
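
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `TinyLM`, `train_lm`, `sequence_log_prob` and `lm_path` are hypothetical names, the LSTM architecture and hyperparameters are arbitrary, index 0 is treated as a beginning-of-sequence token, and saving the `state_dict` (rather than the whole model object) is a choice made here. The sketch works with log P(x), which follows from the chain rule log P(x) = Σ_t log P(x_t | x_<t); P(x) is its exponential.

```python
import math
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    """Hypothetical LSTM language model (the real architecture may differ)."""

    def __init__(self, vocab_size, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) token indices -> (batch, seq_len, vocab_size) logits
        hidden, _ = self.rnn(self.emb(x))
        return self.out(hidden)


def train_lm(model, sequences, epochs=5, lr=1e-2):
    # Next-token prediction: the input is x[:-1], the target is x[1:].
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x in sequences:
            logits = model(x[:-1].unsqueeze(0))[0]
            loss = F.cross_entropy(logits, x[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


def sequence_log_prob(model, x):
    # log P(x) as the sum of per-token conditional log-probabilities.
    model.eval()
    with torch.no_grad():
        logits = model(x[:-1].unsqueeze(0))
        log_probs = F.log_softmax(logits, dim=-1)[0]
        picked = log_probs.gather(1, x[1:].unsqueeze(1))
    return picked.sum().item()


def lm_path(dataset_name, kind):
    # Naming scheme from the steps above: lm-{dataset_name}-{kind}.pt
    return f"lm-{dataset_name}-{kind}.pt"


# Usage with made-up anno tensors; index 0 plays the role of BOS (assumption).
torch.manual_seed(0)
vocab_size = 6
data = [torch.tensor([0, 1, 2, 3]), torch.tensor([0, 1, 2, 4])]
lm = train_lm(TinyLM(vocab_size), data)
log_p = sequence_log_prob(lm, data[0])

# Dump and reload the model; a temporary directory keeps the example tidy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, lm_path("django", "anno"))
    torch.save(lm.state_dict(), path)
    restored = TinyLM(vocab_size)
    restored.load_state_dict(torch.load(path))
restored_log_p = sequence_log_prob(restored, data[0])
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on long sequences; `math.exp(log_p)` recovers P(x) when the raw value is needed.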

Reference

@article{wei2019code,
  title={Code Generation as a Dual Task of Code Summarization},
  author={Wei, Bolin and Li, Ge and Xia, Xin and Fu, Zhiyi and Jin, Zhi},
  journal={arXiv preprint arXiv:1910.05923},
  year={2019}
}