LOgogram

This is the official implementation for our ACL 2024 (Findings) paper: Unveiling the Art of Heading Design: A Harmonious Blend of Summarization, Neology, and Algorithm.

We introduce LOgogram, a novel heading-generation benchmark comprising 6,653 paper abstracts paired with their descriptions and acronyms as headings.

To measure heading-generation quality, we propose a set of evaluation metrics covering three aspects: summarization, neology, and algorithm.

Additionally, we explore three strategies (generation ordering, tokenization, and framework design) under prevalent learning paradigms (supervised fine-tuning, reinforcement learning, and in-context learning with Large Language Models).

Environment Setup

We recommend creating a new conda virtual environment for running the code in this repository:

conda create -n logogram python=3.8
conda activate logogram

Then install PyTorch 1.13.1. For example, to install with pip and CUDA 11.6:

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
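
You can then run a quick sanity check in Python to confirm that the CUDA build is working:

```python
import torch

print(torch.__version__)          # expect something like 1.13.1+cu116
print(torch.cuda.is_available())  # True if the CUDA build sees a GPU
```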

Finally, install the remaining packages using pip:

pip install -r requirements.txt

1. Dataset Processing

1.1 Collection of Papers whose Headings Contain Acronyms

We crawl the ACL Anthology and then exclude examples whose headings do not contain acronyms.

The unfiltered dataset is saved in /raw-data/acl-anthology/data_acl_all.jsonl.
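
To get a feel for the data, you can peek at the first few records. The schema is not documented here, so the snippet below just prints each record's keys rather than assuming field names:

```python
import json

# Inspect the first records of the unfiltered crawl to discover the schema.
with open("raw-data/acl-anthology/data_acl_all.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))
        if i == 2:
            break
```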

1.2 Apply Filtering Rules and Replace Acronyms in Abstracts with Masks

We further apply a set of tailored filtering rules, derived from data inspection, to eliminate anomalies, and we replace acronyms in the abstracts with a mask token to prevent acronym leakage. The details are in src/data_processing.ipynb.
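
Purely for intuition, the masking step could look like the sketch below; the actual rules and mask token live in src/data_processing.ipynb, so treat both as assumptions here:

```python
import re

def mask_acronym(abstract: str, acronym: str, mask: str = "<mask>") -> str:
    """Replace standalone occurrences of the acronym with a mask token
    so a model cannot simply copy the answer from the abstract."""
    return re.sub(rf"\b{re.escape(acronym)}\b", mask, abstract)

print(mask_acronym("We introduce LOgogram, a novel benchmark ...", "LOgogram"))
# -> "We introduce <mask>, a novel benchmark ..."
```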

1.3 Dataset Statistics

We plot the distributions of text length and publication count for our dataset in Figures 3 and 4 of our paper. To reproduce them, see src/data_statistics.ipynb.

2. Justification of Metrics

We evaluate generated headings against summarization, neologistic, and algorithmic constraints. Specifically, we propose three novel metrics, WordLikeness (WL), WordOverlap (WO), and LCSRatio (LR), covering the neologistic and algorithmic aspects. To justify these metrics, we plot the density estimates of the metrics and their joint distributions in Figures 5 and 6, showing that gold-standard examples achieve high values on them. To reproduce, see src/data_statistics.ipynb.
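
As a rough illustration of the algorithmic aspect, one plausible reading of LCSRatio is the length of the longest common subsequence between the acronym and its description, normalized by the acronym length; the paper holds the authoritative definition, so the normalization below is an assumption:

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence via standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def lcs_ratio(acronym: str, description: str) -> float:
    """Illustrative LCSRatio: the fraction of the acronym that can be
    spelled out by reading the description left to right."""
    return lcs_length(acronym.lower(), description.lower()) / max(len(acronym), 1)

print(lcs_ratio("PPO", "Proximal Policy Optimization"))  # -> 1.0
```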

3. Apply Strategies under Learning Paradigms

3.1 Supervised Fine-Tuning (SFT) Paradigm

We fine-tune the T5 model and explore the effectiveness of the generation ordering, tokenization, and framework design strategies.

  1. To fine-tune and run inference (description then acronym, subword-level acronym tokenization, onestop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description:shorthand --model_save_path ./models/t5-a2ds-token-base --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-base/checkpoint-5 --model_mode abstract2description:shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_predictions.csv
  2. To fine-tune and run inference (acronym then description, subword-level acronym tokenization, onestop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2shorthand:description --model_save_path ./models/t5-a2sd-token-base --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2sd-token-base/checkpoint-5 --model_mode abstract2shorthand:description --prediction_save_path ./prediction/brute_t5_a2sd_token_predictions.csv
  3. To fine-tune and run inference (description then acronym, letter-level acronym tokenization, onestop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description:shorthand --shorthand_mode character --model_save_path ./models/t5-a2ds-char-base --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-char-base/checkpoint-5 --model_mode abstract2description:shorthand --shorthand_mode character --prediction_save_path ./prediction/brute_t5_a2ds_char_predictions.csv
  4. To fine-tune and run inference (description then acronym, subword-level acronym tokenization, pipeline framework), run the two stages below; a sketch of how they chain at inference time follows the list:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description --model_save_path ./models/t5-a2ds-token-pipe/1 --save_total_limit 1

accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract-description2shorthand --model_save_path ./models/t5-a2ds-token-pipe/2 --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-pipe/1/checkpoint-5 --model_mode abstract2description --prediction_save_path ./prediction/brute_t5_a2ds_token_pipe_predictions.csv

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-pipe/2/checkpoint-5 --model_mode abstract-description2shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_pipe_predictions.csv
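
For intuition, the two pipeline stages chain at inference time roughly as follows. This is a minimal sketch using Hugging Face transformers; the input format for the second stage is an assumption, not the repository's exact interface:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
stage1 = T5ForConditionalGeneration.from_pretrained("models/t5-a2ds-token-pipe/1/checkpoint-5")
stage2 = T5ForConditionalGeneration.from_pretrained("models/t5-a2ds-token-pipe/2/checkpoint-5")

abstract = "..."  # an abstract with its acronym masked

# Stage 1: abstract -> description.
ids = tokenizer(abstract, return_tensors="pt", truncation=True).input_ids
description = tokenizer.decode(stage1.generate(ids, max_new_tokens=64)[0], skip_special_tokens=True)

# Stage 2: abstract + description -> acronym (the separator is an assumption).
ids = tokenizer(f"{abstract} {description}", return_tensors="pt", truncation=True).input_ids
acronym = tokenizer.decode(stage2.generate(ids, max_new_tokens=16)[0], skip_special_tokens=True)
```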

3.2 Reinforcement Learning (RL) Paradigm

The RL paradigm is built upon the foundation of the SFT paradigm. Specifically, we choose the Proximal Policy Optimization (PPO) algorithm. We evaluate all strategies except the pipeline framework design, because feedback mechanisms for pipeline language models remain relatively unexplored within the RL paradigm.

  1. To further fine-tune and run inference (description then acronym, subword-level acronym tokenization), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2description:shorthand --model_save_path ./models/t5-a2ds-token-ppo --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-ppo --model_mode abstract2description:shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_ppo_predictions.csv
  2. To further fine-tune and run inference (acronym then description, subword-level acronym tokenization), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2shorthand:description --model_save_path ./models/t5-a2sd-token-ppo --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2sd-token-ppo --model_mode abstract2shorthand:description --prediction_save_path ./prediction/brute_t5_a2sd_token_ppo_predictions.csv
  3. To further fine-tune and run inference (description then acronym, letter-level acronym tokenization), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2description:shorthand --shorthand_mode character --model_save_path ./models/t5-a2ds-char-ppo --save_total_limit 1

accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-char-ppo --model_mode abstract2description:shorthand --shorthand_mode character --prediction_save_path ./prediction/brute_t5_a2ds_char_ppo_predictions.csv

3.3 In-Context Learning with Large Language Models (ICL) Paradigm

To replicate the ICL results, run:

python icl_main.py

The generation mode can be selected from:

  • onestop: description then acronym, subword-level acronym tokenization, onestop framework
  • onestop_sd: acronym then description, subword-level acronym tokenization, onestop framework
  • onestop_char: description then acronym, letter-level acronym tokenization, onestop framework
  • pipeline: description then acronym, subword-level acronym tokenization, pipeline framework
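
For intuition, a few-shot prompt for the onestop mode might be assembled along these lines; the wording and format below are assumptions, and icl_main.py holds the actual prompts:

```python
def build_onestop_prompt(demos, abstract):
    """Assemble a hypothetical few-shot prompt: each demonstration pairs a
    masked abstract with its gold description and acronym, and the query
    abstract is left for the model to complete."""
    parts = [
        f"Abstract: {d['abstract']}\nHeading: {d['description']} ({d['acronym']})\n"
        for d in demos
    ]
    parts.append(f"Abstract: {abstract}\nHeading:")
    return "\n".join(parts)
```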

4. Evaluation

To evaluate the generated acronyms, run:

python run_eval.py \
    --file <CSV file> \
    --eval_type shorthand \
    --hypos-col <the column name of generated acronyms> \
    --refs-col <the column name of ground truth acronyms>

For descriptions, run:

python run_eval.py \
    --file <CSV file> \
    --eval_type description \
    --hypos-col <the column name of generated descriptions> \
    --refs-col <the column name of ground truth descriptions>

By default, the CSV files are saved in prediction/.
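
If you are unsure which column names a prediction file uses, you can list them before calling run_eval.py:

```python
import pandas as pd

# Print the header of a prediction file to find the right
# --hypos-col / --refs-col values.
df = pd.read_csv("prediction/brute_t5_a2ds_token_predictions.csv")
print(df.columns.tolist())
```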

5. Citation

If you want to cite our dataset and paper, you can use this BibTeX:

@inproceedings{cui-etal-2024-unveiling,
    title = "Unveiling the Art of Heading Design: A Harmonious Blend of Summarization, Neology, and Algorithm",
    author = "Cui, Shaobo  and
      Feng, Yiyang  and
      Mao, Yisong  and
      Hou, Yifan  and
      Faltings, Boi",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.368",
    pages = "6149--6174",
    abstract = "Crafting an appealing heading is crucial for attracting readers and marketing work or products. A popular way is to summarize the main idea with a refined description and a memorable acronym. However, there lacks a systematic study and a formal benchmark including datasets and metrics. Motivated by this absence, we introduce LOgogram, a novel benchmark comprising 6,653 paper abstracts with corresponding descriptions and acronyms. To measure the quality of heading generation, we propose a set of evaluation metrics from three aspects: summarization, neology, and algorithm. Additionally, we explore three strategies for heading generation(generation ordering, tokenization of acronyms, and framework design) under various prevalent learning paradigms(supervised fine-tuning, in-context learning with Large Language Models(LLMs), and reinforcement learning) on our benchmark. Our experimental results indicate the difficulty in identifying a practice that excels across all summarization, neologistic, and algorithmic aspects.",
}
