This is the official implementation for our ACL 2024 (Findings) paper: Unveiling the Art of Heading Design: A Harmonious Blend of Summarization, Neology, and Algorithm.
We introduce LOgogram, a novel heading-generation benchmark comprising 6,653 paper abstracts with corresponding descriptions and acronyms as headings.
To measure the generation quality, we propose a set of evaluation metrics from three aspects: summarization, neology, and algorithm.
Additionally, we explore three strategies (generation ordering, tokenization, and framework design) under prevalent learning paradigms (supervised fine-tuning, reinforcement learning, and in-context learning with Large Language Models).
We recommend creating a new conda virtual environment for running the code in this repository:
conda create -n logogram python=3.8
conda activate logogram
Then install PyTorch 1.13.1. For example, to install with pip and CUDA 11.6:
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
Finally, install the remaining packages using pip:
pip install -r requirements.txt
We crawl the ACL Anthology and then exclude examples whose headings do not contain acronyms.
The unfiltered dataset is saved in /raw-data/acl-anthology/data_acl_all.jsonl.
We further apply a set of tailored filtering rules, based on data inspection, to eliminate anomalies, and we replace acronyms in the abstracts with a mask token to prevent acronym leakage. The details are in src/data_processing.ipynb.
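As a rough illustration of the masking step, here is a minimal sketch; the mask token and the JSONL field names ("abstract", "shorthand") are assumptions and may differ from the actual schema in src/data_processing.ipynb:

import json
import re

MASK = "<mask>"  # assumed mask token; the actual token is defined in src/data_processing.ipynb

def mask_acronym(example):
    # Replace occurrences of the heading's acronym inside the abstract to prevent leakage.
    # The field names "shorthand" and "abstract" are assumptions about the JSONL schema.
    pattern = re.compile(re.escape(example["shorthand"]), flags=re.IGNORECASE)
    example["abstract"] = pattern.sub(MASK, example["abstract"])
    return example

with open("raw-data/acl-anthology/data_acl_all.jsonl") as f:
    examples = [mask_acronym(json.loads(line)) for line in f]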
We plot the distributions of text length and of the number of publications in our dataset in Figures 3 and 4 of our paper. To reproduce them, see src/data_statistics.ipynb.
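A minimal sketch of the text-length statistic, assuming an "abstract" field in the JSONL file (the actual plotting code is in src/data_statistics.ipynb):

import json
import matplotlib.pyplot as plt

with open("raw-data/acl-anthology/data_acl_all.jsonl") as f:
    abstracts = [json.loads(line)["abstract"] for line in f]  # "abstract" field name is an assumption

lengths = [len(a.split()) for a in abstracts]  # whitespace word counts
plt.hist(lengths, bins=50)
plt.xlabel("Abstract length (words)")
plt.ylabel("Number of examples")
plt.savefig("abstract_length_distribution.png")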
We evaluate the generated headings against summarization, neologistic, and algorithmic constraints. Specifically, we propose three novel metrics, WordLikeness (WL), WordOverlap (WO), and LCSRatio (LR), covering the neologistic and algorithmic aspects. To justify these metrics, we plot the density estimates of the individual metrics and their joint distribution in Figures 5 and 6, demonstrating that the gold-standard examples achieve high values on them. To reproduce, see src/data_statistics.ipynb.
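For intuition about the algorithmic constraint, below is a minimal sketch of an LCS-based ratio in the spirit of LCSRatio; the exact metric definitions (including WordLikeness and WordOverlap) are implemented in run_eval.py and may differ in details:

def lcs_length(a: str, b: str) -> int:
    # Standard dynamic-programming longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_ratio(acronym: str, description: str) -> float:
    # Fraction of acronym characters recoverable, in order, from the description.
    if not acronym:
        return 0.0
    return lcs_length(acronym.lower(), description.lower()) / len(acronym)

print(lcs_ratio("PPO", "Proximal Policy Optimization"))  # -> 1.0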
We fine-tune the T5 model and explore the effectiveness of the generation ordering, tokenization, and framework design strategies.
- To fine-tune and run inference (description then acronym, acronym subword-level tokenization, one-stop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description:shorthand --model_save_path ./models/t5-a2ds-token-base --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-base/checkpoint-5 --model_mode abstract2description:shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_predictions.csv
- To fine-tune and run inference (acronym then description, acronym subword-level tokenization, one-stop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2shorthand:description --model_save_path ./models/t5-a2sd-token-base --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2sd-token-base/checkpoint-5 --model_mode abstract2shorthand:description --prediction_save_path ./prediction/brute_t5_a2sd_token_predictions.csv
- To fine-tune and run inference (description then acronym, acronym letter-level tokenization, one-stop framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description:shorthand --shorthand_mode character --model_save_path ./models/t5-a2ds-char-base --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-char-base/checkpoint-5 --model_mode abstract2description:shorthand --shorthand_mode character --prediction_save_path ./prediction/brute_t5_a2ds_char_predictions.csv
- To fine-tune and run inference (description then acronym, acronym subword-level tokenization, pipeline framework), run:
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract2description --model_save_path ./models/t5-a2ds-token-pipe/1 --save_total_limit 1
accelerate launch t5_brute_finetune.py --model_name t5-base --model_mode abstract-description2shorthand --model_save_path ./models/t5-a2ds-token-pipe/2 --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-pipe/1/checkpoint-5 --model_mode abstract2description --prediction_save_path ./prediction/brute_t5_a2ds_token_pipe_predictions.csv
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-pipe/2/checkpoint-5 --model_mode abstract-description2shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_pipe_predictions.csv
The RL paradigm is built on top of the SFT paradigm. Specifically, we choose the Proximal Policy Optimization (PPO) algorithm. We evaluate all strategies except the pipeline framework design, since feedback mechanisms for pipeline language models under the RL paradigm remain relatively unexplored.
- To fine-tune and run inference (description then acronym, acronym subword-level tokenization, one-stop framework), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2description:shorthand --model_save_path ./models/t5-a2ds-token-ppo --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-token-ppo --model_mode abstract2description:shorthand --prediction_save_path ./prediction/brute_t5_a2ds_token_ppo_predictions.csv
- To fine-tune and run inference (acronym then description, acronym subword-level tokenization, one-stop framework), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2shorthand:description --model_save_path ./models/t5-a2sd-token-ppo --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2sd-token-ppo --model_mode abstract2shorthand:description --prediction_save_path ./prediction/brute_t5_a2sd_token_ppo_predictions.csv
- To fine-tune and run inference (description then acronym, acronym letter-level tokenization, one-stop framework), run:
TOKENIZERS_PARALLELISM=false accelerate launch t5_ppo_finetune.py --model_mode abstract2description:shorthand --shorthand_mode character --model_save_path ./models/t5-a2ds-char-ppo --save_total_limit 1
accelerate launch t5_brute_inference.py --model_name models/t5-a2ds-char-ppo --model_mode abstract2description:shorthand --shorthand_mode character --prediction_save_path ./prediction/brute_t5_a2ds_char_ppo_predictions.csv
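The PPO commands above optimize a scalar reward for each generated heading. As a rough, hypothetical sketch of what such a reward could look like (the actual reward design and weights live in t5_ppo_finetune.py and may differ), one could combine a summarization score with the algorithmic constraint:

def unigram_f1(pred: str, ref: str) -> float:
    # Crude summarization proxy: unigram-overlap F1 between generated and reference descriptions.
    p, r = set(pred.lower().split()), set(ref.lower().split())
    common = len(p & r)
    if not p or not r or not common:
        return 0.0
    prec, rec = common / len(p), common / len(r)
    return 2 * prec * rec / (prec + rec)

def heading_reward(pred_acronym: str, pred_description: str, ref_description: str) -> float:
    # Hypothetical equal weighting of summarization and algorithmic aspects;
    # lcs_ratio is the sketch from the evaluation-metrics section above.
    return 0.5 * unigram_f1(pred_description, ref_description) + 0.5 * lcs_ratio(pred_acronym, pred_description)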
To replicate the in-context learning (ICL) results, run:
python icl_main.py
The generation model can be selected from the LLMs supported in icl_main.py.
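For reference, a few-shot prompt for this task can be assembled roughly as follows; the template and field names are assumptions and may differ from what icl_main.py actually sends to the model:

def build_prompt(demonstrations, abstract):
    # Assemble a hypothetical few-shot prompt: a short instruction, k demonstrations, then the query abstract.
    parts = ["Generate a heading (a description and an acronym) for the given abstract."]
    for demo in demonstrations:
        parts.append(f"Abstract: {demo['abstract']}\nHeading: {demo['description']} ({demo['shorthand']})")
    parts.append(f"Abstract: {abstract}\nHeading:")
    return "\n\n".join(parts)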
To evaluate the generated acronyms, run:
python run_eval.py \
--file <CSV file> \
--eval_type shorthand \
--hypos-col <the column name of generated acronyms> \
--refs-col <the column name of ground truth acronyms>
For descriptions, run:
python run_eval.py \
--file <CSV file> \
--eval_type description \
--hypos-col <the column name of generated descriptions> \
--refs-col <the column name of ground truth descriptions>
By default, the prediction CSV files are saved in prediction/.
If you want to cite our dataset and paper, you can use this BibTeX entry:
@inproceedings{cui-etal-2024-unveiling,
title = "Unveiling the Art of Heading Design: A Harmonious Blend of Summarization, Neology, and Algorithm",
author = "Cui, Shaobo and
Feng, Yiyang and
Mao, Yisong and
Hou, Yifan and
Faltings, Boi",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.368",
pages = "6149--6174",
abstract = "Crafting an appealing heading is crucial for attracting readers and marketing work or products. A popular way is to summarize the main idea with a refined description and a memorable acronym. However, there lacks a systematic study and a formal benchmark including datasets and metrics. Motivated by this absence, we introduce LOgogram, a novel benchmark comprising 6,653 paper abstracts with corresponding descriptions and acronyms. To measure the quality of heading generation, we propose a set of evaluation metrics from three aspects: summarization, neology, and algorithm. Additionally, we explore three strategies for heading generation(generation ordering, tokenization of acronyms, and framework design) under various prevalent learning paradigms(supervised fine-tuning, in-context learning with Large Language Models(LLMs), and reinforcement learning) on our benchmark. Our experimental results indicate the difficulty in identifying a practice that excels across all summarization, neologistic, and algorithmic aspects.",
}