This repository hosts the work in progress for our paper "Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks".
The original responses and abstracts are available in summaries.csv and abstracts_final.csv. The code to re-create the human intelligence task (HIT) is in hit.html. The processed data, new batches, and predictions are in data/.
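As a minimal sketch, the released CSV files can be read with Python's standard csv module. The loader assumes UTF-8 encoding and a header row. Because this README does not document the column layout, the sketch only reports the columns it finds instead of assuming field names. The helper name load_rows is illustrative and not part of the repository.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row.

    Assumes UTF-8 encoding; newline="" lets the csv module handle
    quoted fields that span several lines (e.g. long abstracts).
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # File names come from this README; the columns are not documented,
    # so we only print the header instead of assuming field names.
    for name in ("summaries.csv", "abstracts_final.csv"):
        path = Path(name)
        if path.exists():
            rows = load_rows(path)
            columns = list(rows[0].keys()) if rows else []
            print(f"{name}: {len(rows)} rows, columns = {columns}")
        else:
            print(f"{name}: not found in the current directory")
```

Run it from the repository root to see how many rows each file contains and which columns it exposes before writing any analysis code.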
Here's a breakdown of the notebooks included in the repository and the purpose of each:
- 0_exploration.ipynb: Processing the original responses.
- 1_generation.ipynb: Generating new responses using ChatGPT.
- 2_prepare_for_training.ipynb: Preparing the training data.
- 3_finetuned_classifying.ipynb: Plotting and detection post-training.
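The data preparation and detection steps listed above can be sketched in plain Python. This sketch assumes the training set consists of human-written texts labeled 0 and ChatGPT-generated texts labeled 1, and that the fine-tuned classifier outputs, for each response, a probability that it is model-generated. The helpers build_dataset and estimate_synthetic_share, the random (non-stratified) split, and the 0.5 threshold are hypothetical illustrations, not the code in 2_prepare_for_training.ipynb or 3_finetuned_classifying.ipynb.

```python
import random


def build_dataset(human_texts, synthetic_texts, test_fraction=0.2, seed=0):
    """Label human-written texts 0 and model-generated texts 1, then split.

    Returns (train, test), each a list of (text, label) pairs.
    """
    examples = [(t, 0) for t in human_texts] + [(t, 1) for t in synthetic_texts]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(examples)
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]


def estimate_synthetic_share(scores, threshold=0.5):
    """Fraction of responses whose 'synthetic' probability exceeds threshold."""
    if not scores:
        return 0.0
    flagged = sum(1 for s in scores if s > threshold)
    return flagged / len(scores)


if __name__ == "__main__":
    train, test = build_dataset(
        ["a human summary", "another human summary"],
        ["a ChatGPT summary", "another ChatGPT summary"],
        test_fraction=0.5,
    )
    print(len(train), len(test))  # 2 2
    # Hypothetical classifier scores for four crowd responses.
    print(estimate_synthetic_share([0.9, 0.2, 0.7, 0.1]))  # 0.5
```

A plain random split keeps the sketch short; with imbalanced classes, a stratified split that preserves the label ratio in both halves is a common alternative.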
The process of fine-tuning the model was largely based on the code repository found here. We are in the process of integrating the relevant fine-tuning code into this repository.
Note: Information about the training of the model will be added once the process is completed.
@misc{veselovsky2023artificial,
title={Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks},
author={Veniamin Veselovsky and Manoel Horta Ribeiro and Robert West},
year={2023},
eprint={2306.07899},
archivePrefix={arXiv},
primaryClass={cs.CL}
}