A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

This repository contains the data and code to reproduce the results of our paper: https://arxiv.org/abs/2312.02073

Please use the following citation:

@misc{monea2023glitch,
      title={A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia}, 
      author={Giovanni Monea and Maxime Peyrard and Martin Josifoski and Vishrav Chaudhary and Jason Eisner and Emre Kıcıman and Hamid Palangi and Barun Patra and Robert West},
      year={2023},
      eprint={2312.02073},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Abstract: Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information, hoping that grounding can rectify outdated or noisy stored knowledge. We present a novel method to study grounding abilities using Fakepedia, a dataset of counterfactual texts constructed to clash with a model's internal parametric knowledge. We benchmark various LLMs with Fakepedia and then we conduct a causal mediation analysis, based on our Masked Grouped Causal Tracing (MGCT), on LLM components when answering Fakepedia queries. Within this analysis, we identify distinct computational patterns between grounded and ungrounded responses. We finally demonstrate that distinguishing grounded from ungrounded responses is achievable through computational analysis alone. Our results, together with existing findings about factual recall mechanisms, provide a coherent narrative of how grounding and factual recall mechanisms interact within LLMs.

Repository contents:
data
plots/causal_tracing
src
.gitignore
.pre-commit-config.yaml
LICENSE
README.md
filter_data.py
generate_base_fakepedia.py
generate_extended_pararel.py
generate_multihop_fakepedia.py
run_causal_tracing.py
run_descriptive_analysis.py
run_detection_analysis.py
