DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination

University of Edinburgh, United Kingdom; Centre for AI, Data Science & Artificial Intelligence, R&D, AstraZeneca, United Kingdom; University College London, United Kingdom; Miniml.AI, United Kingdom

DeCoRe reduces hallucinations in LLMs by contrasting the outputs of the base model with those of a variant whose retrieval heads are masked, using predictive entropy to dynamically reject unfaithful responses.

Abstract

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge.

Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations.

To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide.

Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).

Masking Retrieval Heads Induces Hallucination

We observe that masking retrieval heads produces hallucinated responses, as shown in the example below.


The base model retrieves the correct answer from the substituted context, while the masked model gives a seemingly plausible yet incorrect answer that matches neither the substituted nor the original answer.
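To make the idea of a "masked" model concrete, the following is a minimal NumPy sketch of a single causal self-attention layer in which selected heads are switched off. It is an illustration under stated assumptions, not the DeCoRe implementation: the function name `multi_head_attention`, the `masked_heads` parameter, the random weights, and the choice to mask a head by zeroing its output before the output projection are all hypothetical. The page does not specify how the masking is carried out, and identifying which heads are retrieval heads is outside the scope of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; rows always contain at least one finite score.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, masked_heads=()):
    """Causal self-attention over one sequence of shape (seq_len, d_model).

    Heads listed in `masked_heads` contribute nothing to the output
    (assumed masking scheme: zero the head output before projection).
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Causal mask: position i may only attend to positions <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    head_out = softmax(scores) @ v  # (n_heads, seq_len, d_head)

    # Switch off the selected heads.
    keep = np.ones(n_heads)
    keep[np.array(list(masked_heads), dtype=int)] = 0.0
    head_out = head_out * keep[:, None, None]

    merged = head_out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o

# Toy setup with random weights (purely illustrative values).
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

base_out = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
masked_out = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, masked_heads=(1,))
```

Here `base_out` plays the role of the base model's layer output and `masked_out` that of the masked variant; the two differ only in the contribution of head 1.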

Dynamic Contrastive Decoding

The base model can be uncertain of its own predictions. Conditional entropy provides a natural way to quantify that uncertainty.
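For reference, the standard Shannon entropy of a next-token distribution can be written as follows. The notation is an assumption made for this illustration (vocabulary $\mathcal{V}$, next token $x_t$, preceding tokens $x_{<t}$, base-model distribution $p_{\text{base}}$); the page itself does not state the formula.

```latex
H\left(x_t \mid x_{<t}\right)
  = - \sum_{v \in \mathcal{V}}
      p_{\text{base}}\left(x_t = v \mid x_{<t}\right)
      \log p_{\text{base}}\left(x_t = v \mid x_{<t}\right)
```

The value is zero when all probability mass sits on a single token and reaches its maximum, $\log |\mathcal{V}|$, when the distribution is uniform.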

Conditional entropy to control contrastive decoding

DeCoRe incorporates conditional entropy into the contrastive decoding process, dynamically adjusting the strength of the contrastive penalty according to the uncertainty of the base model's predictions.
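The mechanism described above can be sketched for a single decoding step as follows. This is a hedged illustration rather than the authors' code: it assumes the common contrastive-decoding form `(1 + alpha) * log p_base - alpha * log p_masked`, with `alpha` set directly to the entropy of the base model's next-token distribution. The exact formula, the function names, and the toy logits are assumptions, not taken from this page.

```python
import numpy as np

def log_softmax(logits):
    # Stable log-softmax over a 1-D array of finite logits.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def entropy(log_probs):
    # Shannon entropy in nats; assumes finite log-probabilities.
    probs = np.exp(log_probs)
    return float(-(probs * log_probs).sum())

def contrast_next_token(base_logits, masked_logits):
    """Return contrasted log-probabilities and the penalty strength alpha.

    Assumed form: the base model's log-probabilities are amplified and the
    masked model's are subtracted, scaled by the base model's entropy.
    """
    base_lp = log_softmax(np.asarray(base_logits, dtype=float))
    masked_lp = log_softmax(np.asarray(masked_logits, dtype=float))
    alpha = entropy(base_lp)  # higher uncertainty -> stronger contrast
    scores = (1.0 + alpha) * base_lp - alpha * masked_lp
    return log_softmax(scores), alpha

# Toy 4-token vocabulary. The masked model favours token 1 in both cases.
masked_logits = [0.2, 1.5, 0.0, 0.0]

# Case 1: the base model is confident, so the contrast is weak.
confident_lp, alpha_confident = contrast_next_token([8.0, 0.0, 0.0, 0.0], masked_logits)

# Case 2: the base model is torn between tokens 0 and 1, so the contrast is strong.
uncertain_base = [1.0, 0.9, 0.0, 0.0]
uncertain_lp, alpha_uncertain = contrast_next_token(uncertain_base, masked_logits)
base_prob_token0 = float(np.exp(log_softmax(np.array(uncertain_base)))[0])
contrast_prob_token0 = float(np.exp(uncertain_lp)[0])
```

In the second case the contrast moves probability mass away from the token that the masked model prefers and towards the base model's top token, so `contrast_prob_token0` exceeds `base_prob_token0`. In the first case `alpha_confident` is close to zero and the base distribution is left almost unchanged.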

Results and Analyses

DeCoRe significantly improves performance in tasks requiring contextual faithfulness.




BibTeX


@article{gema2024decore,
  title={DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations}, 
  author={Aryo Pradipta Gema and Chen Jin and Ahmed Abdulaal and Tom Diethe and Philip Teare and Beatrice Alex and Pasquale Minervini and Amrutha Saseendran},
  year={2024},
  eprint={2410.18860},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.18860}, 
}