MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Arxiv Link - 2024-04-16 17:59:10

Abstract

Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many calls to LLMs to check a single response. In this work, we show how to build small models that have GPT-4-level performance but for 400x lower cost. We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and recognize synthesis of information across sentences. For evaluation, we unify pre-existing datasets into a benchmark LLM-AggreFact, collected from recent work on fact-checking and grounding LLM generations. Our best system MiniCheck-FT5 (770M parameters) outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.
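The claim-by-claim checking that the abstract describes can be illustrated with a minimal sketch. This is not the MiniCheck implementation or its API: the function names, the naive sentence splitter, the token-overlap scorer, and the 0.8 threshold are all hypothetical placeholders chosen for illustration. In practice the `scorer` argument would wrap a trained checker (a small model or an LLM call); only the loop structure, one check per claim against the grounding document, reflects the abstract.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; a stand-in for a real claim decomposer."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def token_overlap_score(document: str, claim: str) -> float:
    """Toy scorer (assumption): fraction of claim tokens found in the document.

    A real system would replace this with a trained fact-checking model.
    """
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    doc_tokens = set(re.findall(r"\w+", document.lower()))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & doc_tokens) / len(claim_tokens)


def check_response(
    document: str,
    response: str,
    scorer: Callable[[str, str], float] = token_overlap_score,
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, float, bool]]:
    """Score every sentence of the response against the grounding document."""
    results = []
    for claim in split_sentences(response):
        score = scorer(document, claim)
        results.append((claim, score, score >= threshold))
    return results


def is_grounded(results: List[Tuple[str, float, bool]]) -> bool:
    """A response counts as grounded only if every claim is supported."""
    return all(supported for _, _, supported in results)


if __name__ == "__main__":
    doc = "MiniCheck-FT5 has 770M parameters. It was trained on synthetic data."
    resp = "MiniCheck-FT5 has 770M parameters. It was released in 1999."
    for claim, score, supported in check_response(doc, resp):
        print(f"{supported!s:5} {score:.2f} {claim}")
```

Because the scorer is passed in as a function, the same loop works whether each check is an expensive LLM call or a single forward pass of a small model, which is where the cost difference discussed in the abstract would arise.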

Socials

LinkedIn

🚀 Exciting news in the world of NLP and LLMs! Researchers have developed a groundbreaking approach to fact-checking LLM outputs, significantly reducing computational costs while maintaining GPT-4-level performance. By training small models on synthetic data generated with GPT-4, they have successfully improved the efficiency of verifying facts in model generations. The newly introduced benchmark LLM-AggreFact, along with the MiniCheck-FT5 system, outperforms comparable models and achieves GPT-4 accuracy. Learn more about this innovative work at: http://arxiv.org/abs/2404.10774v1 #NLP #LLM #AI #FactChecking #Innovation 🔍📊🔬

X

🚀 Exciting new research on fact-checking in NLP! Learn how small models with GPT-4-level performance are built at 400x lower cost. Check out MiniCheck-FT5, outperforming others of comparable size. Find out more at: http://arxiv.org/abs/2404.10774v1 #AI #NLP #LLM #research #factchecking
