Rule-Based Hallucination Detection
Currently, LettuceDetect supports hallucination detection primarily through transformer-based models trained on annotated datasets (RAGTruth, RAGBench) and through simple prompt-based methods that leverage large language models (LLMs).
Motivation
We could incorporate a simple, interpretable, and efficient rule-based baseline into the framework. Findings from this paper showed that rule-based methods, relying on lexical or semantic overlap, can serve as strong, interpretable baselines, especially in structured factual domains (e.g., numerical data, durations, named entities). Such an approach would enable real-time hallucination detection and could complement the existing transformer-based approach.
Proposed Implementation
The proposed rule-based approach rests on a simple heuristic:
If a token in the answer lacks explicit support from the context documents, mark it as hallucinated.
This can be implemented using:
- Lexical overlap checks: Exact or fuzzy matching of answer tokens with context tokens.
- Semantic matching: Comparing tokens using embeddings or similarity measures.
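The lexical-overlap variant could be sketched roughly as follows (the regex tokenizer, lowercasing, and output format here are illustrative assumptions, not a final design):

```python
import re

def lexical_overlap_spans(context: str, answer: str):
    """Mark answer tokens with no exact lexical match in the context as hallucinated."""
    # Normalize both sides to lowercase word tokens.
    context_tokens = set(re.findall(r"\w+", context.lower()))
    spans = []
    for match in re.finditer(r"\w+", answer):
        token = match.group().lower()
        if token not in context_tokens:
            # Record the character offsets of the unsupported token.
            spans.append({"start": match.start(), "end": match.end(), "text": match.group()})
    return spans

context = "The capital of France is Paris."
answer = "The capital of France is Lyon."
print(lexical_overlap_spans(context, answer))
# [{'start': 25, 'end': 29, 'text': 'Lyon'}]
```

A real implementation would merge adjacent unsupported tokens into spans and skip stopwords, but even this minimal version illustrates the mechanism.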
Example
An actual example from the RAGTruth dataset:
Context (excerpt):
Louis Jordan ... was stranded at sea after his boat capsized ... crew members on a German-flagged container ship spotted Jordan about 200 miles off the North Carolina coast on Thursday.
Hallucinated Answer:
Louis Jordan ... was stranded at sea for five days after his boat capsized...
Annotation:
- Hallucinated Span: "for five days"
Why a rule-based model would work here
- No explicit lexical support for the duration "five days" (no equivalent mention such as "nearly a week," "5 days," etc.).
- Simple overlap checks or semantic matching could efficiently detect unsupported details.
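Applied to this example, even a fuzzy lexical check (sketched below with the standard library's difflib; the 0.8 cutoff is an arbitrary illustrative threshold) finds no context token supporting the annotated span:

```python
import difflib
import re

def unsupported_tokens(context: str, answer_span: str, cutoff: float = 0.8):
    """Return tokens in the answer span with no close lexical match in the context."""
    context_tokens = [t.lower() for t in re.findall(r"\w+", context)]
    unsupported = []
    for token in re.findall(r"\w+", answer_span):
        # get_close_matches returns [] when no context token clears the cutoff.
        if not difflib.get_close_matches(token.lower(), context_tokens, n=1, cutoff=cutoff):
            unsupported.append(token)
    return unsupported

context = ("Louis Jordan was stranded at sea after his boat capsized; crew members "
           "spotted Jordan about 200 miles off the North Carolina coast on Thursday.")
print(unsupported_tokens(context, "for five days"))
# ['for', 'five', 'days']
```

Every token of the hallucinated span is flagged, while tokens copied from the context (e.g. "about 200 miles") would pass.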
Code Example
```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize a rule-based detector
detector = HallucinationDetector(method="rule")

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million."]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

# Get span-level predictions indicating hallucinated parts
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

# Expected output:
# [{'start': 31, 'end': 71, 'confidence': 0.99, 'text': 'The population of France is 69 million.'}]
```
Extensions & Further Ideas
Additional points to consider:
- Number-specific hallucination detection: A narrow-domain variant targeting numerical facts such as quantities and durations.
- Graph-based methods: Leveraging dependency parsing (Universal Dependencies or spaCy) to detect unsupported relations.
- Triplet extraction and OpenIE: Extracting semantic triplets (subject-predicate-object) to detect unsupported claims.
- Zero-shot NER/Relation Extraction: Integrating tools such as GLiNER for enhanced entity-level verification.
- Static Embedding Matching: Comparing static word embeddings of tokens rather than their raw string forms.
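The number-specific idea, for instance, could start as a plain numeric-consistency check (the regex below and the lack of unit or spelled-out-number normalization are simplifications):

```python
import re

def unsupported_numbers(context: str, answer: str):
    """Flag numbers in the answer that never appear in the context."""
    number_pattern = r"\d+(?:\.\d+)?"  # integers and simple decimals
    context_numbers = set(re.findall(number_pattern, context))
    return [n for n in re.findall(number_pattern, answer) if n not in context_numbers]

context = "The population of France is 67 million."
answer = "The population of France is 69 million."
print(unsupported_numbers(context, answer))
# ['69']
```

A fuller version would also normalize spelled-out numbers ("five" → 5) and units before comparing, so that "5 days" and "five days" are treated as equivalent.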
Known Limitations
- Limited reasoning capability (cannot capture facts that are only implied by the context).
- Potential for many false positives.
- Limited handling of synonyms or paraphrases without advanced semantic matching.
Despite these limitations, a rule-based baseline would provide a fast, transparent, and interpretable solution, complementing existing methods.