LLM Inference Enhanced by External Knowledge: A Survey

We kindly invite you to read our full paper. (Preprint coming soon on arXiv.)

Table of Contents

- Introduction
- Taxonomy of Knowledge Sources
- External Knowledge Integration
- Discussions
- Challenges and Future Directions
- Citation

Introduction

Large Language Models (LLMs) exhibit remarkable abilities in understanding and generating human language. However, they remain fundamentally constrained by their parametric memory and are prone to hallucination, especially in tasks requiring up-to-date or domain-specific knowledge. To overcome these limitations, this survey investigates methods for enhancing LLM inference through the integration of external knowledge, particularly structured data such as tables and knowledge graphs (KGs).

Taxonomy of Knowledge Sources

To ground the discussion, the figure below introduces a comprehensive taxonomy that divides external knowledge into unstructured and structured data. The structured category is further divided into two primary sources: tables and knowledge graphs. Each data type is linked to its relevant reasoning methodologies—symbolic, neural, or hybrid for tables, and loosely or tightly coupled models for KGs.

Figure: Knowledge source taxonomy

External Knowledge Integration

Table Integration

The integration of tables with LLMs can be categorized based on the reasoning paradigm adopted. The three main approaches are symbolic reasoning, neural reasoning, and hybrid reasoning.

Symbolic Reasoning

Symbolic reasoning approaches translate natural language questions into SQL queries that are executed over the input table. These methods are interpretable and precise but often struggle with semantic ambiguity or reasoning beyond basic table operations.
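
A minimal sketch of this pipeline, assuming a hypothetical `llm_generate_sql` function that stands in for the LLM's text-to-SQL step; the table schema and example data are illustrative only:

```python
import sqlite3

def answer_with_sql(question, rows, llm_generate_sql):
    """Symbolic table QA: execute LLM-generated SQL over an in-memory copy of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (country TEXT, medals INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

    schema = "t(country TEXT, medals INTEGER)"
    sql = llm_generate_sql(question, schema)  # LLM maps the question to SQL
    return conn.execute(sql).fetchall()       # exact, interpretable execution

# Illustration only: a hard-coded stand-in for the LLM.
rows = [("Norway", 37), ("Germany", 31), ("Canada", 26)]
fake_llm = lambda q, schema: "SELECT country FROM t ORDER BY medals DESC LIMIT 1"
print(answer_with_sql("Which country won the most medals?", rows, fake_llm))  # [('Norway',)]
```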

Neural Reasoning

In contrast, neural reasoning relies purely on the LLM to perform reasoning in the language space, using approaches like few-shot prompting or chain-of-thought (CoT). While these methods are more flexible and suitable for ambiguous or commonsense queries, they tend to suffer from hallucinations and lack interpretability.
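
A sketch of the neural alternative, where the table is simply serialized into the prompt and the model reasons in language space; `call_llm` is a hypothetical stand-in for any chat or completion API:

```python
def neural_table_qa(question, header, rows, call_llm):
    """Purely neural table QA: serialize the table and let the LLM reason over it in text."""
    table_text = " | ".join(header) + "\n" + "\n".join(
        " | ".join(str(cell) for cell in row) for row in rows
    )
    prompt = (
        "Answer the question using the table below.\n"
        f"Table:\n{table_text}\n\n"
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer."  # chain-of-thought cue
    )
    return call_llm(prompt)  # flexible, but unverified: the answer may be hallucinated
```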

Hybrid Reasoning

Hybrid methods combine the precision of symbolic execution (e.g., SQL) with the adaptability of neural inference. They typically generate SQL to filter relevant rows and then use the LLM to perform further reasoning.
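
A sketch of a hybrid pipeline in this spirit (not the exact procedure of any surveyed system): an LLM-generated SQL query first narrows the table, and a second LLM call reasons over the reduced evidence. `llm_generate_sql` and `call_llm` are hypothetical stand-ins:

```python
import sqlite3

def hybrid_table_qa(question, rows, llm_generate_sql, call_llm):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (player TEXT, team TEXT, points INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

    # Step 1 (symbolic): generated SQL filters the table down to relevant rows.
    sql = llm_generate_sql(question, "t(player TEXT, team TEXT, points INTEGER)")
    subtable = conn.execute(sql).fetchall()

    # Step 2 (neural): the LLM reasons over the much smaller sub-table.
    prompt = f"Relevant rows: {subtable}\nQuestion: {question}\nAnswer concisely."
    return call_llm(prompt)
```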

The figure below illustrates these three paradigms in terms of data flow and execution:

Figure: Table reasoning paradigms

Knowledge Graph Integration

Knowledge graph integration strategies can be grouped into two categories depending on the degree of interaction between the LLM and the KG: loose coupling and tight coupling.

Loose Coupling

Loose coupling treats the KG as a separate retrieval module. The LLM queries the KG, retrieves relevant facts, and then processes them as part of its prompt. This method is modular and easy to implement but lacks interactive reasoning capabilities.
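
A simplified sketch of this retrieve-then-prompt pattern (in the spirit of methods such as KAPING, but not their actual implementation); the toy triple store, the lexical retriever, and `call_llm` are all illustrative assumptions:

```python
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def retrieve_triples(question, triples, k=3):
    # Naive lexical overlap as a stand-in for dense retrieval over verbalized triples.
    def score(triple):
        return sum(part.lower() in question.lower() for part in triple)
    ranked = sorted(triples, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]

def loosely_coupled_qa(question, call_llm):
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in retrieve_triples(question, TRIPLES))
    prompt = f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer using only the facts above."
    return call_llm(prompt)  # single retrieval pass; the LLM never queries the KG again
```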

Tight Coupling

Tight coupling methods integrate the LLM and KG more deeply. The LLM acts as an agent that iteratively explores the KG, using retrieved entities and relations as context for step-by-step reasoning.
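
A heavily simplified sketch of this agent-style exploration (in the spirit of ToG, not its actual algorithm): at each hop a hypothetical `llm_choose_edge` call picks which outgoing edge to follow, so retrieval and reasoning are interleaved rather than performed once up front:

```python
from collections import defaultdict

def explore_kg(question, start_entity, triples, llm_choose_edge, max_hops=3):
    """Iteratively walk the KG, letting the LLM decide each expansion step."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))

    path, current = [], start_entity
    for _ in range(max_hops):
        candidates = graph[current]
        if not candidates:
            break
        choice = llm_choose_edge(question, current, candidates, path)  # reasoning step
        if choice is None:                                             # the LLM decides to stop
            break
        relation, nxt = candidates[choice]
        path.append((current, relation, nxt))
        current = nxt
    return path  # the traversed path grounds the final answer generation
```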

The following figure visually compares these two coupling strategies:

Figure: Loose vs. tight coupling of KGs and LLMs

Discussions

To assess the performance of different integration strategies, we include benchmark results from prior studies.

The figure below summarizes the performance of various table-based reasoning methods (symbolic, neural, hybrid) across two popular benchmarks: WikiTQ and TabFact. Notably, hybrid methods like H-STAR and ALTER outperform others, combining the strengths of both symbolic execution and neural reasoning.

Figure: Table-based reasoning benchmarks (WikiTQ and TabFact)

In the context of knowledge graphs, the table below (from the survey) compares methods across three dimensions: objective, performance, and efficiency. Tight coupling methods such as ToG and GoG demonstrate superior reasoning depth, especially for multi-hop queries, while loosely coupled methods like KAPING offer simplicity and low latency.

Table: Comparison of KG integration methods (objective, performance, efficiency)

Challenges and Future Directions

Error Propagation

Hybrid systems that rely on symbolic execution as an intermediate step are prone to cascading errors. If the symbolic stage produces incorrect results (e.g., due to faulty SQL), the subsequent neural reasoning may be misled.

Input Length and Information Loss

Structured knowledge sources such as tables and graphs often exceed LLM input limits. Existing solutions typically extract a subset of relevant content, but this may exclude crucial information, harming accuracy.
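
A minimal sketch of this kind of budget-constrained selection; the lexical scorer and the character budget are illustrative assumptions, and the dropped rows mark exactly where information loss occurs:

```python
def fit_rows_to_budget(question, rows, budget_chars=500):
    """Keep only the highest-scoring rows that fit a fixed context budget."""
    def overlap(row):
        return sum(word.lower() in row.lower() for word in question.split())

    kept, used = [], 0
    for row in sorted(rows, key=overlap, reverse=True):
        if used + len(row) > budget_chars:
            break          # every remaining row is silently dropped
        kept.append(row)
        used += len(row)
    return kept            # rows outside the budget never reach the LLM
```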

Efficiency and Complexity

While hybrid systems tend to outperform pure symbolic or neural methods, they also introduce higher latency and complexity. Future methods must balance reasoning depth with scalability.

Multimodal and Real-Time Reasoning

Real-world applications demand real-time reasoning with up-to-date knowledge. Integrating multimodal data (e.g., images, audio) and streaming updates remains an open challenge. Future research may explore lightweight architectures that dynamically update external knowledge and align it with LLM inference.

Citation

If you find this work helpful, please cite:

@article{lin2025llmsurvey,
  title     = {LLM Inference Enhanced by External Knowledge: A Survey},
  author    = {Yu-Hsuan Lin and Qian-Hui Chen and Yi-Jie Cheng and Jia-Ren Zhang and Yi-Hung Liu and Liang-Yu Hsia and Yun-Nung Chen},
  year      = {2025},
  note      = {Manuscript under review}
}
