Most Retrieval-Augmented Generation systems start with a simple idea: split documents into chunks, turn those chunks into embeddings, store them in a vector database, and retrieve the most similar chunks when someone asks a question.
That approach works surprisingly well. It is also limited.
Traditional RAG is good at finding text that looks semantically similar to the query. It is much weaker at understanding how things are connected. If your documents talk about people, companies, legal roles, products, decisions, events, dependencies, or causes, the important information often lives in the relationships between those things, not only in the individual paragraphs.
LightRAG is interesting because it tries to fix that problem without abandoning the normal RAG pipeline. It still chunks documents. It still embeds text. It still uses vector search. But it also extracts entities and relationships, stores them in a graph, and uses that graph during retrieval.
In plain English, LightRAG does not just ask, "Which chunks are similar to this question?" It also asks, "Which concepts are involved, how are they connected, and what nearby information should be pulled in because of those connections?"
That is the core idea behind LightRAG.
The Problem With Plain Chunk-Based RAG
A normal RAG pipeline usually looks like this:
```
Document -> Chunk -> Embed -> Vector database -> Retrieve chunks -> LLM answer
```
The system splits a document into chunks, embeds each chunk, and stores the embeddings. At query time, it embeds the user query and searches for chunks with similar vectors.
This is clean and efficient. It is also very flat.
The vector database does not really know that ACME Corp is a company, that Jane Smith is its legal representative, that Gmail is a product, or that one chunk defines a term while another chunk explains its implications. It only knows that certain pieces of text are mathematically close in embedding space.
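To make the "flat" behavior concrete, here is a minimal sketch of chunk-only retrieval. It is an illustration, not LightRAG code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the three chunks are made-up examples.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list, top_k: int = 2) -> list:
    """Return the top_k chunk texts ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

chunks = [
    "ACME Corp is a technology company.",
    "Jane Smith is the legal representative of ACME Corp.",
    "The legal representative has signing authority for contracts.",
]
index = [(c, embed(c)) for c in chunks]
results = retrieve("Who has signing authority?", index, top_k=1)
```

With these toy vectors, the query retrieves the chunk that mentions signing authority, while the chunk that names Jane Smith scores zero. Nothing in the index records that the two chunks are linked through the "legal representative" role.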
That can cause several issues.
First, relevant information may be spread across multiple chunks. A chunk may mention a company, another chunk may describe a role, and a third chunk may explain the actual responsibility. If none of those chunks alone is an obvious semantic match for the query, the retriever may miss one of them.
Second, some questions are relational by nature. "Who has signing authority?" is not just asking for similar prose. It is asking for entities, roles, and relationships.
Third, broad questions often require synthesis. "How does corporate governance work in this document set?" needs more than one matching paragraph. It needs themes, relationships, and supporting evidence.
LightRAG adds a graph layer to help with exactly these cases.
What LightRAG Adds
LightRAG keeps the familiar RAG foundation, but it adds a second path during indexing.
```
Document -> Chunk -> Embed chunks
                  -> Extract entities and relationships
                  -> Build graph
                  -> Embed entities and relationships
```
This means a document produces more than just chunk embeddings. It also produces graph data.
An entity might be a person, company, product, legal role, regulation, concept, or location. A relationship describes how two entities are connected. For example, Jane Smith may be the legal representative of ACME Corp, or ACME Corp may operate a specific service.
Once those entities and relationships exist, LightRAG can retrieve information in more ways than plain vector search. It can match entities. It can match relationships. It can fetch neighboring nodes from the graph. It can still retrieve normal chunks. Then it combines those sources into a richer context for the final language model.
The rough query flow looks like this:
```
Query -> Extract keywords
      -> Match entities
      -> Match relationships
      -> Retrieve chunks
      -> Fetch graph neighbors
      -> Combine context
      -> Generate answer
```
This is why LightRAG is often described as graph-enhanced RAG. The graph does not replace retrieval. It gives retrieval more structure.
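The entity-matching and neighbor-fetching steps can be sketched with a toy graph. This is a simplified illustration under stated assumptions: the `graph` dictionary, `match_entities`, and `expand` are hypothetical names, and substring matching stands in for the vector matching a real system would use.

```python
# Hypothetical toy graph: entity name -> list of (relation, neighbor) pairs.
graph = {
    "JANE SMITH": [("legal representative of", "ACME CORP")],
    "ACME CORP": [("develops", "GMAIL")],
    "GMAIL": [],
}

def match_entities(keywords, graph):
    """Return entities whose name contains any keyword (stand-in for vector matching)."""
    wanted = [k.upper() for k in keywords]
    return [name for name in graph if any(k in name for k in wanted)]

def expand(entities, graph):
    """Collect every edge that touches a matched entity, as (source, relation, target)."""
    triples = []
    for src, edges in graph.items():
        for relation, tgt in edges:
            if src in entities or tgt in entities:
                triples.append((src, relation, tgt))
    return triples

matched = match_entities(["acme"], graph)
context = expand(matched, graph)
```

A query keyword that matches only ACME Corp still pulls in Jane Smith and Gmail, because both are one edge away in the graph.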
LightRAG Is Not Contextual Embedding
It is easy to confuse LightRAG with contextual embedding, but they are different techniques.
In Anthropic's contextual retrieval approach, each chunk gets extra document-level context before it is embedded. A normal chunk like this:
```
The company's revenue grew by 3% over the previous quarter.
```
might become this before embedding:
```
This chunk is from an SEC filing on ACME Corp's performance in Q2 2023. The previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter.
```
The key point is that the added context is baked into the chunk embedding itself. The retriever is still searching over chunks, but those chunks have been enriched before embedding.
LightRAG works differently. It does not prepend a custom summary to every chunk before embedding. Its context comes from the graph.
Instead of saying, "Let me make this chunk more self-contained before embedding it," LightRAG says, "Let me extract the things this chunk talks about, store their relationships, and use those relationships later when someone asks a question."
That distinction matters. Contextual embedding enriches the vector. LightRAG enriches the retrieval process.
| Question | Contextual embedding | LightRAG |
|---|---|---|
| Where does context come from? | A generated chunk-specific summary | Extracted entities and relationships |
| When is context added? | Before chunk embedding | During indexing and query-time retrieval |
| How is context retrieved? | Through the chunk vector | Through graph search plus vector search |
A simple way to phrase it is this: LightRAG trades embedded context for structural context.
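The difference shows up in what text actually gets sent to the embedding model. The sketch below is illustrative only: the function names, the record formats, and the example entity and relationship are assumptions, not the format either system uses internally.

```python
chunk = "The company's revenue grew by 3% over the previous quarter."
doc_context = (
    "This chunk is from an SEC filing on ACME Corp's performance in Q2 2023. "
    "The previous quarter's revenue was $314 million."
)

def contextual_input(chunk, doc_context):
    """Contextual embedding: context is prepended, then one vector is computed."""
    return f"{doc_context} {chunk}"

def graph_inputs(chunk, entities, relations):
    """Graph-style indexing: the chunk is embedded unchanged, and each extracted
    entity and relationship becomes its own separately embedded record."""
    records = [("chunk", chunk)]
    records += [("entity", f"{name}: {desc}") for name, desc in entities]
    records += [("relation", f"{src} -> {tgt}: {desc}") for src, tgt, desc in relations]
    return records

enriched = contextual_input(chunk, doc_context)
records = graph_inputs(
    chunk,
    entities=[("ACME CORP", "Company whose quarterly revenue is reported")],
    relations=[("ACME CORP", "Q2 2023 REVENUE", "Revenue grew by 3% quarter over quarter")],
)
```

In the first case there is one enriched string per chunk. In the second, the chunk text is untouched and the context lives in separate entity and relationship records.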
The Storage Model
LightRAG stores different kinds of information in different storage systems. That can look redundant at first, but each storage type has a separate job.
There are four main storage categories:
| Storage type | Purpose |
|---|---|
| Key-value storage | Stores raw text, metadata, descriptions, and cache entries |
| Vector storage | Stores embeddings for semantic search |
| Graph storage | Stores nodes, edges, and graph structure |
| Document status storage | Tracks document processing state |
The default setup can use simple local storage, such as JSON files, NanoVectorDB, and NetworkX. Production setups can swap in PostgreSQL (optionally with graph and vector extensions), Redis, MongoDB, Qdrant, Milvus, Faiss, Neo4j, or Memgraph.
The important thing is not the specific backend. The important thing is the separation of responsibilities.
Key-value storage is for direct lookup. Vector storage is for similarity search. Graph storage is for traversal. Document status storage is for knowing what has already been processed.
What Gets Stored Where
The easiest way to understand LightRAG is to follow a single document through the system.
First, the document is split into chunks. Those chunks are stored twice: once as text and once as embeddings.
```
# Key-value storage keeps the readable chunk content.
{
  "chunk_id_123": {
    "content": "The company's revenue grew by 3%...",
    "source_id": "doc-1",
    "file_path": "report.pdf"
  }
}

# Vector storage keeps the embedding used for semantic search.
{
  "id": "chunk_id_123",
  "vector": [0.023, -0.156, 0.089],
  "payload": {"source_id": "doc-1"}
}
```
The key-value entry lets the system recover the actual text. The vector entry lets the system find the chunk when a query is semantically similar.
Next, LightRAG extracts entities. An entity is stored in three places.
```
# Key-value storage keeps metadata and descriptions.
{
  "ACME CORP": {
    "entity_type": "company",
    "description": "ACME Corp is a technology company...",
    "source_id": "doc-1"
  }
}

# Vector storage makes the entity searchable by meaning.
{
  "id": "entity_ACME_CORP",
  "vector": [0.045, -0.234, 0.112],
  "payload": {
    "entity_name": "ACME CORP",
    "entity_type": "company"
  }
}

# Graph storage represents the entity as a node.
Node("ACME CORP", type="company", description="...")
```
Relationships follow the same pattern.
```
# Key-value storage keeps relationship metadata.
{
  "ACME_CORP->GMAIL": {
    "description": "ACME Corp develops Gmail",
    "keywords": "develops operates service",
    "weight": 2.0,
    "source_id": "doc-1"
  }
}

# Vector storage makes the relationship searchable.
{
  "id": "rel_ACME_CORP_GMAIL",
  "vector": [0.078, -0.189, 0.234],
  "payload": {
    "src": "ACME CORP",
    "tgt": "GMAIL",
    "keywords": "develops"
  }
}

# Graph storage represents the relationship as an edge.
Edge("ACME CORP" -> "GMAIL", relation="develops", weight=2.0)
```
This duplication is intentional. A relationship needs to be readable, searchable, and traversable. No single storage model is ideal for all three jobs.
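As a rough sketch of why the three forms exist, the snippet below writes one relationship into three in-memory stores and then reads it back three different ways. The store names, the `store_relationship` and `nearest` helpers, and the id format are hypothetical. Real backends would replace the dictionaries and the list.

```python
import math

kv_store = {}      # readable records, keyed by id
vector_store = []  # (id, vector) pairs for similarity search
graph_store = {}   # adjacency: source -> {target: edge attributes}

def store_relationship(src, tgt, description, keywords, weight, vector):
    """Write one relationship into all three stores and return its id."""
    rel_id = f"{src}->{tgt}"
    kv_store[rel_id] = {"description": description, "keywords": keywords, "weight": weight}
    vector_store.append((rel_id, vector))
    graph_store.setdefault(src, {})[tgt] = {"relation": keywords, "weight": weight}
    graph_store.setdefault(tgt, {})
    return rel_id

def nearest(query_vector):
    """Return the id of the stored vector with the highest cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max(vector_store, key=lambda item: cos(query_vector, item[1]))[0]

rel_id = store_relationship(
    "ACME CORP", "GMAIL", "ACME Corp develops Gmail", "develops", 2.0, [0.078, -0.189, 0.234]
)
readable = kv_store[rel_id]["description"]   # direct lookup
similar = nearest([0.08, -0.19, 0.23])       # similarity search
neighbors = list(graph_store["ACME CORP"])   # traversal
```

Each read uses a different access pattern: a key, a vector, and a starting node. That is the job separation the storage model is built around.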
The Indexing Flow
During indexing, LightRAG conceptually does two things in parallel.
One path stores the original chunks. This preserves the source text and creates chunk embeddings for normal retrieval.
The other path asks a language model to extract entities and relationships. Those extracted objects are then stored as metadata, embeddings, and graph nodes or edges.
```
      Document input
             |
             v
         Chunking
             |
             +----------------------------+
             |                            |
             v                            v
Store chunk text and vectors  Extract entities and relationships
                                          |
                                          v
                              Store entity and relation data
                              in KV, vector, and graph storage
```
This explains why LightRAG can answer questions that plain vector search often struggles with. The system has both the original language and a structured representation of the concepts inside that language.
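The two indexing paths can be sketched in a few lines. This is a simplified illustration: `extract_stub` is a placeholder regex that only recognizes "X is the R of Y." sentences, standing in for the language model call, and sentence-based chunking stands in for token-based chunking.

```python
import re

def chunk_text(text, size=2):
    """Split text into chunks of `size` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]

def extract_stub(chunk):
    """Placeholder for LLM extraction: finds 'X is the R of Y.' patterns only."""
    pattern = r"([A-Z][\w ]+?) is the ([\w ]+?) of ([A-Z][\w ]+?)\."
    return [(s.upper(), r, t.upper()) for s, r, t in re.findall(pattern, chunk)]

def index_document(text):
    """Run both paths: keep the chunk text, and extract structure from it."""
    chunks, entities, relations = {}, set(), []
    for i, chunk in enumerate(chunk_text(text)):
        chunk_id = f"chunk_{i}"
        chunks[chunk_id] = chunk                      # path 1: store source text
        for src, rel, tgt in extract_stub(chunk):     # path 2: extract structure
            entities.update([src, tgt])
            relations.append((src, rel, tgt, chunk_id))
    return chunks, entities, relations

doc = (
    "Jane Smith is the legal representative of ACME Corp. "
    "ACME Corp operates a mail service. Revenue grew by 3%."
)
chunks, entities, relations = index_document(doc)
```

Each relationship keeps the id of the chunk it came from, so graph results can lead back to the original text.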
Query Modes
LightRAG exposes multiple query modes because not every question needs the same retrieval strategy.
| Mode | What it does | Best fit |
|---|---|---|
| naive | Uses chunk vector search only | Simple lookups |
| local | Focuses on entities and nearby relationships | Questions about a specific thing |
| global | Focuses on relationships and broader themes | Questions about how things connect |
| hybrid | Combines local and global graph retrieval | Graph-centered answers |
| mix | Combines local, global, naive retrieval, graph expansion, and reranking | Best overall answer quality |
The distinction between local and global retrieval is one of the most useful parts of the system.
Local retrieval is entity-centered. If the question is "Who is the legal representative of ACME Corp?", the system should focus on specific entities and their immediate relationships.
Global retrieval is relationship-centered. If the question is "How does corporate governance work across these documents?", the system should look for broader relationship patterns and themes.
Neither approach is always better. They answer different kinds of questions.
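One way to picture the difference is by which side of the graph gets matched first. The sketch below is a toy illustration, not LightRAG's implementation: the `edges` list (including the hypothetical BOARD entity) is made up, and keyword containment stands in for vector matching.

```python
# Hypothetical toy graph as (source, relation keywords, target) triples.
edges = [
    ("JANE SMITH", "legal representative", "ACME CORP"),
    ("ACME CORP", "develops", "GMAIL"),
    ("BOARD", "corporate governance", "ACME CORP"),
]

def local_search(entity_keywords):
    """Entity-centered: match entities first, then pull their edges."""
    hits = {
        n for s, _, t in edges for n in (s, t)
        if any(k.upper() in n for k in entity_keywords)
    }
    return hits, [e for e in edges if e[0] in hits or e[2] in hits]

def global_search(theme_keywords):
    """Relationship-centered: match edges first, then pull their endpoints."""
    hits = [e for e in edges if any(k.lower() in e[1] for k in theme_keywords)]
    return {n for s, _, t in hits for n in (s, t)}, hits

local_entities, local_edges = local_search(["Jane Smith"])
global_entities, global_edges = global_search(["governance"])
```

The local query starts from a named entity and walks outward. The global query starts from a theme carried by a relationship and collects whichever entities that relationship connects.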
Why Mix Mode Is Usually the Most Complete
Mix mode is the "use everything" option.
It combines entity retrieval, relationship retrieval, normal chunk vector search, graph neighbor expansion, reranking, deduplication, and token-budget trimming.
The query flow is roughly this:
```
                     User query
                          |
                          v
          Extract local and global keywords
                          |
      +------------+-------------+-------------+
      |            |             |             |
      v            v             v             v
Local search Global search Naive search Graph expansion
  entities     relations      chunks       neighbors
      |            |             |             |
      +------------+-------------+-------------+
                          |
                          v
                Merge and deduplicate
                          |
                          v
                       Rerank
                          |
                          v
                Trim to token budget
                          |
                          v
                Generate final answer
```
This is more expensive than naive retrieval. It can take longer and use more tokens. But if the goal is the best possible answer over a complicated document set, mix mode is usually the mode you reach for first.
The tradeoff is straightforward:
| Mode | Latency | Completeness | Token cost |
|---|---|---|---|
| naive | Lowest | Basic | Lowest |
| hybrid | Medium | Good | Medium |
| mix | Highest | Best | Highest |
In other words, mix mode is not magic. It is just willing to spend more retrieval effort before asking the final model to answer.
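The last three stages of mix mode (merge and deduplicate, rerank, trim) can be sketched as follows. This is a minimal illustration under stated assumptions: the reranker is a word-overlap stand-in for a real reranking model, and tokens are approximated by whitespace-separated words.

```python
def merge_and_dedupe(*result_lists):
    """Concatenate result lists, keeping the first occurrence of each chunk id."""
    seen, merged = set(), []
    for results in result_lists:
        for chunk_id, text in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((chunk_id, text))
    return merged

def rerank(query, chunks):
    """Stand-in reranker: order by how many query words appear in the chunk."""
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(words & set(c[1].lower().split())), reverse=True)

def trim_to_budget(chunks, max_tokens):
    """Keep chunks in rank order until the budget is spent."""
    kept, used = [], 0
    for chunk_id, text in chunks:
        cost = len(text.split())  # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append((chunk_id, text))
        used += cost
    return kept

graph_chunks = [
    ("c1", "Jane Smith is the legal representative"),
    ("c2", "ACME Corp develops Gmail"),
]
vector_chunks = [
    ("c2", "ACME Corp develops Gmail"),
    ("c3", "The legal representative has signing authority"),
]
merged = merge_and_dedupe(graph_chunks, vector_chunks)
ranked = rerank("who has signing authority", merged)
final = trim_to_budget(ranked, max_tokens=12)
```

Here the duplicate `c2` is merged away, `c3` moves to the top after reranking, and the 12-token budget admits only two of the three chunks. The same shape of loss is what the token budget produces at larger scale.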
Reading LightRAG Logs
LightRAG logs can look noisy until you know what each line means. Once you understand the retrieval flow, the logs become a useful debugging tool.
For example:
```
== LLM cache == saving: mix:keywords:21e0b3d64bffb71be5e2d47b38025abf
```
This means LightRAG used the language model to extract keywords from the query and cached the result.
```
Embedding func: 24 new workers initialized
```
This means embedding workers were started for vector search work.
```
Query nodes: Definition, Signing authority, Legal representative...
Local query: 40 entities, 116 relations
```
This is local retrieval. LightRAG identified entity-like query terms, found matching entities, and pulled connected relationships.
```
Query edges: Authorized officer, Legal roles, Corporate governance...
Global query: 47 entities, 40 relations
```
This is global retrieval. LightRAG searched relationship-level concepts and then brought in the connected entities.
```
Naive query: 20 chunks (chunk_top_k:20 cosine:0.2)
```
This is normal vector search over chunks.
```
Raw search results: 70 entities, 129 relations, 20 vector chunks
After truncation: 70 entities, 129 relations
```
At this point, the system has combined results from the different retrieval paths and started trimming them to fit configured limits.
```
Successfully reranked: 20 chunks from 27 original chunks
Final context: 70 entities, 129 relations, 13 chunks
```
This means the reranker reduced the chunk set, and then the token budget reduced the final context further. If you expected 20 chunks but only see 13 in the final context, the token budget is probably the reason.
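If you check this often, a small script can pull the two chunk counts out of the logs and report the gap. The sketch below assumes the log lines look exactly like the samples in this section. The format may differ between versions, so treat the regular expressions as a starting point.

```python
import re

log_lines = [
    "Naive query: 20 chunks (chunk_top_k:20 cosine:0.2)",
    "Raw search results: 70 entities, 129 relations, 20 vector chunks",
    "Successfully reranked: 20 chunks from 27 original chunks",
    "Final context: 70 entities, 129 relations, 13 chunks",
]

def chunk_counts(lines):
    """Pull the reranked and final chunk counts out of the log lines."""
    counts = {}
    for line in lines:
        if m := re.search(r"Successfully reranked: (\d+) chunks", line):
            counts["reranked"] = int(m.group(1))
        elif m := re.search(r"Final context: .*?(\d+) chunks", line):
            counts["final"] = int(m.group(1))
    return counts

counts = chunk_counts(log_lines)
dropped = counts["reranked"] - counts["final"]
```

For the sample lines, `dropped` is 7: the number of reranked chunks that did not make it into the final context.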
Token Budgets Matter More Than You Think
Graph-enhanced retrieval can produce a lot of context. That is both the advantage and the danger.
LightRAG may retrieve entities, relationships, graph neighbors, and chunks. All of that has to fit into the model's context window. If the budget is too small, useful chunks may be dropped before the final answer is generated.
A typical query configuration looks like this:
```python
from lightrag import QueryParam

param = QueryParam(
    mode="mix",
    max_total_tokens=30000,
    max_entity_tokens=6000,
    max_relation_tokens=8000,
    chunk_top_k=20,
    top_k=60,
)
```
If the reranker returns 20 chunks but the final context only includes 13, increasing the total token budget may help.
```python
from lightrag import QueryParam

param = QueryParam(
    mode="mix",
    max_total_tokens=50000,   # larger overall budget
    max_entity_tokens=4000,   # smaller entity budget
    max_relation_tokens=6000, # smaller relationship budget
    chunk_top_k=20,
    top_k=60,
)
```
Notice that the second example does two things. It increases the total budget, and it also reduces the entity and relationship budgets. Both changes leave more room for chunks.
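A quick calculation shows how much the chunk share changes between the two configurations. This is a rough sketch, not LightRAG's exact accounting: it assumes the entity and relationship caps are fully used, and `overhead_tokens` is a hypothetical allowance for the prompt and query that defaults to zero.

```python
def chunk_budget(max_total_tokens, max_entity_tokens, max_relation_tokens, overhead_tokens=0):
    """Tokens left for chunks when the entity and relation caps are fully used."""
    remaining = max_total_tokens - max_entity_tokens - max_relation_tokens - overhead_tokens
    return max(0, remaining)

def fits_context_window(max_total_tokens, context_window):
    """True if the configured total budget fits inside the model's context window."""
    return max_total_tokens <= context_window

first = chunk_budget(30000, 6000, 8000)    # 30000 - 6000 - 8000 = 16000
second = chunk_budget(50000, 4000, 6000)   # 50000 - 4000 - 6000 = 40000
```

Under these assumptions, the room for chunks grows from 16,000 to 40,000 tokens. The `fits_context_window` check is a reminder that the total budget still has to fit the model you are calling.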
You can also configure these values server-wide:
```
MAX_TOTAL_TOKENS=50000
MAX_ENTITY_TOKENS=6000
MAX_RELATION_TOKENS=8000
CHUNK_TOP_K=20
TOP_K=60
```
The obvious warning applies: your model still needs a large enough context window. Sending 50,000 tokens to a model with a 32,000-token context limit will fail.
LightRAG vs GraphRAG
LightRAG and GraphRAG are solving a similar problem: plain chunk retrieval is not enough for complex knowledge work. They both use graph ideas to improve retrieval. But they do it differently.
GraphRAG typically builds higher-level community summaries and uses those communities during retrieval. That can be powerful, but it can also be expensive. It may require large preprocessing steps and costly query-time traversal or summarization.
LightRAG takes a lighter path. It extracts entities and relationships, embeds them, and uses vector matching plus graph traversal. It is designed to be more incremental and less expensive at query time.
The practical difference is that LightRAG can merge new nodes and edges as documents arrive. It does not need to rebuild an entire hierarchy of community reports every time the corpus changes.
That makes LightRAG attractive for systems where documents are added continuously.
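Incremental merging can be sketched as a union over nodes and edges. This is an illustration of the idea rather than LightRAG's merge logic: the graph layout and `merge_graph` are hypothetical, and summing edge weights on repeated mentions is an assumption.

```python
def merge_graph(graph, new_nodes, new_edges):
    """Merge new nodes and edges into `graph` in place, without rebuilding it."""
    for name, description in new_nodes:
        node = graph["nodes"].setdefault(name, {"descriptions": []})
        if description not in node["descriptions"]:
            node["descriptions"].append(description)
    for src, tgt, description, weight in new_edges:
        for endpoint in (src, tgt):
            # Make sure both endpoints exist as nodes.
            graph["nodes"].setdefault(endpoint, {"descriptions": []})
        edge = graph["edges"].setdefault((src, tgt), {"descriptions": [], "weight": 0.0})
        if description not in edge["descriptions"]:
            edge["descriptions"].append(description)
        edge["weight"] += weight  # assumption: repeated mentions strengthen the edge
    return graph

graph = {"nodes": {}, "edges": {}}
# First document arrives.
merge_graph(
    graph,
    [("ACME CORP", "A technology company")],
    [("ACME CORP", "GMAIL", "ACME Corp develops Gmail", 1.0)],
)
# Second document arrives later and touches the same entities.
merge_graph(
    graph,
    [("ACME CORP", "Operator of a mail service"), ("GMAIL", "A mail product")],
    [("ACME CORP", "GMAIL", "ACME Corp operates Gmail", 1.0)],
)
```

The second document only touches the nodes and edges it mentions. Nothing else in the graph has to be recomputed.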
What Kind of Model Does LightRAG Need?
LightRAG depends on a language model for entity and relationship extraction, keyword extraction, and final answer generation. The quality of those steps matters.
The common recommendation is to use at least a 32B parameter model, with a minimum 32K context window. A 64K context window is better if you plan to use mix mode heavily or retrieve large graph contexts.
LightRAG can work with multiple model providers and runtimes, including OpenAI, Azure, Gemini, Ollama, Hugging Face, and LlamaIndex integrations. The exact setup matters less than the model's ability to reliably extract structured information and handle a large final context.
If the model extracts bad entities or weak relationships, the graph becomes less useful. Graph-enhanced RAG is only as good as the structure it builds.
A Practical Production Storage Setup
For local experiments, the default storage options are enough. For a larger system, it makes sense to split storage across specialized backends.
One possible setup looks like this:
```
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=QdrantVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
```
In that setup, PostgreSQL stores chunk text, metadata, entity descriptions, relationship descriptions, and document status. Qdrant stores embeddings for chunks, entities, and relationships. Neo4j stores graph nodes, edges, and traversal structure.
That separation is clean because each database is doing what it is good at. PostgreSQL handles durable structured records. Qdrant handles vector similarity search. Neo4j handles graph traversal.
You do not need this architecture on day one. But it is a useful mental model for how LightRAG scales beyond a toy example.
Where LightRAG Helps Most
LightRAG is most useful when your corpus contains connected knowledge.
Legal documents are a good example. They contain parties, obligations, definitions, authorities, exceptions, and references across sections. A plain chunk search may find the right paragraph. A graph-aware system has a better chance of connecting the paragraph to the relevant party, role, and obligation.
Technical documentation is another good example. APIs, services, dependencies, configuration values, and failure modes often form a graph. Questions about impact, ownership, or dependency chains benefit from relationship-aware retrieval.
Company knowledge bases also fit well. People, teams, projects, decisions, documents, and systems are connected. Users rarely ask questions in a way that maps neatly to one chunk.
LightRAG is less necessary when the corpus is small, the questions are simple, or the answer usually lives in one obvious chunk. In those cases, normal RAG is easier, cheaper, and probably good enough.
The Main Tradeoff
LightRAG gives the retriever more ways to find useful context. That is the upside.
The cost is complexity.
You now have entity extraction, relationship extraction, graph storage, multiple vector collections, several retrieval modes, reranking, deduplication, and token budgets to tune. There are more moving parts than plain RAG.
That extra complexity is worth it only if the questions require it. If your users ask simple factual questions over short documents, LightRAG may be overkill. If they ask broad, relational, multi-hop questions over a large corpus, the graph layer can be the difference between a shallow answer and a useful one.
Final Takeaways
LightRAG is best understood as normal RAG plus a structured memory of entities and relationships.
It still stores and retrieves chunks. It does not throw away the original text. But it also builds a graph that lets the system reason about what the text is connected to.
The most important design choice is that entities and relationships live in three forms at once: readable metadata in key-value storage, searchable embeddings in vector storage, and traversable nodes or edges in graph storage. That is not accidental redundancy. It is how LightRAG supports lookup, similarity search, and structural retrieval at the same time.
Mix mode is the most complete retrieval mode because it uses all of those paths together. It is also the most expensive. Token budgets then decide how much of the retrieved context actually reaches the final model.
The simplest summary is this: traditional RAG retrieves similar chunks, while LightRAG retrieves similar chunks plus connected knowledge.
That difference matters when the answer is not sitting in one paragraph, but spread across a web of concepts.
