LightRAG Deep Dive: RAG With a Memory of Relationships

Most Retrieval-Augmented Generation systems start with a simple idea: split documents into chunks, turn those chunks into embeddings, store them in a vector database, and retrieve the most similar chunks when someone asks a question.

That approach works surprisingly well. It is also limited.

Traditional RAG is good at finding text that looks semantically similar to the query. It is much weaker at understanding how things are connected. If your documents talk about people, companies, legal roles, products, decisions, events, dependencies, or causes, the important information often lives in the relationships between those things, not only in the individual paragraphs.

LightRAG is interesting because it tries to fix that problem without abandoning the normal RAG pipeline. It still chunks documents. It still embeds text. It still uses vector search. But it also extracts entities and relationships, stores them in a graph, and uses that graph during retrieval.

In plain English, LightRAG does not just ask, "Which chunks are similar to this question?" It also asks, "Which concepts are involved, how are they connected, and what nearby information should be pulled in because of those connections?"

That is the core idea behind LightRAG.

The Problem With Plain Chunk-Based RAG

A normal RAG pipeline usually looks like this:

Document -> Chunk -> Embed -> Vector database -> Retrieve chunks -> LLM answer

The system splits a document into chunks, embeds each chunk, and stores the embeddings. At query time, it embeds the user query and searches for chunks with similar vectors.

This is clean and efficient. It is also very flat.
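A toy version of this pipeline fits in a few lines. The sketch below is hypothetical: `chunk`, `embed`, `cosine`, and `retrieve` are illustrative names, fixed-size word chunks stand in for a real chunker, and a bag-of-words counter stands in for an embedding model and vector database. The retrieval logic has the same shape as the real thing: embed the query, rank chunks by cosine similarity, and keep the top k.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Embed the query and return the top_k most similar chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


doc = ("ACME Corp is a technology company. Jane Smith is the legal representative "
       "of ACME Corp. The legal representative signs contracts on behalf of the company.")
chunks = chunk(doc)
print(retrieve("Who is the legal representative?", chunks, top_k=1))
```

The best match is the chunk that mentions the legal representative, but the name Jane Smith landed in the previous chunk, so a top-1 retriever never sees who the representative actually is.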

The vector database does not really know that ACME Corp is a company, that Jane Smith is its legal representative, that Gmail is a product, or that one chunk defines a term while another chunk explains its implications. It only knows that certain pieces of text are mathematically close in embedding space.

That can cause several issues.

First, relevant information may be spread across multiple chunks. A chunk may mention a company, another chunk may describe a role, and a third chunk may explain the actual responsibility. If none of those chunks alone is an obvious semantic match for the query, the retriever may miss one of them.

Second, some questions are relational by nature. "Who has signing authority?" is not just asking for similar prose. It is asking for entities, roles, and relationships.

Third, broad questions often require synthesis. "How does corporate governance work in this document set?" needs more than one matching paragraph. It needs themes, relationships, and supporting evidence.

LightRAG adds a graph layer to help with exactly these cases.

What LightRAG Adds

LightRAG keeps the familiar RAG foundation, but it adds a second path during indexing.

Document -> Chunk -> Embed chunks
                   -> Extract entities and relationships
                   -> Build graph
                   -> Embed entities and relationships

This means a document produces more than just chunk embeddings. It also produces graph data.

An entity might be a person, company, product, legal role, regulation, concept, or location. A relationship describes how two entities are connected. For example, Jane Smith may be the legal representative of ACME Corp, or ACME Corp may operate a specific service.

Once those entities and relationships exist, LightRAG can retrieve information in more ways than plain vector search. It can match entities. It can match relationships. It can fetch neighboring nodes from the graph. It can still retrieve normal chunks. Then it combines those sources into a richer context for the final language model.

The rough query flow looks like this:

Query -> Extract keywords
      -> Match entities
      -> Match relationships
      -> Retrieve chunks
      -> Fetch graph neighbors
      -> Combine context
      -> Generate answer

This is why LightRAG is often described as graph-enhanced RAG. The graph does not replace retrieval. It gives retrieval more structure.
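Here is a minimal sketch of the graph half of that query flow over a hand-built toy graph. The data and function names are hypothetical, and plain substring matching stands in for the vector search LightRAG runs over entity embeddings. What matters is the shape: match entities, pull in the relationships that touch them, expand to neighbors, then collect the source chunks those entities and relationships point back to.

```python
# Hypothetical toy graph. Each entity and relationship remembers its source chunks.
entities = {
    "ACME CORP": {"type": "company", "chunks": {"c0", "c1"}},
    "JANE SMITH": {"type": "person", "chunks": {"c0"}},
    "LEGAL REPRESENTATIVE": {"type": "legal role", "chunks": {"c1", "c2"}},
}
relations = {
    ("JANE SMITH", "ACME CORP"): {"description": "Jane Smith represents ACME Corp", "chunks": {"c1"}},
    ("JANE SMITH", "LEGAL REPRESENTATIVE"): {"description": "Jane Smith holds this role", "chunks": {"c1"}},
}


def neighbors(name):
    """Entities one hop away, treating relationships as undirected."""
    return {b if a == name else a for (a, b) in relations if name in (a, b)}


def graph_query(keywords):
    # 1. Match entities (substring match stands in for entity vector search).
    matched = {e for e in entities for k in keywords if k.upper() in e}
    # 2. Pull in relationships that touch a matched entity.
    rels = {r for r in relations if r[0] in matched or r[1] in matched}
    # 3. Expand to neighboring entities.
    expanded = set(matched)
    for e in matched:
        expanded |= neighbors(e)
    # 4. Collect the source chunks behind every entity and relationship found.
    chunk_ids = set()
    for e in expanded:
        chunk_ids |= entities[e]["chunks"]
    for r in rels:
        chunk_ids |= relations[r]["chunks"]
    return expanded, rels, chunk_ids


found, rels, chunk_ids = graph_query(["legal representative"])
print(sorted(found), sorted(chunk_ids))
```

The query never mentions Jane Smith, yet she is reached through the edge to the legal-role node, and her source chunk `c0` comes along with her.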

LightRAG Is Not Contextual Embedding

It is easy to confuse LightRAG with contextual embedding, but they are different techniques.

In Anthropic's contextual retrieval approach, each chunk gets extra document-level context before it is embedded. A normal chunk like this:

The company's revenue grew by 3% over the previous quarter.

might become this before embedding:

This chunk is from an SEC filing on ACME Corp's performance in Q2 2023. The previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter.

The key point is that the added context is baked into the chunk embedding itself. The retriever is still searching over chunks, but those chunks have been enriched before embedding.

LightRAG works differently. It does not prepend a custom summary to every chunk before embedding. Its context comes from the graph.

Instead of saying, "Let me make this chunk more self-contained before embedding it," LightRAG says, "Let me extract the things this chunk talks about, store their relationships, and use those relationships later when someone asks a question."

That distinction matters. Contextual embedding enriches the vector. LightRAG enriches the retrieval process.

Question	Contextual embedding	LightRAG
Where does context come from?	A generated chunk-specific summary	Extracted entities and relationships
When is context added?	Before chunk embedding	During indexing and query-time retrieval
How is context retrieved?	Through the chunk vector	Through graph search plus vector search

A simple way to phrase it is this: LightRAG trades embedded context for structural context.

The Storage Model

LightRAG stores different kinds of information in different storage systems. That can look redundant at first, but each storage type has a separate job.

There are four main storage categories:

Storage type	Purpose
Key-value storage	Stores raw text, metadata, descriptions, and cache entries
Vector storage	Stores embeddings for semantic search
Graph storage	Stores nodes, edges, and graph structure
Document status storage	Tracks document processing state

The default setup can use simple local storage, such as JSON files, NanoVectorDB, and NetworkX. Production setups can swap in backends such as PostgreSQL (which can also cover vector and graph storage through the pgvector and Apache AGE extensions), Redis, MongoDB, Qdrant, Milvus, Faiss, Neo4j, or Memgraph.

The important thing is not the specific backend. The important thing is the separation of responsibilities.

Key-value storage is for direct lookup. Vector storage is for similarity search. Graph storage is for traversal. Document status storage is for knowing what has already been processed.
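A hypothetical in-memory version makes the four jobs concrete. The class and method names below are illustrative, not LightRAG's storage API, and deriving document IDs from an MD5 content hash is an assumption made for the sketch. The point is that each store answers a different kind of question.

```python
import hashlib
import math


class InMemoryStores:
    """Illustrative stand-ins for the four storage roles (not LightRAG's API)."""

    def __init__(self):
        self.kv = {}          # key-value: id -> readable record
        self.vectors = {}     # vector: id -> embedding
        self.graph = {}       # graph: node -> set of neighboring nodes
        self.doc_status = {}  # document status: doc id -> processing state

    def lookup(self, key):
        """Key-value job: fetch a record directly by id."""
        return self.kv.get(key)

    def most_similar(self, query, k=1):
        """Vector job: rank stored embeddings by cosine similarity to the query."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0
        return sorted(self.vectors, key=lambda i: cos(query, self.vectors[i]), reverse=True)[:k]

    def neighbors(self, node):
        """Graph job: traverse one hop from a node."""
        return self.graph.get(node, set())

    @staticmethod
    def doc_id(content):
        """Assumed scheme: identical content always maps to the same id."""
        return "doc-" + hashlib.md5(content.encode("utf-8")).hexdigest()

    def needs_processing(self, content):
        """Status job: skip documents that were already indexed."""
        return self.doc_status.get(self.doc_id(content)) != "processed"

    def mark_processed(self, content):
        self.doc_status[self.doc_id(content)] = "processed"


stores = InMemoryStores()
stores.kv["chunk_id_123"] = {"content": "The company's revenue grew by 3%...", "source_id": "doc-1"}
stores.vectors["chunk_id_123"] = [0.023, -0.156, 0.089]
stores.vectors["chunk_id_456"] = [-0.5, 0.2, 0.1]
stores.graph["ACME CORP"] = {"GMAIL"}
stores.mark_processed("full text of report.pdf")
```

Swapping a backend means replacing one of these jobs with a real database that does it well, without touching the other three.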

What Gets Stored Where

The easiest way to understand LightRAG is to follow a single document through the system.

First, the document is split into chunks. Those chunks are stored twice: once as text and once as embeddings.

# Key-value storage keeps the readable chunk content.
{
    "chunk_id_123": {
        "content": "The company's revenue grew by 3%...",
        "source_id": "doc-1",
        "file_path": "report.pdf"
    }
}

# Vector storage keeps the embedding used for semantic search.
{
    "id": "chunk_id_123",
    "vector": [0.023, -0.156, 0.089],
    "payload": {"source_id": "doc-1"}
}

The key-value entry lets the system recover the actual text. The vector entry lets the system find the chunk when a query is semantically similar.

Next, LightRAG extracts entities. An entity is stored in three places.

# Key-value storage keeps metadata and descriptions.
{
    "ACME CORP": {
        "entity_type": "company",
        "description": "ACME Corp is a technology company...",
        "source_id": "doc-1"
    }
}

# Vector storage makes the entity searchable by meaning.
{
    "id": "entity_ACME_CORP",
    "vector": [0.045, -0.234, 0.112],
    "payload": {
        "entity_name": "ACME CORP",
        "entity_type": "company"
    }
}

# Graph storage represents the entity as a node.
Node("ACME CORP", type="company", description="...")

Relationships follow the same pattern.

# Key-value storage keeps relationship metadata.
{
    "ACME_CORP->GMAIL": {
        "description": "ACME Corp develops Gmail",
        "keywords": "develops operates service",
        "weight": 2.0,
        "source_id": "doc-1"
    }
}

# Vector storage makes the relationship searchable.
{
    "id": "rel_ACME_CORP_GMAIL",
    "vector": [0.078, -0.189, 0.234],
    "payload": {
        "src": "ACME CORP",
        "tgt": "GMAIL",
        "keywords": "develops"
    }
}

# Graph storage represents the relationship as an edge (conceptual notation: source, target, attributes).
Edge("ACME CORP", "GMAIL", relation="develops", weight=2.0)

This duplication is intentional. A relationship needs to be readable, searchable, and traversable. No single storage model is ideal for all three jobs.
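The three-way write for a relationship can be sketched with plain dictionaries. `upsert_relation` is a hypothetical helper, not LightRAG's internal function; `toy_embed` stands in for a real embedding model, and the choice of text to embed (keywords, endpoint names, and description together) is an assumption.

```python
def upsert_relation(kv, vectors, graph, src, tgt, description, keywords, weight, source_id, embed):
    """Store one relationship as a readable record, a searchable vector, and a traversable edge."""
    # Readable: full metadata under a direct-lookup key.
    kv[f"{src}->{tgt}"] = {
        "description": description,
        "keywords": keywords,
        "weight": weight,
        "source_id": source_id,
    }
    # Searchable: an embedding of the relationship text plus a small payload.
    vec_id = "rel_" + f"{src}_{tgt}".replace(" ", "_")
    vectors[vec_id] = {
        "vector": embed(f"{keywords} {src} {tgt} {description}"),
        "payload": {"src": src, "tgt": tgt, "keywords": keywords},
    }
    # Traversable: adjacency entries in both directions so either endpoint reaches the other.
    graph.setdefault(src, {})[tgt] = {"weight": weight}
    graph.setdefault(tgt, {})[src] = {"weight": weight}


def toy_embed(text):
    """Hypothetical 3-dimensional 'embedding' derived from character codes."""
    codes = [ord(ch) for ch in text]
    return [sum(codes[0::3]) / 1e4, sum(codes[1::3]) / 1e4, sum(codes[2::3]) / 1e4]


kv, vectors, graph = {}, {}, {}
upsert_relation(kv, vectors, graph, "ACME CORP", "GMAIL", "ACME Corp develops Gmail",
                "develops operates service", 2.0, "doc-1", toy_embed)
```

Writing the edge into both adjacency entries treats the relationship as undirected for traversal; a directed graph store would keep only the forward entry.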

The Indexing Flow

Conceptually, indexing runs along two parallel paths.

One path stores the original chunks. This preserves the source text and creates chunk embeddings for normal retrieval.

The other path asks a language model to extract entities and relationships. Those extracted objects are then stored as metadata, embeddings, and graph nodes or edges.

Document input
      |
      v
Chunking
      |
      +----------------------------+
      |                            |
      v                            v
Store chunk text and vectors       Extract entities and relationships
                                   |
                                   v
                           Store entity and relation data
                           in KV, vector, and graph storage

This explains why LightRAG can answer questions that plain vector search often struggles with. The system has both the original language and a structured representation of the concepts inside that language.
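The two paths in the diagram can be sketched as a single function. Everything here is hypothetical: `index_document` and `fake_extract` are illustrative names, `extract` is a caller-supplied stand-in for the LLM extraction call, and the stores are plain dictionaries. Every chunk is kept verbatim, and whatever the extractor finds is recorded with a pointer back to the chunk it came from.

```python
def index_document(doc_id, text, extract, chunk_size=50):
    """Run both indexing paths over one document and return the resulting stores."""
    stores = {"chunks": {}, "entities": {}, "relations": {}}
    words = text.split()
    for i in range(0, len(words), chunk_size):
        chunk_id = f"{doc_id}-chunk-{i // chunk_size}"
        chunk = " ".join(words[i:i + chunk_size])
        # Path 1: keep the original text (a real system would also embed it).
        stores["chunks"][chunk_id] = chunk
        # Path 2: structured extraction, each item pointing back to its chunk.
        for src, tgt, description in extract(chunk):
            for name in (src, tgt):
                stores["entities"].setdefault(name, set()).add(chunk_id)
            stores["relations"][(src, tgt)] = {"description": description, "source_id": chunk_id}
    return stores


def fake_extract(chunk):
    """Hypothetical extractor; a real pipeline would prompt an LLM here."""
    if "Jane Smith" in chunk and "ACME Corp" in chunk:
        return [("JANE SMITH", "ACME CORP", "Jane Smith is the legal representative of ACME Corp")]
    return []


result = index_document("doc-1", "ACME Corp is a technology company. "
                        "Jane Smith is the legal representative of ACME Corp.", fake_extract)
```

The `source_id` links are what later let graph retrieval hand back the original text, not just the extracted structure.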

Query Modes

LightRAG exposes multiple query modes because not every question needs the same retrieval strategy.

Mode	What it does	Best fit
`naive`	Uses chunk vector search only	Simple lookups
`local`	Focuses on entities and nearby relationships	Questions about a specific thing
`global`	Focuses on relationships and broader themes	Questions about how things connect
`hybrid`	Combines local and global graph retrieval	Graph-centered answers
`mix`	Combines local, global, naive retrieval, graph expansion, and reranking	Best overall answer quality

The distinction between local and global retrieval is one of the most useful parts of the system.

Local retrieval is entity-centered. If the question is "Who is the legal representative of ACME Corp?", the system should focus on specific entities and their immediate relationships.

Global retrieval is relationship-centered. If the question is "How does corporate governance work across these documents?", the system should look for broader relationship patterns and themes.

Neither approach is always better. They answer different kinds of questions.
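The difference is easiest to see on a toy graph. In this hypothetical sketch, exact name matching and keyword substring matching stand in for LightRAG's vector searches over entity and relationship embeddings; the graph data and function names are illustrative.

```python
# Hypothetical toy graph: each edge carries keywords, like the relationship records earlier.
edges = {
    ("JANE SMITH", "ACME CORP"): "legal representative, signing authority",
    ("ACME CORP", "BOARD"): "corporate governance, oversight",
    ("BOARD", "JOHN DOE"): "corporate governance, appointment",
}


def local_query(entity_names):
    """Entity-centered: match entities, then pull the relationships touching them."""
    wanted = {n.upper() for n in entity_names}
    nodes = {n for edge in edges for n in edge}
    hits = nodes & wanted
    rels = {edge for edge in edges if hits & set(edge)}
    return hits, rels


def global_query(themes):
    """Relationship-centered: match relationship keywords, then pull their endpoints."""
    rels = {edge for edge, kw in edges.items() if any(t.lower() in kw for t in themes)}
    ents = {n for edge in rels for n in edge}
    return ents, rels


print(local_query(["ACME Corp"]))
print(global_query(["corporate governance"]))
```

The local query stays anchored on ACME Corp and returns its two direct relationships. The global query starts from the theme, finds both governance relationships, and pulls in John Doe, an entity the local query never touches.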

Why Mix Mode Is Usually the Most Complete

Mix mode is the "use everything" option.

It combines entity retrieval, relationship retrieval, normal chunk vector search, graph neighbor expansion, reranking, deduplication, and token-budget trimming.

The query flow is roughly this:

User query
    |
    v
Extract local and global keywords
    |
    +------------+-------------+-------------+
    |            |             |             |
    v            v             v             v
Local search    Global search  Naive search  Graph expansion
entities        relations      chunks        neighbors
    |            |             |             |
    +------------+-------------+-------------+
                 |
                 v
          Merge and deduplicate
                 |
                 v
              Rerank
                 |
                 v
          Trim to token budget
                 |
                 v
          Generate final answer

This is more expensive than naive retrieval. It can take longer and use more tokens. But if the goal is the best possible answer over a complicated document set, mix mode is usually the mode you reach for first.

The tradeoff is straightforward:

Mode	Latency	Completeness	Token cost
`naive`	Lowest	Basic	Lowest
`hybrid`	Medium	Good	Medium
`mix`	Highest	Best	Highest

In other words, mix mode is not magic. It is just willing to spend more retrieval effort before asking the final model to answer.
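The merge-and-deduplicate step in the mix-mode flow can be sketched on its own. This version interleaves the ranked chunk lists from each path one item at a time (round-robin) and drops repeated chunk IDs. That is one reasonable strategy, assumed here for illustration rather than taken from LightRAG's source, and the input lists are hypothetical.

```python
from itertools import zip_longest


def merge_round_robin(*ranked_lists):
    """Interleave ranked lists one item at a time and keep only the first copy of each id."""
    seen, merged = set(), []
    for group in zip_longest(*ranked_lists):
        for chunk_id in group:
            if chunk_id is not None and chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


from_entities = ["c1", "c4", "c7"]
from_relations = ["c4", "c2"]
from_vectors = ["c3", "c1", "c9", "c5"]
merged = merge_round_robin(from_entities, from_relations, from_vectors)
print(merged)  # ['c1', 'c4', 'c3', 'c2', 'c7', 'c9', 'c5']
```

Nine candidates collapse to seven unique chunks, and each path's top result lands near the front of the merged list before a reranker reorders it.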

Reading LightRAG Logs

LightRAG logs can look noisy until you know what each line means. Once you understand the retrieval flow, the logs become a useful debugging tool.

For example:

== LLM cache == saving: mix:keywords:21e0b3d64bffb71be5e2d47b38025abf

This means LightRAG used the language model to extract keywords from the query and cached the result.

Embedding func: 24 new workers initialized

This means LightRAG started concurrent workers for the embedding function, which vector search needs in order to embed the query keywords.

Query nodes: Definition, Signing authority, Legal representative...
Local query: 40 entities, 116 relations

This is local retrieval. LightRAG identified entity-like query terms, found matching entities, and pulled connected relationships.

Query edges: Authorized officer, Legal roles, Corporate governance...
Global query: 47 entities, 40 relations

This is global retrieval. LightRAG searched relationship-level concepts and then brought in the connected entities.

Naive query: 20 chunks (chunk_top_k:20 cosine:0.2)

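This is normal vector search over chunks. chunk_top_k caps the result at 20 chunks, and cosine:0.2 is the minimum similarity a chunk needs to be returned.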

Raw search results: 70 entities, 129 relations, 20 vector chunks
After truncation: 70 entities, 129 relations

At this point, the system has combined results from the different retrieval paths and started trimming them to fit configured limits.

Successfully reranked: 20 chunks from 27 original chunks
Final context: 70 entities, 129 relations, 13 chunks

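The reranker started with 27 candidate chunks, more than the 20 from vector search, most likely because chunks referenced by the retrieved entities and relationships are merged in as well. It kept the top 20, and the token budget then cut the final context to 13. If you expected 20 chunks but only see 13 in the final context, the token budget is probably the reason.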
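When tuning, it helps to pull these counts out of the logs automatically. `chunk_funnel` is a hypothetical helper that assumes the two log formats shown above; the exact wording may differ between LightRAG versions, in which case the function returns None.

```python
import re

RERANK = re.compile(r"Successfully reranked: (\d+) chunks from (\d+) original chunks")
FINAL = re.compile(r"Final context: (\d+) entities, (\d+) relations, (\d+) chunks")


def chunk_funnel(log_text):
    """Report how many chunks survived reranking and the token budget, or None if lines are missing."""
    reranked, final = RERANK.search(log_text), FINAL.search(log_text)
    if not (reranked and final):
        return None
    after_rerank, final_chunks = int(reranked.group(1)), int(final.group(3))
    return {
        "candidates": int(reranked.group(2)),
        "after_rerank": after_rerank,
        "final": final_chunks,
        "dropped_by_token_budget": after_rerank - final_chunks,
    }


log = """Successfully reranked: 20 chunks from 27 original chunks
Final context: 70 entities, 129 relations, 13 chunks"""
print(chunk_funnel(log))
```

A nonzero `dropped_by_token_budget` is the signal to revisit the token settings rather than the reranker.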

Token Budgets Matter More Than You Think

Graph-enhanced retrieval can produce a lot of context. That is both the advantage and the danger.

LightRAG may retrieve entities, relationships, graph neighbors, and chunks. All of that has to fit into the model's context window. If the budget is too small, useful chunks may be dropped before the final answer is generated.
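Trimming to a budget can be done as a greedy pass over the ranked list. The sketch below is hypothetical: `trim_to_budget` is an illustrative name, it counts whitespace-separated words as a rough proxy for tokens (a real system would use the model's tokenizer), and it stops at the first chunk that does not fit.

```python
def trim_to_budget(chunks, budget, count_tokens=lambda text: len(text.split())):
    """Keep chunks in rank order until the next one would overflow the token budget."""
    kept, used = [], 0
    for text in chunks:
        cost = count_tokens(text)
        if used + cost > budget:
            break  # stop rather than skip, so a smaller, lower-ranked chunk never displaces a higher-ranked one
        kept.append(text)
        used += cost
    return kept, used


ranked_chunks = [" ".join(["token"] * 100) for _ in range(20)]  # 20 chunks of 100 "tokens" each
kept, used = trim_to_budget(ranked_chunks, budget=1300)
print(len(kept), used)  # 13 1300
```

Twenty ranked chunks go in and thirteen come out: the same 20-to-13 pattern as in the log example, caused by the budget rather than the reranker.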

A typical query configuration looks like this:

from lightrag import QueryParam

param = QueryParam(
    mode="mix",
    max_total_tokens=30000,
    max_entity_tokens=6000,
    max_relation_tokens=8000,
    chunk_top_k=20,
    top_k=60,
)

If the reranker returns 20 chunks but the final context only includes 13, increasing the total token budget may help.

param = QueryParam(
    mode="mix",
    max_total_tokens=50000,
    max_entity_tokens=4000,
    max_relation_tokens=6000,
    chunk_top_k=20,
    top_k=60,
)

Notice the second example does two things. It increases the total budget, but it also reduces the entity and relationship budgets. That leaves more room for chunks.
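A rough calculation shows why. In the worst case, where entity and relationship context fill their budgets completely, whatever remains of the total is the room left for chunks. The system prompt and the query also take a share, represented here by a hypothetical `overhead_tokens` argument that defaults to zero, so real numbers will be somewhat lower.

```python
def chunk_budget(max_total_tokens, max_entity_tokens, max_relation_tokens, overhead_tokens=0):
    """Tokens left for chunks when entity and relation context fill their budgets.

    overhead_tokens is a hypothetical allowance for the system prompt and the query.
    """
    return max_total_tokens - max_entity_tokens - max_relation_tokens - overhead_tokens


print(chunk_budget(30000, 6000, 8000))  # first configuration: 16000
print(chunk_budget(50000, 4000, 6000))  # second configuration: 40000
```

Under that assumption, the second configuration gives chunks 2.5 times as much room (40,000 tokens versus 16,000).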

You can also configure these values server-wide:

MAX_TOTAL_TOKENS=50000
MAX_ENTITY_TOKENS=4000
MAX_RELATION_TOKENS=6000
CHUNK_TOP_K=20
TOP_K=60

The obvious warning applies: your model still needs a context window large enough for the retrieved context plus the prompt and the generated answer. Sending 50,000 tokens to a model with a 32,000-token context limit will fail.

LightRAG vs GraphRAG

LightRAG and GraphRAG are solving a similar problem: plain chunk retrieval is not enough for complex knowledge work. They both use graph ideas to improve retrieval. But they do it differently.

GraphRAG typically builds higher-level community summaries and uses those communities during retrieval. That can be powerful, but it can also be expensive. It may require large preprocessing steps and costly query-time traversal or summarization.

LightRAG takes a lighter path. It extracts entities and relationships, embeds them, and uses vector matching plus graph traversal. It is designed to be more incremental and less expensive at query time.

The practical difference is that LightRAG can merge new nodes and edges as documents arrive. It does not need to rebuild an entire hierarchy of community reports every time the corpus changes.

That makes LightRAG attractive for systems where documents are added continuously.
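Incremental merging can be sketched as an upsert that folds new information into existing nodes and edges. `merge_entity` and `merge_edge` are hypothetical helpers: they keep a list of distinct descriptions and sum edge weights across mentions (an assumed policy), whereas a real pipeline may also ask the LLM to condense long description lists.

```python
def merge_entity(nodes, name, entity_type, description, source_id):
    """Fold a newly extracted entity into the existing nodes instead of rebuilding them."""
    node = nodes.setdefault(name, {"type": entity_type, "descriptions": [], "sources": set()})
    if description not in node["descriptions"]:
        node["descriptions"].append(description)
    node["sources"].add(source_id)


def merge_edge(edges, src, tgt, weight, source_id):
    """Fold a newly extracted relationship into the existing edges, summing weights."""
    key = tuple(sorted((src, tgt)))  # undirected: (A, B) and (B, A) share one edge
    edge = edges.setdefault(key, {"weight": 0.0, "sources": set()})
    edge["weight"] += weight
    edge["sources"].add(source_id)


nodes, edges = {}, {}
# doc-1 arrives.
merge_entity(nodes, "ACME CORP", "company", "ACME Corp is a technology company.", "doc-1")
merge_edge(edges, "ACME CORP", "GMAIL", 1.0, "doc-1")
# doc-2 arrives later; only the affected node and edge change.
merge_entity(nodes, "ACME CORP", "company", "ACME Corp operates an email service.", "doc-2")
merge_edge(edges, "GMAIL", "ACME CORP", 1.0, "doc-2")
```

Nothing else in the graph is touched, which is why adding a document does not force a rebuild of the whole structure.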

What Kind of Model Does LightRAG Need?

LightRAG depends on a language model for entity and relationship extraction, keyword extraction, and final answer generation. The quality of those steps matters.

The LightRAG project recommends a model with at least 32B parameters and a context window of at least 32K tokens. A 64K context window is better if you plan to use mix mode heavily or retrieve large graph contexts.

LightRAG can work with multiple model providers and runtimes, including OpenAI, Azure, Gemini, Ollama, HuggingFace, and LlamaIndex integrations. The exact setup matters less than the model's ability to reliably extract structured information and handle a large final context.

If the model extracts bad entities or weak relationships, the graph becomes less useful. Graph-enhanced RAG is only as good as the structure it builds.

A Practical Production Storage Setup

For local experiments, the default storage options are enough. For a larger system, it makes sense to split storage across specialized backends.

One possible setup looks like this:

LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=QdrantVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage

In that setup, PostgreSQL stores chunk text, metadata, entity descriptions, relationship descriptions, and document status. Qdrant stores embeddings for chunks, entities, and relationships. Neo4j stores graph nodes, edges, and traversal structure.

That separation is clean because each database is doing what it is good at. PostgreSQL handles durable structured records. Qdrant handles vector similarity search. Neo4j handles graph traversal.

You do not need this architecture on day one. But it is a useful mental model for how LightRAG scales beyond a toy example.

Where LightRAG Helps Most

LightRAG is most useful when your corpus contains connected knowledge.

Legal documents are a good example. They contain parties, obligations, definitions, authorities, exceptions, and references across sections. A plain chunk search may find the right paragraph. A graph-aware system has a better chance of connecting the paragraph to the relevant party, role, and obligation.

Technical documentation is another good example. APIs, services, dependencies, configuration values, and failure modes often form a graph. Questions about impact, ownership, or dependency chains benefit from relationship-aware retrieval.

Company knowledge bases also fit well. People, teams, projects, decisions, documents, and systems are connected. Users rarely ask questions in a way that maps neatly to one chunk.

LightRAG is less necessary when the corpus is small, the questions are simple, or the answer usually lives in one obvious chunk. In those cases, normal RAG is easier, cheaper, and probably good enough.

The Main Tradeoff

LightRAG gives the retriever more ways to find useful context. That is the upside.

The cost is complexity.

You now have entity extraction, relationship extraction, graph storage, multiple vector collections, several retrieval modes, reranking, deduplication, and token budgets to tune. There are more moving parts than plain RAG.

That extra complexity is worth it only if the questions require it. If your users ask simple factual questions over short documents, LightRAG may be overkill. If they ask broad, relational, multi-hop questions over a large corpus, the graph layer can be the difference between a shallow answer and a useful one.

Final Takeaways

LightRAG is best understood as normal RAG plus a structured memory of entities and relationships.

It still stores and retrieves chunks. It does not throw away the original text. But it also builds a graph that lets the system reason about what the text is connected to.

The most important design choice is that entities and relationships live in three forms at once: readable metadata in key-value storage, searchable embeddings in vector storage, and traversable nodes or edges in graph storage. That is not accidental redundancy. It is how LightRAG supports lookup, similarity search, and structural retrieval at the same time.

Mix mode is the most complete retrieval mode because it uses all of those paths together. It is also the most expensive. Token budgets then decide how much of the retrieved context actually reaches the final model.

The simplest summary is this: traditional RAG retrieves similar chunks, while LightRAG retrieves similar chunks plus connected knowledge.

That difference matters when the answer is not sitting in one paragraph, but spread across a web of concepts.