LightMem: Lightweight and Efficient Memory-Augmented Generation
⭐ If you like our project, please give us a star on GitHub for the latest updates!
LightMem is a lightweight and efficient memory management framework designed for Large Language Models and AI Agents. It provides a simple yet powerful memory storage, retrieval, and update mechanism to help you quickly build intelligent applications with long-term memory capabilities.
- 🚀 Lightweight & Efficient: Minimalist design with minimal resource consumption and fast response times
- 🎯 Easy to Use: Simple API design - integrate into your application with just a few lines of code
- 🔌 Flexible & Extensible: Modular architecture supporting custom storage engines and retrieval strategies
- 🌐 Broad Compatibility: Support for cloud APIs (OpenAI, DeepSeek) and local models (Ollama, vLLM, etc.)
🗺️ Core Modules Overview
LightMem adopts a modular design, breaking down the memory management process into several pluggable components, allowing for easy customization and extension.
🧩 Supported Backends per Module
The following table lists the backend values currently recognized by each configuration module. Use the model_name field (or the corresponding config object) to select one of these backends.

| Configuration module | Supported backends |
| --- | --- |
| `PreCompressorConfig` | `llmlingua-2`, `entropy_compress` |
| `TopicSegmenterConfig` | `llmlingua-2` |
| `MemoryManagerConfig` | `openai`, `deepseek`, `ollama`, `vllm`, etc. |
| `TextEmbedderConfig` | `huggingface` |
| `MMEmbedderConfig` | `huggingface` |
| `EmbeddingRetrieverConfig` | `qdrant` |

💡 Examples
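### Initialize LightMem

The examples below assume an initialized LightMem instance named `lightmem`. The sketch here is illustrative only: the import path, the `LightMemory.from_config` entry point, and the values inside each `configs` dict are assumptions, while the top-level option names mirror the BaseMemoryConfigs fields documented in the Configuration section.

```python
# Hedged sketch: the import path and from_config factory are assumptions, not the confirmed API.
from lightmem.memory.lightmem import LightMemory

config = {
    # Backend that generates summaries and metadata (see MemoryManagerConfig backends above)
    "memory_manager": {
        "model_name": "openai",
        "configs": {"model": "gpt-4o-mini", "api_key": "sk-..."},  # placeholder values
    },
    # Embedding model, required because the index/retrieve strategies below use 'embedding'
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {"model": "sentence-transformers/all-MiniLM-L6-v2"},  # placeholder model
    },
    # Vector store used for semantic retrieval
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {"path": "./qdrant_data"},  # placeholder location
    },
    "index_strategy": "embedding",
    "retrieve_strategy": "embedding",
    "metadata_generate": True,
    "text_summary": True,
}

lightmem = LightMemory.from_config(config)  # `lightmem` is reused in the examples below
```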
### Add Memory
```python
# A single conversation session: one timestamp plus a list of dialogue turns.
session = {
    "timestamp": "2025-01-10",
    "turns": [
        [
            {"role": "user", "content": "My favorite ice cream flavor is pistachio, and my dog's name is Rex."},
            {"role": "assistant", "content": "Got it. Pistachio is a great choice."},
        ],
    ],
}

for turn_messages in session["turns"]:
    # Attach the session timestamp to every message in the turn.
    timestamp = session["timestamp"]
    for msg in turn_messages:
        msg["time_stamp"] = timestamp
    # Store the turn, forcing segmentation and extraction for this example.
    store_result = lightmem.add_memory(
        messages=turn_messages,
        force_segment=True,
        force_extract=True
    )
```
### Retrieve Memory

```python
# Query the memory store for entries relevant to the question.
question = "What is the name of my dog?"
related_memories = lightmem.retrieve(question, limit=5)
print(related_memories)
```
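The retrieved entries can then be injected into whatever LLM call produces the final answer. The snippet below is only a sketch of that pattern: the OpenAI client is used as an example generator, and the assumption that each element of `related_memories` can be stringified directly may not match the actual return format.

```python
# Illustrative memory-augmented generation step (not part of the LightMem API).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumption: each retrieved memory can be rendered with str(); adapt to the real structure.
memory_context = "\n".join(str(m) for m in related_memories)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Relevant long-term memories:\n{memory_context}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```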
📁 Experimental Results
For transparency and reproducibility, we have shared the results of our experiments on Google Drive. This includes model outputs, evaluation logs, and predictions used in our study.
🔗 Access the data here: Google Drive - Experimental Results
Please feel free to download, explore, and use these resources for research or reference purposes.

LOCOMO: Overview, Details, and Performance metrics are reported for two backbones (gpt-4o-mini and qwen3-30b-a3b-instruct-2507), each evaluated with the judge models gpt-4o-mini and qwen2.5-32b-instruct.
⚙️ Configuration
All behaviors of LightMem are controlled via the BaseMemoryConfigs configuration class. Users can customize aspects like pre-processing, memory extraction, retrieval strategy, and update mechanisms by providing a custom configuration.

Key Configuration Options (Usage)

| Option | Default | Usage (allowed values and behavior) |
| --- | --- | --- |
| `pre_compress` | `False` | `True` / `False`. If `True`, input messages are pre-compressed using the `pre_compressor` configuration before being stored. This reduces storage and indexing cost but may remove fine-grained details. If `False`, messages are stored without pre-compression. |
| `pre_compressor` | `None` | dict / object. Configuration for the pre-compression component (`PreCompressorConfig`) with fields like `model_name` (e.g., `llmlingua-2`, `entropy_compress`) and `configs` (model-specific parameters). Effective only when `pre_compress=True`. |
| `topic_segment` | `False` | `True` / `False`. Enables topic-based segmentation of long conversations. When `True`, long conversations are split into topic segments and each segment can be indexed/stored independently (requires `topic_segmenter`). When `False`, messages are stored sequentially. |
| `precomp_topic_shared` | `False` | `True` / `False`. If `True`, pre-compression and topic segmentation can share intermediate results to avoid redundant processing. May improve performance but requires careful configuration to avoid cross-topic leakage. |
| `topic_segmenter` | `None` | dict / object. Configuration for topic segmentation (`TopicSegmenterConfig`), including `model_name` and `configs` (segment length, overlap, etc.). Used when `topic_segment=True`. |
| `messages_use` | `'user_only'` | `'user_only'` / `'assistant_only'` / `'hybrid'`. Controls which messages are used to generate metadata and summaries: `user_only` uses user inputs, `assistant_only` uses assistant responses, `hybrid` uses both. Choosing `hybrid` increases processing but yields richer context. |
| `metadata_generate` | `True` | `True` / `False`. If `True`, metadata such as keywords and entities are extracted and stored to support attribute-based and filtered retrieval. If `False`, no metadata extraction occurs. |
| `text_summary` | `True` | `True` / `False`. If `True`, a text summary is generated and stored alongside the original text (reduces retrieval cost and speeds review). If `False`, only the original text is stored. Summary quality depends on `memory_manager`. |
| `memory_manager` | `MemoryManagerConfig()` | dict / object. Controls the model used to generate summaries and metadata (`MemoryManagerConfig`), e.g., `model_name` (`openai`, `ollama`, etc.) and `configs`. Changing this affects summary style, length, and cost. |
| `extract_threshold` | `0.5` | float (0.0 - 1.0). Threshold used to decide whether content is important enough to be extracted as metadata or highlight. Higher values (e.g., 0.8) mean more conservative extraction; lower values (e.g., 0.2) extract more items (may increase noise). |
| `index_strategy` | `None` | `'embedding'` / `'context'` / `'hybrid'` / `None`. Determines how memories are indexed: `'embedding'` uses vector-based indexing (requires embedders/retriever) for semantic search; `'context'` uses text-based/contextual retrieval (requires `context_retriever`) for keyword/document similarity; `'hybrid'` combines context filtering and vector reranking for robustness and higher accuracy. |
| `text_embedder` | `None` | dict / object. Configuration for the text embedding model (`TextEmbedderConfig`) with `model_name` (e.g., `huggingface`) and `configs` (batch size, device, embedding dim). Required when `index_strategy` or `retrieve_strategy` includes `'embedding'`. |
| `multimodal_embedder` | `None` | dict / object. Configuration for the multimodal/image embedder (`MMEmbedderConfig`). Used for non-text modalities. |
| `history_db_path` | `os.path.join(lightmem_dir, "history.db")` | str. Path to persist conversation history and lightweight state. Useful to restore state across restarts. |
| `retrieve_strategy` | `'embedding'` | `'embedding'` / `'context'` / `'hybrid'`. Strategy used at query time to fetch relevant memories. Pick based on data and query type: semantic queries -> `'embedding'`; keyword/structured queries -> `'context'`; mixed -> `'hybrid'`. |
| `context_retriever` | `None` | dict / object. Configuration for the context-based retriever (`ContextRetrieverConfig`), e.g., `model_name='BM25'` and `configs` like `top_k`. Used when `retrieve_strategy` includes `'context'`. |
| `embedding_retriever` | `None` | dict / object. Vector store configuration (`EmbeddingRetrieverConfig`), e.g., `model_name='qdrant'` and connection/index params. Used when `retrieve_strategy` includes `'embedding'`. |
| `update` | `'offline'` | `'online'` / `'offline'`. `'online'`: update memories immediately after each interaction (low latency for fresh memories but higher operational cost). `'offline'`: batch or scheduled updates to save cost and aggregate changes. |
| `kv_cache` | `False` | `True` / `False`. If `True`, attempt to precompute and persist model KV caches to accelerate repeated LLM calls (requires support from the LLM runtime and may increase storage). Uses `kv_cache_path` to store the cache. |
| `kv_cache_path` | `os.path.join(lightmem_dir, "kv_cache.db")` | str. File path for KV cache storage when `kv_cache=True`. |
| `graph_mem` | `False` | `True` / `False`. When `True`, some memories will be organized as a graph (nodes and relationships) to support complex relation queries and reasoning. Requires additional graph processing/storage. |
| `version` | `'v1.1'` | str. Configuration/API version. Only change if you know compatibility implications. |
| `logging` | `'None'` | Logging configuration. |
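As a concrete illustration of how several of these options combine, the configuration below enables pre-compression, topic segmentation, hybrid indexing/retrieval, and offline updates. The option names come from the table above; the `from_config` entry point and the placeholder values inside each `configs` dict are the same assumptions used in the initialization sketch.

```python
# Hedged sketch: option names follow the table above, concrete values are placeholders.
custom_config = {
    "pre_compress": True,
    "pre_compressor": {"model_name": "llmlingua-2", "configs": {}},   # placeholder configs
    "topic_segment": True,
    "topic_segmenter": {"model_name": "llmlingua-2", "configs": {}},  # placeholder configs
    "messages_use": "hybrid",        # use both user and assistant messages
    "extract_threshold": 0.7,        # extract more conservatively than the 0.5 default
    "index_strategy": "hybrid",      # context filtering + vector reranking
    "retrieve_strategy": "hybrid",
    "context_retriever": {"model_name": "BM25", "configs": {"top_k": 20}},
    "text_embedder": {"model_name": "huggingface", "configs": {}},
    "embedding_retriever": {"model_name": "qdrant", "configs": {}},
    "update": "offline",             # batch updates instead of per-interaction updates
}

lightmem = LightMemory.from_config(custom_config)  # assumed factory, as in the first example
```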
We welcome contributions from the community! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
🏆 Contributors
JizhanFang
Xinle-Deng
Xubqpanda
HaomingX
453251
James-TYQ
evy568
Norah-Feathertail
🔗 Related Projects
Mem0
Memos
Zep
MIRIX
MemU
Memobase