Built-in testing – Validate prompts with promptfoo before deployment.
Auto-optimization – Continuously evaluate and refine model performance.
Chat UI – Deploy a chatbot interface with API, persistence, and user feedback.
Installation
Stable Release
To get started quickly, you can install the latest stable release with:
pip install ragbits
Nightly Builds
For the latest development features, you can install nightly builds that are automatically published from the main branch:
pip install ragbits --pre
Note: Nightly builds include the latest features and bug fixes but may be less stable than official releases. They follow the version format X.Y.Z.devYYYYMMDDHHMM.
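For illustration, the dev suffix makes nightly versions easy to distinguish from stable releases programmatically; a small sketch (the concrete version number below is made up):

```python
import re

# Matches the nightly format X.Y.Z.devYYYYMMDDHHMM described above.
NIGHTLY_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.dev\d{12}$")

print(bool(NIGHTLY_PATTERN.match("1.2.3.dev202501011200")))  # nightly build -> True
print(bool(NIGHTLY_PATTERN.match("1.2.3")))                  # stable release -> False
```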
Package Contents
This is a starter bundle of packages, containing:
ragbits-core - fundamental tools for working with prompts, LLMs and vector databases.
ragbits-agents - abstractions for building agentic systems.
ragbits-document-search - retrieval and ingestion pipelines for knowledge bases.
ragbits-evaluate - unified evaluation framework for Ragbits components.
ragbits-guardrails - utilities for ensuring the safety and relevance of responses.
ragbits-chat - full-stack infrastructure for building conversational AI applications.
ragbits-cli - ragbits shell command for interacting with Ragbits components.
Alternatively, you can use individual components of the stack by installing their respective packages.
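For example, a project that only needs the prompt/LLM tooling and the chat infrastructure could install just those two packages (assuming the package names above are published on PyPI as-is):

```shell
pip install ragbits-core ragbits-chat
```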
Quickstart
Basics
To define a prompt and run an LLM:
import asyncio

from pydantic import BaseModel

from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt


class QuestionAnswerPromptInput(BaseModel):
    question: str


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, str]):
    system_prompt = """
    You are a question answering agent. Answer the question to the best of your ability.
    """
    user_prompt = """
    Question: {{ question }}
    """


llm = LiteLLM(model_name="gpt-4.1-nano")


async def main() -> None:
    prompt = QuestionAnswerPrompt(QuestionAnswerPromptInput(question="What are high memory and low memory on linux?"))
    response = await llm.generate(prompt)
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
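The user_prompt above is a Jinja2 template, so fields of the input model are substituted for the {{ question }} placeholder. Conceptually the substitution works like this (a stdlib sketch using plain string replacement, not the actual Ragbits/Jinja2 rendering code):

```python
def render(template: str, **fields: str) -> str:
    # Minimal stand-in for Jinja2's {{ name }} substitution.
    for name, value in fields.items():
        template = template.replace("{{ " + name + " }}", value)
    return template

user_prompt = "Question: {{ question }}"
print(render(user_prompt, question="What are high memory and low memory on linux?"))
# -> Question: What are high memory and low memory on linux?
```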
Document Search
To build and query a simple vector store index:
import asyncio

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)


async def run() -> None:
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What are the key findings presented in this paper?")
    print(result)


if __name__ == "__main__":
    asyncio.run(run())
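Under the hood, an in-memory vector store ranks ingested chunks by the similarity of their embeddings to the query embedding. A minimal sketch of that idea using cosine similarity (stdlib only, with toy two-dimensional embeddings; not the Ragbits implementation):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy "store": chunk text mapped to a precomputed embedding.
store = {
    "attention mechanism": [0.9, 0.1],
    "cooking recipes": [0.1, 0.9],
}

def search(query_embedding: list[float]) -> str:
    # Return the chunk whose embedding is most similar to the query.
    return max(store, key=lambda text: cosine(store[text], query_embedding))

print(search([0.8, 0.2]))  # -> attention mechanism
```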
Retrieval-Augmented Generation
To build a simple RAG pipeline:
import asyncio
from collections.abc import Iterable

from pydantic import BaseModel

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch
from ragbits.document_search.documents.element import Element


class QuestionAnswerPromptInput(BaseModel):
    question: str
    context: Iterable[Element]


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, str]):
    system_prompt = """
    You are a question answering agent. Answer the question that will be provided using context.
    If the given context does not contain enough information, refuse to answer.
    """
    user_prompt = """
    Question: {{ question }}
    Context: {% for chunk in context %}{{ chunk.text_representation }}{%- endfor %}
    """


llm = LiteLLM(model_name="gpt-4.1-nano")
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)


async def run() -> None:
    question = "What are the key findings presented in this paper?"
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    chunks = await document_search.search(question)
    prompt = QuestionAnswerPrompt(QuestionAnswerPromptInput(question=question, context=chunks))
    response = await llm.generate(prompt)
    print(response)


if __name__ == "__main__":
    asyncio.run(run())
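The Jinja loop in user_prompt flattens the retrieved chunks into a single context string. In plain Python that step amounts to the following (a sketch with a stand-in Chunk class, not the Ragbits rendering code):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # Stand-in for the element's text_representation attribute used in the template.
    text_representation: str

chunks = [Chunk("Attention is all you need."), Chunk("The Transformer uses self-attention.")]

# Equivalent of: {% for chunk in context %}{{ chunk.text_representation }}{%- endfor %}
context = "".join(chunk.text_representation for chunk in chunks)
print(context)
```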
Agentic RAG
To build an agentic RAG pipeline:
import asyncio

from ragbits.agents import Agent
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

llm = LiteLLM(model_name="gpt-4.1-nano")
agent = Agent(llm=llm, tools=[document_search.search])


async def main() -> None:
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    response = await agent.run("What are the key findings presented in this paper?")
    print(response.content)


if __name__ == "__main__":
    asyncio.run(main())
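The Agent wires document_search.search in as a tool the model can invoke on its own. The general tool-calling pattern can be sketched like this (a generic illustration with a hypothetical search stub, not the Ragbits Agent internals):

```python
from typing import Callable

def search(query: str) -> list[str]:
    # Hypothetical tool: a real agent would query the vector store here.
    return ["chunk about attention", "chunk about transformers"]

# Tools the "model" is allowed to call, keyed by name.
tools: dict[str, Callable[[str], list[str]]] = {"search": search}

def run_agent(question: str) -> str:
    # Stand-in for the model deciding which tool to call and with what arguments.
    tool_name, tool_args = "search", question
    results = tools[tool_name](tool_args)
    # The tool output would then be fed back to the LLM to draft the final answer.
    return f"Answered using {len(results)} retrieved chunks."

print(run_agent("What are the key findings presented in this paper?"))
# -> Answered using 2 retrieved chunks.
```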
Chat UI
To expose your GenAI application through Ragbits API:
from collections.abc import AsyncGenerator

from ragbits.agents import Agent, ToolCallResult
from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.chat.interface.types import ChatContext, ChatResponse, LiveUpdateType
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM, ToolCall
from ragbits.core.prompt import ChatFormat
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

llm = LiteLLM(model_name="gpt-4.1-nano")
agent = Agent(llm=llm, tools=[document_search.search])


class MyChat(ChatInterface):
    async def setup(self) -> None:
        await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")

    async def chat(
        self,
        message: str,
        history: ChatFormat,
        context: ChatContext,
    ) -> AsyncGenerator[ChatResponse]:
        async for result in agent.run_streaming(message):
            match result:
                case str():
                    yield self.create_live_update(
                        update_id="1",
                        type=LiveUpdateType.START,
                        label="Answering...",
                    )
                    yield self.create_text_response(result)
                case ToolCall():
                    yield self.create_live_update(
                        update_id="2",
                        type=LiveUpdateType.START,
                        label="Searching...",
                    )
                case ToolCallResult():
                    yield self.create_live_update(
                        update_id="2",
                        type=LiveUpdateType.FINISH,
                        label="Search",
                        description=f"Found {len(result.result)} relevant chunks.",
                    )

        yield self.create_live_update(
            update_id="1",
            type=LiveUpdateType.FINISH,
            label="Answer",
        )


if __name__ == "__main__":
    api = RagbitsAPI(MyChat)
    api.run()
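The live updates form START/FINISH pairs keyed by update_id, so the UI can open a status indicator and later close it. A minimal sketch of tracking that protocol on the client side (illustrative only, not the Ragbits frontend code):

```python
# Event stream as produced by the chat handler above: (update_id, kind, label).
events = [
    ("2", "START", "Searching..."),
    ("2", "FINISH", "Search"),
    ("1", "START", "Answering..."),
    ("1", "FINISH", "Answer"),
]

open_updates: dict[str, str] = {}
for update_id, kind, label in events:
    if kind == "START":
        open_updates[update_id] = label    # show a spinner for this update
    else:
        open_updates.pop(update_id, None)  # replace the spinner with the final label

print(open_updates)  # -> {} (every update was finished)
```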
Rapid development
Create Ragbits projects from templates:
uvx create-ragbits-app
Explore the create-ragbits-app repository for more details. If you have a new idea for a template, feel free to contribute!
Documentation
Tutorials - Get started with Ragbits in a few minutes
How-to - Learn how to use Ragbits in your projects
🐰 Ragbits
Building blocks for rapid development of GenAI applications
Homepage | Documentation | Contact
Features
🔨 Build Reliable & Scalable GenAI Apps
📚 Fast & Flexible RAG Processing
🤖 Build Multi-Agent Workflows with Ease
🚀 Deploy & Monitor with Confidence
Contributing
We welcome contributions! Please read CONTRIBUTING.md for more information.
License
Ragbits is licensed under the MIT License.