SurfSense
While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as Search Engines (SearxNG, Tavily, LinkUp), Slack, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Elasticsearch and more to come.
Video
https://github.com/user-attachments/assets/42a29ea1-d4d8-4213-9c69-972b5b806d58
Podcast Sample
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
Key Features
💡 Idea:
📁 Multiple File Format Uploading Support
🔍 Powerful Search
💬 Chat with your Saved Content
📄 Cited Answers
🔔 Privacy & Local LLM Support
🏠 Self Hostable
👥 Team Collaboration with RBAC
🎙️ Podcasts
🤖 Deep Agent Architecture
Built-in Agent Tools
Extensible Tools Registry
Contributors can easily add new tools via the registry pattern: create the tool under surfsense_backend/app/agents/new_chat/tools/ and register it in the BUILTIN_TOOLS list in registry.py.
Configurable System Prompts
📊 Advanced RAG Techniques
ℹ️ External Sources
📄 Supported File Extensions
Audio/Video (via STT Service):
.mp3, .wav, .mp4, .webm, etc.
🔖 Cross Browser Extension
FEATURE REQUESTS AND FUTURE
SurfSense is actively being developed. While it’s not yet production-ready, you can help us speed up the process.
Join the SurfSense Discord and help shape the future of SurfSense!
🚀 Roadmap
Stay up to date with our development progress and upcoming features!
Check out our public roadmap and contribute your ideas or feedback:
📋 Roadmap Discussion: SurfSense 2025-2026 Roadmap: Deep Agents, Real-Time Collaboration & MCP Servers
📊 Kanban Board: SurfSense Project Board
How to get started?
Quick Start with Docker 🐳
Linux/macOS:
Windows (PowerShell):
With Custom Configuration (e.g., OpenAI Embeddings):
After starting, access SurfSense at:
Useful Commands:
Installation Options
SurfSense provides multiple options to get started:
SurfSense Cloud - The easiest way to try SurfSense without any setup.
Quick Start Docker (Above) - Single command to get SurfSense running locally.
Docker Compose (Production) - Full stack deployment with separate services.
.env file
Manual Installation - For users who prefer more control over their setup or need to customize their deployment.
Docker and manual installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.
Before self-hosting installation, make sure to complete the prerequisite setup steps including:
Tech Stack
BackEnd
FastAPI: Modern, fast web framework for building APIs with Python
PostgreSQL with pgvector: Database with vector search capabilities for similarity searches
SQLAlchemy: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
Alembic: Database migration tool for SQLAlchemy
FastAPI Users: Authentication and user management with JWT and OAuth support
Deep Agents: Custom agent framework built on LangGraph for reasoning and acting AI agents with configurable tools
LangGraph: Framework for developing stateful AI agents with conversation persistence
LangChain: Framework for developing AI-powered applications.
LiteLLM: Universal LLM integration supporting 100+ models (OpenAI, Anthropic, Ollama, etc.)
Rerankers: Advanced result ranking for improved search relevance
Hybrid Search: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
Vector Embeddings: Document and text embeddings for semantic search
pgvector: PostgreSQL extension for efficient vector similarity operations
Redis: In-memory data structure store used as message broker and result backend for Celery
Celery: Distributed task queue for handling asynchronous background jobs (document processing, podcast generation, etc.)
Flower: Real-time monitoring and administration tool for Celery task queues
Chonkie: Advanced document chunking and embedding library
Uses AutoEmbeddings for flexible embedding model selection
LateChunker for optimized document chunking based on the embedding model's max sequence length
FrontEnd
Next.js: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.
React: JavaScript library for building user interfaces.
TypeScript: Static type-checking for JavaScript, enhancing code quality and developer experience.
Vercel AI SDK Kit UI Stream Protocol: Streaming protocol used to build a scalable chat UI.
Tailwind CSS: Utility-first CSS framework for building custom UI designs.
Shadcn: Headless components library.
Motion (Framer Motion): Animation library for React.
DevOps
Docker: Container platform for consistent deployment across environments
Docker Compose: Tool for defining and running multi-container Docker applications
pgAdmin: Web-based PostgreSQL administration tool included in Docker setup
Extension
Manifest v3 on Plasmo
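The Hybrid Search entry in the BackEnd stack names Reciprocal Rank Fusion (RRF) as the way vector and full-text results are combined. The snippet below is a minimal, generic sketch of RRF, not SurfSense's actual implementation: the function name, the constant k=60 (a commonly used default), and the sample document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over every list it appears
    in, where rank starts at 1. k=60 is an assumed default, not a value
    taken from SurfSense.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still appear lower in the fused ranking.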
Contribute
Contributions are very welcome! A contribution can be as small as a ⭐ or as simple as finding and reporting issues. Help with fine-tuning the backend is always appreciated.
Adding New Agent Tools
Want to add a new tool to the SurfSense agent? It’s easy:
surfsense_backend/app/agents/new_chat/tools/my_tool.py
registry.py:
For detailed contribution guidelines, please see our CONTRIBUTING.md file.
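The registry pattern mentioned for agent tools can be sketched roughly as follows. Only the BUILTIN_TOOLS name comes from this README. The ToolDefinition class, the factory convention, the build_tools helper, and the word_count tool are hypothetical stand-ins, so check registry.py and CONTRIBUTING.md for the real interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical registry entry; the real SurfSense type may differ."""
    name: str
    description: str
    factory: Callable[[], Callable[..., str]]


def create_word_count_tool() -> Callable[[str], str]:
    """Hypothetical tool factory, standing in for a file such as my_tool.py."""
    def word_count(text: str) -> str:
        return f"{len(text.split())} words"
    return word_count


# Assumed shape of the list kept in registry.py.
BUILTIN_TOOLS = [
    ToolDefinition(
        name="word_count",
        description="Count the words in a piece of text.",
        factory=create_word_count_tool,
    ),
]


def build_tools(enabled: Optional[Set[str]] = None) -> Dict[str, Callable[..., str]]:
    """Instantiate every registered tool, optionally filtered by name."""
    return {
        definition.name: definition.factory()
        for definition in BUILTIN_TOOLS
        if enabled is None or definition.name in enabled
    }


tools = build_tools()
result = tools["word_count"]("chat with your saved content")
print(result)
```

Keeping tools behind a single list means adding one is a matter of writing a factory and appending an entry, without touching the agent code that consumes the registry.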