
Realtime AI

A high-performance real-time AI framework for audio and video processing


Overview

Realtime AI is a WebRTC-based framework for building low-latency AI applications with audio and video. It features a modular pipeline architecture inspired by GStreamer, enabling you to compose processing elements for speech recognition, LLM interactions, and text-to-speech.

Architecture:

Client (Browser) → WebRTC Gateway → AI Pipeline
                                    (Decode → STT → LLM → TTS → Encode)
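
The stage chain above can be pictured as a series of composable steps connected by channels. The following is a minimal, self-contained sketch of that idea and not the framework's actual API: `Stage`, `Chain`, `tag`, and `Run` are hypothetical names introduced for illustration, and the placeholder stages only tag strings rather than process audio.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical processing step: it reads items from in
// and returns a channel that carries the processed items.
type Stage func(in <-chan string) <-chan string

// tag builds a placeholder stage that appends its name to each item.
// A real stage would decode audio, call an STT service, and so on.
func tag(name string) Stage {
	return func(in <-chan string) <-chan string {
		out := make(chan string)
		go func() {
			defer close(out)
			for item := range in {
				out <- item + "->" + name
			}
		}()
		return out
	}
}

// Chain links stages in order, feeding each stage's output to the next.
func Chain(in <-chan string, stages ...Stage) <-chan string {
	for _, s := range stages {
		in = s(in)
	}
	return in
}

// Run pushes frames through the five placeholder stages and collects the results.
func Run(frames []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, f := range frames {
			src <- f
		}
	}()
	var results []string
	for r := range Chain(src, tag("Decode"), tag("STT"), tag("LLM"), tag("TTS"), tag("Encode")) {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(strings.Join(Run([]string{"frame1", "frame2"}), "\n"))
}
```

Because each stage runs in its own goroutine, a frame can move on to the next stage while the previous stage is already handling the following frame, which is one reason a pipeline layout suits low-latency streaming.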

Features

  • 🎯 Low Latency - WebRTC for real-time audio/video streaming
  • 🔌 Modular Pipelines - Composable processing elements
  • 🤖 AI Integrations - Gemini, OpenAI Realtime API, Azure STT/TTS
  • Interruption Support - Natural conversation flow

Quick Start

Installation

macOS:

brew install opus ffmpeg go

Ubuntu/Debian (install script recommended):

# Use prebuilt FFmpeg (more stable)
./scripts/setup-ffmpeg.sh
eval "$(./scripts/setup-ffmpeg.sh --env)"

# Install other dependencies
apt-get install pkg-config libopus-dev

Ubuntu/Debian (manual install):

apt-get install pkg-config libopus-dev libavcodec-dev libavformat-dev libavutil-dev libswresample-dev

Setup:

git clone https://github.com/realtime-ai/realtime-ai.git
cd realtime-ai
go mod download

Run Example

# Set API key
export GOOGLE_API_KEY="your_api_key"

# Run Gemini assistant
go run examples/gemini-assis/main.go

# Open browser
open http://localhost:8080

Basic Usage

// Create pipeline (named p so it does not shadow the pipeline package)
p := pipeline.NewPipeline("assistant")

// Create and link elements
// apiKey holds the Gemini API key, e.g. the value of GOOGLE_API_KEY
resample := elements.NewAudioResampleElement("resample")
gemini := elements.NewGeminiElement("gemini", apiKey)
audioPacer := elements.NewAudioPacerSinkElement("audioPacer")

p.Link(resample, gemini)
p.Link(gemini, audioPacer)

// Start processing (ctx is a context.Context that controls the pipeline's lifetime)
p.Start(ctx)

Documentation

Project Structure

pkg/
├── pipeline/      # Core pipeline system
├── elements/      # AI, codecs, and processing elements
├── connection/    # WebRTC abstractions
├── server/        # HTTP/WebRTC server
└── audio/         # Audio utilities

examples/
├── gemini-assis/  # Gemini multimodal assistant
├── local-assis/   # Local connection example
└── openai-realtime/ # OpenAI Realtime API

License

Apache License 2.0 - see LICENSE for details.

Status

⚠️ Active Development - APIs may change without notice.


Made with ❤️ by the Realtime AI Team