Open-source framework for building real-time voice and multimodal AI agents with speech-to-text, LLM processing, and text-to-speech pipelines.
Build AI agents that participate in live voice and video calls: your AI can speak, listen, and respond in real-time conversations.
LiveKit Agents Framework is an open-source Python framework for building real-time voice and multimodal AI agents. It provides the complete pipeline for voice-based agent interactions: speech-to-text transcription, LLM processing, text-to-speech synthesis, and real-time audio/video streaming, all integrated into a coherent framework with low-latency performance.
The framework is built on LiveKit's real-time communication infrastructure, which handles the complex networking, codec management, and streaming protocols required for low-latency audio/video. This means developers focus on agent logic rather than WebRTC, audio processing, and network engineering.
The VoicePipelineAgent is the framework's flagship component. It orchestrates the STT→LLM→TTS pipeline with built-in turn detection, interruption handling, and conversation flow management. The agent can detect when a user stops speaking, process their input, generate a response, and speak it, all with sub-second latency when using optimized providers.
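The turn-taking pattern described above can be sketched in plain Python. This is an illustrative sketch with hypothetical stub STT/LLM/TTS stages, not the LiveKit API: the key idea is that a new user utterance cancels any response still being spoken (barge-in), which is the core of interruption handling.

```python
import asyncio

# Stub pipeline stages standing in for real STT, LLM, and TTS providers.
async def stt(audio: str) -> str:
    return f"transcript({audio})"

async def llm(text: str) -> str:
    return f"reply to '{text}'"

async def tts(text: str) -> list[str]:
    # A real TTS streams audio frames; here each word plays as one "frame".
    return text.split()

class TurnLoop:
    """One STT -> LLM -> TTS turn per utterance; a new utterance
    cancels any response that is still being spoken (barge-in)."""

    def __init__(self) -> None:
        self.current_turn: asyncio.Task | None = None
        self.spoken_frames: list[str] = []

    async def _run_turn(self, audio: str) -> None:
        reply = await llm(await stt(audio))
        for frame in await tts(reply):
            self.spoken_frames.append(frame)
            await asyncio.sleep(0.05)  # simulate real-time playback

    def on_user_speech(self, audio: str) -> asyncio.Task:
        if self.current_turn and not self.current_turn.done():
            self.current_turn.cancel()  # interruption: stop speaking
        self.current_turn = asyncio.create_task(self._run_turn(audio))
        return self.current_turn

async def main() -> tuple[bool, list[str]]:
    turns = TurnLoop()
    first = turns.on_user_speech("hello")
    await asyncio.sleep(0.08)  # user barges in mid-response
    second = turns.on_user_speech("actually, book a table instead")
    await second
    return first.cancelled(), turns.spoken_frames

cancelled, frames = asyncio.run(main())
print(cancelled, frames)
```

The real framework layers voice activity detection and streaming on top of this loop, but cancellation of the in-flight turn is the same underlying mechanism.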
The framework supports multiple STT providers (Deepgram, AssemblyAI, Azure, Google, OpenAI Whisper), LLM providers (OpenAI, Anthropic, Google, local models), and TTS providers (ElevenLabs, OpenAI, Azure, Google, Cartesia). This mix-and-match approach lets developers optimize for quality, latency, and cost across each pipeline stage.
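Mix-and-match works because each stage sits behind a common interface. The sketch below is illustrative only, with hypothetical stub providers rather than real plugin classes, but it shows how independently chosen STT, LLM, and TTS implementations compose into one pipeline:

```python
from dataclasses import dataclass
from typing import Protocol

# Each stage is an interchangeable interface.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

# Stub "providers" standing in for e.g. Deepgram / Claude / ElevenLabs.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()

class UppercaseLLM:
    def complete(self, prompt: str) -> str:
        return prompt.upper()

class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()

@dataclass
class Pipeline:
    stt: STT
    llm: LLM
    tts: TTS

    def respond(self, audio: bytes) -> bytes:
        # STT -> LLM -> TTS, each stage swappable independently.
        return self.tts.synthesize(self.llm.complete(self.stt.transcribe(audio)))

pipeline = Pipeline(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
print(pipeline.respond(b"hello"))  # b'HELLO'
```

Swapping one provider (say, a faster STT) means changing one constructor argument without touching the rest of the pipeline.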
Multimodal support extends beyond voice. Agents can process video input, share their screen, send images, and use vision models β enabling use cases like visual assistants, remote diagnostics, and interactive tutoring. The framework handles the complexity of synchronizing multiple media streams.
LiveKit Agents Framework includes built-in function calling, allowing voice agents to execute tools during conversations. An agent can search databases, call APIs, process files, and control external systems while maintaining a natural voice conversation, a critical capability for practical voice agent applications.
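One common way to wire tools into a voice loop (a hypothetical sketch, not the framework's actual function-calling API) is to have the LLM emit a structured tool call that the agent dispatches before speaking the result:

```python
import json

# Hypothetical tool registry: tool names mapped to plain Python functions.
TOOLS = {
    "order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def handle_llm_output(message: str):
    """If the model emitted a JSON tool call, execute it and return the
    result; otherwise treat the message as text for the TTS to speak."""
    try:
        call = json.loads(message)
    except json.JSONDecodeError:
        return ("speak", message)
    result = TOOLS[call["name"]](**call["arguments"])
    return ("tool_result", result)

print(handle_llm_output('{"name": "order_status", "arguments": {"order_id": "A1"}}'))
# ('tool_result', {'order_id': 'A1', 'status': 'shipped'})
print(handle_llm_output("Your order is on its way!"))
# ('speak', 'Your order is on its way!')
```

In a real agent the tool result is fed back to the LLM so it can phrase a spoken answer, keeping the conversation flowing while the lookup happens.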
The framework deploys on LiveKit's infrastructure or self-hosted LiveKit servers. It supports horizontal scaling, session management, and load balancing for production deployments serving many concurrent voice sessions.
For teams building voice-first AI agents (customer support, virtual receptionists, voice assistants, interactive tutoring, or accessibility tools), LiveKit Agents Framework provides one of the most complete open-source solutions for real-time, low-latency voice AI.
Orchestrates the complete STT→LLM→TTS pipeline with turn detection, interruption handling, and natural conversation flow.
Use Case:
Building a voice customer support agent that listens, thinks, and responds naturally with sub-second latency.
Choose independently among STT, LLM, and TTS providers to optimize each pipeline stage for quality, latency, or cost.
Use Case:
Using Deepgram for fast STT, Claude for quality reasoning, and ElevenLabs for natural-sounding TTS.
Built on LiveKit's WebRTC infrastructure for low-latency audio and video streaming with automatic codec and network management.
Use Case:
Deploying a voice agent accessible via browser, mobile app, or phone with consistent low-latency performance.
Agents can execute tools and API calls during voice conversations while maintaining natural conversational flow.
Use Case:
A voice assistant that can look up order status, update reservations, or control smart home devices mid-conversation.
Process video input, share screens, send images, and use vision models alongside voice for rich multimodal interactions.
Use Case:
Building a visual assistant that can see what the user is showing on camera and provide guidance.
Intelligent detection of when users start and stop speaking, with graceful handling of interruptions and overlapping speech.
Use Case:
Creating a natural voice conversation where the agent responds at appropriate times and handles user interruptions gracefully.
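For intuition, a toy energy-threshold endpointer shows the core idea behind turn detection; production systems such as LiveKit's typically use a trained VAD model (e.g. Silero) rather than a fixed threshold. The energy values below are hypothetical per-frame levels; the `hangover` parameter keeps brief pauses from ending a turn:

```python
def detect_turns(energies, threshold=0.5, hangover=2):
    """Return (start, end) frame indices of speech segments. A segment
    ends only after more than `hangover` consecutive quiet frames, so
    brief mid-sentence pauses don't split a turn."""
    segments = []
    start = None   # index where the current speech segment began
    quiet = 0      # consecutive quiet frames seen since last speech
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                segments.append((start, i - quiet))
                start, quiet = None, 0
    if start is not None:  # speech ran to the end of the buffer
        segments.append((start, len(energies) - 1 - quiet))
    return segments

# Two utterances, with a one-frame pause inside the first one.
print(detect_turns([0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7, 0.6]))
# [(1, 4), (8, 9)]
```

The dip at frame 3 does not split the first turn because it is shorter than the hangover, which is exactly the behavior that makes agent responses feel natural rather than trigger-happy.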
Ready to get started with LiveKit Agents Framework?
View Pricing Options →
Voice customer support agents
Virtual receptionists
Interactive voice assistants
Multimodal tutoring and coaching
Frequently asked questions:

What end-to-end latency can I expect? It depends on the providers chosen. With optimized providers (Deepgram STT, a fast LLM, streaming TTS), sub-second response times are achievable.

Can agents make and receive phone calls? Yes. LiveKit supports SIP trunking for phone call integration, so agents can receive and make calls through connected telephony providers.

How does it compare to Vapi and Bland? LiveKit Agents Framework is open-source and self-hostable with full control. Vapi and Bland are managed platforms that are faster to set up but less flexible and customizable.

Can it run fully locally? Yes. The framework supports local STT models (Whisper), local LLMs (via Ollama), and can integrate with local TTS solutions.
People who use this tool also find these helpful
Standardized communication protocol for AI agents enabling interoperability and coordination across different agent frameworks.
CLI tool for scaffolding, building, and deploying AI agent projects with best-practice templates, tool integrations, and framework support.
Full-stack platform for building, testing, and deploying AI agents with built-in memory, tools, and team orchestration capabilities.
Lightweight Python framework for building modular AI agents with schema-driven I/O using Pydantic and Instructor.
Latest version of the pioneering autonomous AI agent with enhanced planning, tool usage, and memory capabilities.
IBM's open-source TypeScript framework for building production AI agents with structured tool use, memory management, and observability.
See how LiveKit Agents Framework compares to Vapi and other alternatives
View Full Comparison →
Voice Agents
Developer platform for real-time voice AI agents.
Voice Agents
Phone calling API for scalable conversational voice agents.
Voice Agents
Conversational voice infrastructure for call center automation.
Voice Agents
Real-time media infrastructure platform with an integrated agent framework for building voice and video AI assistants that can participate in live conversations. Enables developers to create AI agents that can see, hear, and speak in real-time video calls, with support for spatial audio, screen sharing, and multi-participant interactions.
Get started with LiveKit Agents Framework and see if it's the right fit for your needs.
Get Started →