Streaming Avatars with WebRTC

This guide explains how to integrate bitHuman avatars with WebRTC using LiveKit. This integration enables you to stream interactive avatars to multiple remote viewers in real-time while controlling the avatar’s speech and behavior.

Introduction

WebRTC integration allows you to:

  • Stream high-quality avatar animations to multiple viewers simultaneously
  • Control avatar speech and expressions in real-time
  • Create interactive experiences for virtual events, customer service, education, and more

Architecture Overview

The integration consists of three main components working together:

  1. bitHuman Runtime: Generates the avatar animations and audio
  2. LiveKit Server: Handles WebRTC streaming and room management
  3. Client Applications: View and interact with the avatar

Data Flow

The process works as follows:

  1. The bitHuman server initializes the runtime with your avatar model
  2. The server connects to a LiveKit room and publishes video and audio tracks
  3. A WebSocket server accepts audio input to control the avatar
  4. Audio data is processed by the bitHuman runtime to generate animations
  5. The animated avatar is streamed to the LiveKit room in real-time
  6. Viewers connect to the LiveKit room to see and interact with the avatar

Prerequisites

Before you begin, ensure you have:

  • Python 3.10 or higher
  • bitHuman Runtime API token
  • LiveKit server (cloud or self-hosted)
  • LiveKit API key and secret

Implementation Guide

Step 1: Install Dependencies

First, install the required Python packages:

pip install bithuman-runtime livekit livekit-api websockets python-dotenv

Step 2: Set Up Environment Variables

Create a .env file with your API credentials:

# bitHuman Configuration
BITHUMAN_RUNTIME_TOKEN=your_bithuman_token

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
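
The server loads these values at startup with python-dotenv (installed in Step 1). A minimal sketch, assuming the variable names from the .env file above:

import os

from dotenv import load_dotenv

# Read the .env file and populate the process environment
load_dotenv()

# Fail fast if any required credential is missing
required = ["BITHUMAN_RUNTIME_TOKEN", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")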

Step 3: Create the bitHuman Server

The bitHuman server (download example) handles:

  • Initializing the bitHuman runtime
  • Connecting to LiveKit
  • Publishing video and audio tracks
  • Accepting audio input via WebSocket

Key components of the server:

class bitHumanLiveKitServer:
    def __init__(self, avatar_model, room_name, token=None, livekit_url=None, 
                 api_key=None, api_secret=None, identity="bithuman-avatar", 
                 ws_port=8765):
        # Initialize configuration
        self.avatar_model = avatar_model
        self.room_name = room_name
        self.token = token or os.getenv("BITHUMAN_RUNTIME_TOKEN")
        self.livekit_url = livekit_url or os.getenv("LIVEKIT_URL")
        self.api_key = api_key or os.getenv("LIVEKIT_API_KEY")
        self.api_secret = api_secret or os.getenv("LIVEKIT_API_SECRET")
        self.identity = identity
        self.ws_port = ws_port
        
        # Initialize runtime and connections later
        self.runtime = None
        self._room = None
        self._ws_server = None
        
    async def start(self):
        # Initialize bitHuman runtime
        self.runtime = await bitHumanRuntime.create(token=self.token)
        await self.runtime.load_avatar(self.avatar_model)
        
        # Connect to LiveKit and start WebSocket server
        await self._connect_to_livekit()
        await self._start_websocket_server()
        
    # Additional methods for LiveKit connection, WebSocket handling, etc.
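
The LiveKit connection itself lives in _connect_to_livekit; the full implementation is in the downloadable example. The sketch below shows its general shape using the livekit and livekit-api packages; the track names, resolution, and sample rate are illustrative assumptions, not values taken from the example:

from livekit import api, rtc

class bitHumanLiveKitServer:
    # ... __init__ and start() as shown above ...

    async def _connect_to_livekit(self):
        # Mint an access token that lets this process publish into the room
        token = (
            api.AccessToken(self.api_key, self.api_secret)
            .with_identity(self.identity)
            .with_grants(api.VideoGrants(room_join=True, room=self.room_name))
            .to_jwt()
        )

        # Connect to the LiveKit room
        self._room = rtc.Room()
        await self._room.connect(self.livekit_url, token)

        # Create sources for the avatar's video and audio frames
        # (resolution and sample rate are illustrative values)
        self._video_source = rtc.VideoSource(1280, 720)
        self._audio_source = rtc.AudioSource(16000, 1)
        video_track = rtc.LocalVideoTrack.create_video_track("avatar-video", self._video_source)
        audio_track = rtc.LocalAudioTrack.create_audio_track("avatar-audio", self._audio_source)

        # Publish both tracks so every viewer in the room receives them
        await self._room.local_participant.publish_track(
            video_track, rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
        )
        await self._room.local_participant.publish_track(
            audio_track, rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_MICROPHONE)
        )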

Step 4: Create the WebSocket Client

The WebSocket client (download example) sends audio data to control the avatar:

import asyncio
import json

import websockets

# load_audio_file and resample are helper functions provided in the
# downloadable example.

class AudioStreamerClient:
    def __init__(self, ws_url="ws://localhost:8765", chunk_size_ms=100, sample_rate=16000):
        self.ws_url = ws_url
        self.chunk_size_ms = chunk_size_ms
        self.sample_rate = sample_rate
        
    async def stream_audio_file(self, audio_file_path):
        # Load and stream audio file to WebSocket server
        audio_data, file_sample_rate = load_audio_file(audio_file_path)
        
        # Resample if needed
        if file_sample_rate != self.sample_rate:
            audio_data = resample(audio_data, file_sample_rate, self.sample_rate)
            
        # Calculate chunk size in samples
        chunk_size = int(self.sample_rate * self.chunk_size_ms / 1000)
        
        # Stream chunks to WebSocket
        async with websockets.connect(self.ws_url) as websocket:
            for i in range(0, len(audio_data), chunk_size):
                chunk = audio_data[i:i+chunk_size]
                await websocket.send(chunk.tobytes())
                await asyncio.sleep(self.chunk_size_ms / 1000)  # Simulate real-time
                
            # Signal end of audio
            await websocket.send(json.dumps({"type": "end"}))
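
The downloadable client wraps this class in a small command-line interface (used in Step 6). When importing the class directly, a minimal entry point might look like this (the audio path is a placeholder):

import asyncio

async def main():
    client = AudioStreamerClient(ws_url="ws://localhost:8765")
    await client.stream_audio_file("path/to/audio.wav")

if __name__ == "__main__":
    asyncio.run(main())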

Step 5: Run the Server

Start the bitHuman server with your avatar model and room name:

python bithuman_server.py --avatar-model "default" --room "my-avatar-room"

Command-line options:

| Option | Description | Default |
| --- | --- | --- |
| --avatar-model | bitHuman avatar model to use | (required) |
| --room | LiveKit room name | (required) |
| --token | bitHuman runtime token | From .env |
| --livekit-url | LiveKit server URL | From .env |
| --api-key | LiveKit API key | From .env |
| --api-secret | LiveKit API secret | From .env |
| --identity | Identity in LiveKit room | "bithuman-avatar" |
| --ws-port | WebSocket server port | 8765 |

Step 6: Control the Avatar

Use the WebSocket client to send audio to the avatar:

# Stream an audio file
python websocket_client.py stream path/to/audio.wav

# Send an interrupt command
python websocket_client.py interrupt

Client options:

| Option | Description | Default |
| --- | --- | --- |
| --ws-url | WebSocket server URL | "ws://localhost:8765" |
| --sample-rate | Target sample rate for audio | 16000 |
| --chunk-size | Size of audio chunks in milliseconds | 100 |

Step 7: View the Avatar

To view the avatar, you can use any of the following (each viewer needs a room access token; see the sketch after this list):

  1. Use the LiveKit Playground
  2. Create a custom web client using LiveKit Web SDK
  3. Build a mobile app with LiveKit iOS/Android SDKs
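
Whichever client you choose, you can mint a viewer token with the livekit-api package. A minimal sketch (the identity is an arbitrary example):

import os

from livekit import api

def create_viewer_token(room_name: str, identity: str) -> str:
    # Viewers only need permission to join the room and subscribe to tracks
    return (
        api.AccessToken(os.getenv("LIVEKIT_API_KEY"), os.getenv("LIVEKIT_API_SECRET"))
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room_name))
        .to_jwt()
    )

print(create_viewer_token("my-avatar-room", "viewer-1"))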

Advanced Configuration

WebSocket Protocol

The WebSocket server accepts:

  • Binary data: Raw audio bytes (16kHz, mono, int16 format)
  • JSON commands:
    {"type": "interrupt"}  // Stop current speech
    {"type": "end"}        // End of audio stream
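
For example, interrupting the avatar's current speech from Python takes only a few lines (a minimal sketch, assuming the server from Step 3 is running on its default port):

import asyncio
import json

import websockets

async def interrupt_avatar(ws_url: str = "ws://localhost:8765"):
    # Send the interrupt command as a JSON text frame
    async with websockets.connect(ws_url) as websocket:
        await websocket.send(json.dumps({"type": "interrupt"}))

asyncio.run(interrupt_avatar())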
    

LiveKit Room Settings

Customize video quality by modifying the TrackPublishOptions:

video_options = rtc.TrackPublishOptions(
    source=rtc.TrackSource.SOURCE_CAMERA,
    video_encoding=rtc.VideoEncoding(
        max_framerate=25,
        max_bitrate=5_000_000,  # 5 Mbps
    ),
)

Using LiveKit Data Streams Instead of WebSockets

For a more integrated approach, you can use LiveKit’s Data Streams feature instead of a separate WebSocket server:

async def setup_livekit_data_streams(self):
    # Register handler for audio data
    self._room.register_byte_stream_handler("audio-data", self._handle_audio_stream)

async def _handle_audio_stream(self, reader, participant_info):
    """Handle incoming audio data from LiveKit Data Stream."""
    logger.info(f"Receiving audio stream from {participant_info.identity}")
    
    # Process chunks as they arrive
    async for chunk in reader:
        await self.runtime.push_audio(
            chunk,
            sample_rate=16000,
            last_chunk=False
        )
    
    # End of stream
    await self.runtime.flush()

Learn more about LiveKit Data Streams and Remote Method Calls.

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| Connection errors | Verify your LiveKit server URL is correct and accessible |
| Authentication errors | Check your API key and secret |
| Audio not playing | Ensure audio files are in a supported format (WAV recommended) |
| WebSocket connection issues | Check if the WebSocket port is blocked by a firewall |
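
For the last two issues, a quick connectivity check against the WebSocket endpoint can help narrow things down (a minimal sketch; adjust the URL if you changed --ws-port):

import asyncio

import websockets

async def check_ws(ws_url: str = "ws://localhost:8765"):
    try:
        # open_timeout keeps the check from hanging on unreachable hosts
        async with websockets.connect(ws_url, open_timeout=5):
            print("WebSocket server is reachable")
    except Exception as exc:
        print(f"Could not connect to {ws_url}: {exc}")

asyncio.run(check_ws())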

Logging

Both scripts use the loguru library for logging. Adjust the log level for more detailed information:

logger.remove()
logger.add(sys.stdout, level="DEBUG")  # Change from "INFO" to "DEBUG"

Resources