Runtime
Get started with bitHuman Runtime to create interactive avatars that respond dynamically to audio input.
Getting Started with bitHuman Runtime
bitHuman Runtime enables you to build interactive avatars that respond realistically to audio input. This guide covers installation, a hands-on example, and an overview of the core API.
Installation
System Requirements
Install bithuman-runtime
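Assuming the package is published on PyPI under the same name as this heading, installation is a single pip command:

```bash
pip install bithuman-runtime
```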
Install Additional Dependencies (Audio Playback)
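The playback sketches in this guide lean on sounddevice for audio and OpenCV for video. These are common choices rather than confirmed requirements, so check the example script for its actual dependencies:

```bash
pip install sounddevice opencv-python numpy
```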
Avatar Model
To run the example, you’ll need a bitHuman avatar model file (.imx). Avatar models define the visual appearance and behavior of your avatar.
Sample Models
Here are a few sample models you can download and test:
Running the Example
Download the example script and run it using the following command:
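The exact entry point depends on the script you downloaded; an invocation will look roughly like this (the script name and flags below are illustrative, not confirmed):

```bash
python runtime_example.py --model /path/to/avatar.imx --audio /path/to/speech.wav --token YOUR_API_TOKEN
```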
Example Controls
Use these keyboard shortcuts during the example:
- `1`: Play the audio file through the avatar
- `2`: Interrupt current playback
- `q`: Exit the application
How It Works
This example covers the following steps:
- Initializing bitHuman Runtime using your API token
- Configuring audio and video playback
- Processing audio input and rendering the avatar’s reactions
- Managing user interactions
The avatar animates in sync with the audio input, providing a realistic interactive experience.
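A minimal sketch of those steps is shown below. Only `AsyncBithumanRuntime`, `AudioChunk`, and `interrupt()` are named in this guide; the import path and the `load_model`, `push_audio`, and `stream` methods, along with the frame attributes, are placeholders for the real API:

```python
import asyncio

import cv2
import numpy as np
import sounddevice as sd

from bithuman_runtime import AsyncBithumanRuntime, AudioChunk  # import path assumed

async def main():
    # 1. Initialize the runtime with your API token (constructor signature assumed).
    runtime = AsyncBithumanRuntime(api_token="YOUR_API_TOKEN")

    # 2. Load the avatar model; `load_model` stands in for the real method.
    await runtime.load_model("avatar.imx")

    # 3. Push 16kHz mono int16 audio; one second of silence as a stand-in.
    await runtime.push_audio(AudioChunk(np.zeros(16000, dtype=np.int16)))

    # 4. Render each frame and play its synchronized audio as it arrives.
    async for frame in runtime.stream():
        cv2.imshow("avatar", frame.image)                       # BGR numpy array
        sd.play(np.asarray(frame.audio_chunk), samplerate=16000)  # conversion assumed
        if cv2.waitKey(1) & 0xFF == ord("q"):                   # 'q' exits, as in the example
            await runtime.interrupt()
            break

asyncio.run(main())
```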
API Overview
bitHuman Runtime offers a straightforward yet powerful API for creating interactive avatars.
Core API Components
AsyncBithumanRuntime
The main class for interacting with bitHuman services.
- Authenticate using your API token
- Load and manage avatar models
- Process audio input to animate the avatar
- Interrupt ongoing speech using `runtime.interrupt()`
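Interruption is a single call on the runtime object; here `runtime` is an already-initialized `AsyncBithumanRuntime`:

```python
# Cut off the avatar mid-utterance, e.g. when the user starts speaking again.
await runtime.interrupt()
```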
AudioChunk
Represents audio data for processing.
- Supports 16kHz, mono, int16 audio format
- Compatible with bytes and numpy arrays
- Provides utilities for duration and format conversions
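As a sketch, constructing a chunk from a numpy array or raw bytes might look like this; the constructor form is an assumption based on the bullets above, not confirmed API:

```python
import numpy as np

from bithuman_runtime import AudioChunk  # import path assumed

# Half a second of 16kHz mono int16 silence, built from a numpy array.
chunk = AudioChunk(np.zeros(8000, dtype=np.int16))

# The same half second from raw little-endian int16 bytes.
chunk_from_bytes = AudioChunk(b"\x00\x00" * 8000)

# The class also exposes duration and format-conversion helpers
# (their exact names are not shown in this guide).
```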
VideoFrame
Represents visual output data from the runtime.
- Contains image data in BGR format (numpy array)
- Includes synchronized audio chunks
- Provides frame metadata such as frame index and message ID
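Consuming a frame might look like the sketch below; the attribute names (`image`, `audio_chunk`, `frame_index`, `message_id`) follow the bullets above but are assumptions:

```python
import cv2

def handle_frame(frame):
    # The BGR numpy array can be displayed directly with OpenCV.
    cv2.imshow("avatar", frame.image)
    cv2.waitKey(1)

    # Synchronized audio travels with the frame.
    audio = frame.audio_chunk

    # Metadata for ordering frames and correlating them with requests.
    print(frame.frame_index, frame.message_id)
```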
Input/Output Flow
Input
- Send audio data (16kHz, mono, int16 format) into the runtime
- The runtime processes this input to generate the corresponding avatar animations
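If your source audio is not already 16kHz mono int16, convert it first. This sketch uses soundfile and scipy, which are assumed choices; any loader and resampler will do:

```python
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

def to_runtime_format(path: str) -> np.ndarray:
    """Load an audio file and return 16kHz, mono, int16 samples."""
    data, rate = sf.read(path, dtype="float32")
    if data.ndim > 1:                 # downmix stereo to mono
        data = data.mean(axis=1)
    if rate != 16000:                 # resample to 16kHz
        data = resample_poly(data, 16000, rate)
    return (np.clip(data, -1.0, 1.0) * 32767).astype(np.int16)
```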
Processing
- The runtime analyzes the audio to produce natural facial movements and expressions
- Frames are generated at 25 FPS, synchronized with audio playback
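Those two figures pin down the pacing: at 25 FPS each frame spans 40 ms, which at 16kHz works out to 640 audio samples per frame (derived from the numbers above, not stated separately in this guide):

```python
FPS = 25
SAMPLE_RATE = 16000

samples_per_frame = SAMPLE_RATE // FPS  # 640 samples = 40 ms of audio per frame
frame_interval = 1.0 / FPS              # hold each rendered frame for ~40 ms
```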
Output
- Each frame includes visual data (a BGR image) and the corresponding audio chunk
- Frames can be rendered and their audio played back in real time as they arrive
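For smooth playback it helps to write each frame’s audio into one persistently open output stream rather than starting playback per chunk. A sketch with sounddevice (an assumed dependency; the byte conversion is also assumed):

```python
import sounddevice as sd

# One persistent 16kHz mono int16 stream for gapless playback.
stream = sd.RawOutputStream(samplerate=16000, channels=1, dtype="int16")
stream.start()

def play_frame_audio(frame):
    # Write this frame's synchronized audio into the open stream.
    stream.write(bytes(frame.audio_chunk))
```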
Runtime WebSocket Example
This example module demonstrates how to stream bitHuman Runtime output over a WebSocket connection. It shows how to implement a web server that handles real-time frame streaming, audio processing, and video recording.
- WebSocket Example: Download
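The downloadable module is the reference implementation. As a rough sketch of the idea, a server might encode each frame’s image as JPEG and push the bytes to connected clients; this uses the third-party websockets package, an assumed choice that may differ from the example’s actual stack:

```python
import asyncio

import cv2
import websockets  # third-party package, an assumed choice

async def frame_source():
    """Placeholder: yield VideoFrames, e.g. from the runtime's frame iterator."""
    while False:
        yield

async def stream_frames(websocket):
    # Encode each frame's BGR image as JPEG and send the bytes to the client.
    async for frame in frame_source():
        ok, jpeg = cv2.imencode(".jpg", frame.image)
        if ok:
            await websocket.send(jpeg.tobytes())

async def main():
    async with websockets.serve(stream_frames, "localhost", 8765):
        await asyncio.Future()  # serve until cancelled

asyncio.run(main())
```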