Getting Started with bitHuman Runtime

bitHuman Runtime enables you to build interactive avatars that respond realistically to audio input. This guide covers installation, a hands-on example, and an overview of the core API.

Installation

System Requirements

Install bithuman-runtime

pip install bithuman-runtime

Install Additional Dependencies (Audio Playback)

pip install sounddevice

Avatar Model

To run the example, you’ll need a bitHuman avatar model file (.imx). Avatar models define the visual appearance and behavior of your avatar.

Sample Models

Here are a few sample models you can download and test:

Running the Example

Download the example script, set your access token, and run it with the following commands:

export BITHUMAN_RUNTIME_TOKEN='your_access_token'

python example.py --audio-file '/path/to/audio/file.mp3' --avatar-model '/path/to/model/avatar.imx'

Example Controls

Use these keyboard shortcuts during the example:

  • 1: Play the audio file through the avatar
  • 2: Interrupt current playback
  • q: Exit the application
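For illustration, controls like these are typically wired up as a key-polling loop. The sketch below assumes frames are shown in an OpenCV display window and that a runtime instance exists as created in the API overview below; play_file() is a hypothetical helper, while interrupt() is the documented call:

import cv2  # pip install opencv-python

def play_file(path: str) -> None:
    ...  # hypothetical helper: read the file and feed 16kHz int16 audio to the runtime

while True:
    key = cv2.waitKey(1) & 0xFF  # poll the display window for key presses
    if key == ord("1"):
        play_file("/path/to/audio/file.mp3")  # play the audio file through the avatar
    elif key == ord("2"):
        runtime.interrupt()  # documented API call; await it in async code
    elif key == ord("q"):
        break  # exit the application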

How It Works

This example covers the following steps:

  1. Initializing bitHuman Runtime using your API token
  2. Configuring audio and video playback
  3. Processing audio input and rendering the avatar’s reactions
  4. Managing user interactions

The avatar animates in sync with the audio input, providing a realistic interactive experience.
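A minimal sketch of those four steps follows. The class name, set_avatar_model(), and interrupt() come from the API overview below; the import path and the push_audio() and run() names are assumptions that may differ in your SDK version.

import asyncio
import numpy as np
from bithuman.runtime import AsyncBithumanRuntime  # import path is an assumption

async def main():
    # 1. Initialize the runtime with your API token
    runtime = AsyncBithumanRuntime(token="your_access_token")

    # 2. Load the avatar model (.imx file)
    await runtime.set_avatar_model("/path/to/model/avatar.imx")

    # 3. Feed audio: 16kHz, mono, int16 (push_audio is a hypothetical name)
    silence = np.zeros(16000, dtype=np.int16)  # one second of silence
    await runtime.push_audio(silence.tobytes(), sample_rate=16000)

    # 4. Consume frames as they are produced at 25 FPS
    #    (run() as an async generator is likewise an assumption)
    async for frame in runtime.run():
        ...  # hand the frame's BGR image and audio chunk to your playback code

    # A user interruption maps onto the documented interrupt() call:
    # await runtime.interrupt()

asyncio.run(main())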

API Overview

bitHuman Runtime offers a straightforward yet powerful API for creating interactive avatars.

Core API Components

AsyncBithumanRuntime

The main class for interacting with bitHuman services.

# The exact import path is an assumption; adjust to your installed SDK version
from bithuman.runtime import AsyncBithumanRuntime

runtime = AsyncBithumanRuntime(token="your_token")
await runtime.set_avatar_model("path/to/model.imx")

  • Authenticate using your API token
  • Load and manage avatar models
  • Process audio input to animate the avatar
  • Interrupt ongoing speech using runtime.interrupt()

AudioChunk

Represents audio data for processing.

  • Supports 16kHz, mono, int16 audio format
  • Compatible with bytes and numpy arrays
  • Provides utilities for duration and format conversions
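For illustration, here is how a chunk might be constructed from a numpy array; the import path, constructor arguments, and duration attribute are assumptions rather than a confirmed signature:

import numpy as np
from bithuman.runtime import AudioChunk  # import path is an assumption

# Generate 200 ms of a 440 Hz tone: 16kHz, mono, int16
t = np.arange(int(0.2 * 16000)) / 16000.0
samples = (0.3 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype(np.int16)

# Hypothetical constructor: raw int16 samples plus their sample rate
chunk = AudioChunk(data=samples, sample_rate=16000)
print(chunk.duration)  # a duration helper is listed above; exact name assumed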

VideoFrame

Represents visual output data from the runtime.

  • Contains image data in BGR format (numpy array)
  • Includes synchronized audio chunks
  • Provides frame metadata such as frame index and message ID
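A short sketch of consuming a frame; the attribute names (image, audio_chunk, frame_index, message_id) are assumptions based on the fields listed above:

import cv2  # pip install opencv-python

def show_frame(frame) -> None:
    # The BGR image can be displayed directly with OpenCV
    cv2.imshow("avatar", frame.image)  # .image is an assumed attribute name
    cv2.waitKey(1)

    # Metadata and synchronized audio (attribute names likewise assumed)
    print(f"frame {frame.frame_index}, message {frame.message_id}")
    audio = frame.audio_chunk  # hand this to your audio output, e.g. sounddevice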

Input/Output Flow

Input

  • Send audio data (16kHz, mono, int16 format) into the runtime
  • The runtime processes this input to generate the corresponding avatar animation

Processing

  • The runtime analyzes the audio to produce natural facial movements and expressions
  • Frames are generated at 25 FPS, synchronized with audio playback

Output

  • Each frame pairs visual data (a BGR image) with its corresponding audio chunk
  • The example demonstrates real-time frame rendering and synchronized audio playback
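If your source audio is not already 16kHz, mono, int16, it must be converted first. Below is a minimal numpy-only sketch using linear interpolation; a dedicated resampler (scipy, librosa) will give better quality:

import numpy as np

def to_runtime_format(samples: np.ndarray, source_rate: int) -> np.ndarray:
    """Convert float or multi-channel audio to 16kHz, mono, int16."""
    # Downmix to mono by averaging channels
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    samples = samples.astype(np.float64)

    # Linear resampling to 16kHz (use scipy/librosa for production quality)
    target_rate = 16000
    target_len = int(len(samples) / source_rate * target_rate)
    resampled = np.interp(
        np.linspace(0.0, len(samples) - 1, target_len),
        np.arange(len(samples)),
        samples,
    )

    # Scale into int16 range if the input was float in [-1, 1]
    if np.abs(resampled).max() <= 1.0:
        resampled = resampled * 32767
    return resampled.astype(np.int16)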

Runtime WebSocket Example

This example module demonstrates how to stream bitHuman Runtime output through WebSocket. It shows how to implement a web server that handles real-time frame streaming, audio processing, and video recording.
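As an illustration of that pattern (not the actual example module), the sketch below JPEG-encodes each BGR frame and pushes it over a WebSocket using the third-party websockets package; runtime.run() and frame.image are the same assumed names used in the sketches above:

import asyncio
import cv2  # pip install opencv-python
import websockets  # pip install websockets

async def stream_frames(websocket, runtime):
    # JPEG-encode each BGR frame and push it to the connected client
    async for frame in runtime.run():  # runtime.run() is assumed, as above
        ok, jpeg = cv2.imencode(".jpg", frame.image)  # frame.image: assumed BGR array
        if ok:
            await websocket.send(jpeg.tobytes())

async def serve(runtime, host="0.0.0.0", port=8765):
    async with websockets.serve(lambda ws: stream_frames(ws, runtime), host, port):
        await asyncio.Future()  # run until cancelled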