Quick Start Guide

bitHuman SDK enables you to build interactive agents that respond realistically to audio input. This guide covers installation instructions, a hands-on example, and an overview of the core API features.

Installation

System Requirements

Install bithuman

pip install bithuman

Agent Model Setup

To run the example, you’ll need to obtain the necessary credentials and models from the bitHuman platform. Follow these steps:

Step 1: Access Developer Settings

  1. Login to bitHuman Platform
  2. Click the “Developer settings” button in the top-right corner of the page

Step 2: Get API Secret

  1. In the Developer Settings page, navigate to the “API Secrets” section in the left sidebar
  2. Click the “Reveal” button to view your API secret

Step 3: Download Model

We provide several pre-trained models that you can use directly; download a model (.imx) file from the platform.

Running the Example

Download the example script and run it using the following command:

export BITHUMAN_API_SECRET='your_secret'

python example.py \
    --audio-file '/path/to/audio/file.mp3' \
    --model '/path/to/model/agent.imx'

Example Controls

Use these keyboard shortcuts during the example:

  • 1: Play the audio file through the agent
  • 2: Interrupt current playback
  • q: Exit the application
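
The example script's key handling could be wired as a small dispatch table. The sketch below is illustrative (handler names and state are hypothetical, not part of the bitHuman SDK):

```python
# Hypothetical key-dispatch sketch mirroring the controls above.
# play_audio / interrupt_playback / quit_app are illustrative stand-ins
# for whatever the real example script does on each key press.

def make_dispatcher():
    state = {"playing": False, "running": True}

    def play_audio():
        state["playing"] = True     # "1": start streaming audio to the agent

    def interrupt_playback():
        state["playing"] = False    # "2": stop the current playback

    def quit_app():
        state["running"] = False    # "q": leave the main loop

    handlers = {"1": play_audio, "2": interrupt_playback, "q": quit_app}

    def dispatch(key):
        handler = handlers.get(key)
        if handler is not None:
            handler()
        return state

    return dispatch
```

A main loop would then read one key at a time and call dispatch(key) until state["running"] becomes false.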

How It Works

This example covers the following steps:

  1. Initializing the bitHuman SDK with your API secret
  2. Configuring audio and video playback
  3. Processing audio input and rendering the agent’s reactions
  4. Managing user interactions

The agent animates in sync with the audio input, providing a realistic interactive experience.
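
The four steps above can be sketched as an asyncio skeleton. All function bodies here are stubs standing in for real SDK calls; only the ordering mirrors the example (see the API overview for the actual AsyncBithuman entry points):

```python
import asyncio

# Illustrative skeleton of the example's flow. The stub bodies stand in
# for real SDK calls; only the ordering of the four steps is meaningful.

async def initialize_runtime(log):
    log.append("init")        # 1. create the runtime with your API secret

async def configure_playback(log):
    log.append("configure")   # 2. set up audio output and a video window

async def process_audio(log):
    log.append("process")     # 3. feed audio chunks, render returned frames

async def handle_interactions(log):
    log.append("interact")    # 4. keyboard controls: play, interrupt, quit

async def main():
    log = []
    await initialize_runtime(log)
    await configure_playback(log)
    await process_audio(log)
    await handle_interactions(log)
    return log

log = asyncio.run(main())
```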

API Overview

bitHuman SDK offers a straightforward yet powerful API for creating interactive agents.

Core API Components

AsyncBithuman

The main class for interacting with bitHuman services.

runtime = await AsyncBithuman.create(
    model_path="path/to/model.imx",
    api_secret="your_api_secret",
)
  • Authenticate using an API secret
  • Load and manage models
  • Process audio input to animate the agent
  • Interrupt ongoing speech with runtime.interrupt()

AudioChunk

Represents audio data for processing.

  • Supports 16kHz, mono, int16 audio format
  • Compatible with bytes and numpy arrays
  • Provides utilities for duration and format conversions
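
Audio in this format can be prepared without the SDK. The sketch below synthesizes a short tone and packs it as 16kHz mono int16 PCM (the float_to_int16_bytes helper is illustrative, not an SDK utility; real input would come from a decoded audio file):

```python
import math
import struct

SAMPLE_RATE = 16_000  # the 16kHz mono format AudioChunk expects

def float_to_int16_bytes(samples):
    """Clamp float samples in [-1.0, 1.0] and pack as little-endian int16."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

# 100 ms of a 440 Hz sine tone as stand-in input
n = SAMPLE_RATE // 10
tone = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
pcm = float_to_int16_bytes(tone)  # 1600 samples, 2 bytes each
```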

VideoFrame

Represents visual output data from the SDK.

  • Contains image data in BGR format (numpy array)
  • Includes synchronized audio chunks
  • Provides frame metadata such as frame index and message ID

Input/Output Flow

Input

  • Send audio data (16kHz, mono, int16 format) into the SDK
  • The SDK processes this input to generate the corresponding agent animations

Processing

  • The SDK analyzes the audio to produce natural character movements and expressions
  • Frames are generated at 25 FPS, synchronized with audio playback
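
Given the stated rates, the amount of audio carried by each frame follows from simple arithmetic (no SDK call involved):

```python
# Audio/video sync arithmetic for 16kHz mono int16 audio at 25 FPS.
SAMPLE_RATE = 16_000  # Hz
FPS = 25              # frames per second

samples_per_frame = SAMPLE_RATE // FPS   # audio samples per video frame
bytes_per_frame = samples_per_frame * 2  # int16 -> 2 bytes per sample
frame_duration_ms = 1000 / FPS           # milliseconds of audio per frame
```

So each VideoFrame's audio chunk spans 640 samples (1280 bytes), i.e. 40 ms of audio.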

Output

  • Each frame includes visual data (a BGR image) and the corresponding audio chunk
  • Frames can be consumed for real-time rendering and audio playback