Text to speech
POST /v1/audio/speech
Generate speech from text
Synthesizes speech with Kokoro-82M; a sentence takes about a second. The request and response match OpenAI's /v1/audio/speech, so the official OpenAI SDKs work unchanged once you point them at this base URL with your API key.
```shell
curl https://runanything.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "bm_george",
    "response_format": "wav",
    "speed": 1.1
  }' \
  --output speech.wav
```

Request body
| Parameter | Type | Description |
|---|---|---|
| input (required) | string | The text to speak, 1–4,096 characters. |
| voice (required) | string | A Kokoro voice id (e.g. af_heart, bm_george; see the full list) or an OpenAI voice name (alloy, nova, onyx, …). |
| model | string | kokoro-82m (default). tts-1, tts-1-hd, and gpt-4o-mini-tts are accepted as aliases for compatibility. |
| response_format | string | mp3 (default), wav, aac, or pcm. opus and flac aren't supported yet and return an HTTP 400. |
| speed | number | Playback speed, 0.25–4.0. Default 1.0. |
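The limits in this table can be checked before a request leaves your machine, turning a wasted round trip into an immediate error. Below is a minimal client-side sketch in Python. validate_speech_request is a hypothetical helper, not part of any SDK; the bounds are assumed to be inclusive, and len() counts Unicode code points, which may not match how the server counts characters. The server's own validation stays authoritative, and voice ids are left for the server to check because the full list isn't reproduced here.

```python
from typing import Any

# Constraints transcribed from the request-body table (assumed inclusive bounds)
ALLOWED_MODELS = {"kokoro-82m", "tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
ALLOWED_FORMATS = {"mp3", "wav", "aac", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_speech_request(body: dict[str, Any]) -> dict[str, Any]:
    """Check a request body against the documented limits.

    Returns a copy with model, response_format, and speed defaults filled in.
    Raises ValueError on the first violation found.
    """
    text = body.get("input")
    if not isinstance(text, str) or not 1 <= len(text) <= MAX_INPUT_CHARS:
        raise ValueError("input must be a string of 1-4096 characters")

    voice = body.get("voice")
    if not isinstance(voice, str) or not voice:
        # Only presence is checked; the server validates the actual voice id
        raise ValueError("voice is required")

    model = body.get("model", "kokoro-82m")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")

    fmt = body.get("response_format", "mp3")
    if fmt not in ALLOWED_FORMATS:
        # opus and flac land here too; the API answers them with a 400
        raise ValueError(f"unsupported response_format: {fmt!r}")

    speed = body.get("speed", 1.0)
    is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
    if not is_number or not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError("speed must be a number between 0.25 and 4.0")

    return {**body, "model": model, "response_format": fmt, "speed": speed}
```

Call it on the dict you would send as JSON. For example, validate_speech_request({"input": "Hello", "voice": "af_heart"}) returns the body with the kokoro-82m, mp3, and 1.0 defaults made explicit.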
Output formats
- mp3 — 96 kbps mono. The default, and what OpenAI SDK code expects when it doesn't pass a format.
- wav — 16-bit PCM, 24 kHz mono, standard RIFF file. Largest output, zero decode cost.
- aac — ADTS AAC. Smallest output at comparable quality.
- pcm — raw signed 16-bit little-endian samples, 24 kHz mono, no container. The only streaming format — see below.
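Both wav and pcm carry 16-bit, 24 kHz mono samples; wav adds a RIFF header on top. At 24,000 samples/s × 2 bytes × 1 channel, one second of audio is 48,000 bytes, so a 4,096-byte chunk holds about 85 ms. If you stream pcm but also want a playable file afterwards, you can add the header yourself. Below is a minimal sketch using Python's standard wave module, assuming the 24 kHz s16le mono layout listed above. pcm_to_wav and pcm_duration_seconds are hypothetical helper names.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, per the format list
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono

BYTES_PER_SECOND = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # 48,000


def pcm_duration_seconds(pcm: bytes) -> float:
    """Length of a raw pcm response in seconds."""
    return len(pcm) / BYTES_PER_SECOND


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw s16le mono 24 kHz samples in a RIFF/WAV container."""
    if len(pcm) % SAMPLE_WIDTH:
        raise ValueError("pcm data must contain whole 16-bit samples")
    buf = io.BytesIO()
    # wave does not close a file object it was handed, so buf stays readable
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        # wave treats frames as native-endian; on the usual little-endian
        # hosts the s16le bytes pass through unchanged
        w.writeframes(pcm)
    return buf.getvalue()
```

Buffering a pcm stream and passing it through pcm_to_wav should give a file equivalent in layout to requesting wav directly, though the header the server writes may differ in detail.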
Streaming with pcm
With response_format: "pcm", the response body streams raw audio as it is synthesized. The first bytes arrive before the full clip exists, which is what you want for assistants and other conversational uses. The response includes an X-Sample-Rate: 24000 header; samples are signed 16-bit little-endian (s16le), mono.
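Raw pcm has no framing, so a transport chunk can end halfway through a 2-byte sample. This can happen with fetch's reader.read(), whose Uint8Array chunks have arbitrary lengths. If you reinterpret the bytes as samples (for example to resample or meter levels), carry any odd trailing byte forward into the next chunk. Below is a minimal sketch using Python's standard array module. iter_samples is a hypothetical helper, not part of the OpenAI SDK.

```python
import array
import sys
from typing import Iterable, Iterator


def iter_samples(chunks: Iterable[bytes]) -> Iterator[array.array]:
    """Yield signed 16-bit sample arrays from raw s16le chunks of any size.

    A chunk boundary may fall inside a 2-byte sample, so a trailing odd
    byte is carried over and prepended to the next chunk.
    """
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - len(data) % 2  # whole samples only
        leftover = data[usable:]
        if usable:
            samples = array.array("h", data[:usable])  # "h" = signed 16-bit
            if sys.byteorder == "big":
                samples.byteswap()  # stream is little-endian; convert to native
            yield samples
    if leftover:
        raise ValueError("stream ended in the middle of a sample")
```

Because it accepts any iterable of bytes, the same helper could wrap a streaming response's iter_bytes() regardless of the chunk size you pick.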
```python
import pyaudio
from openai import OpenAI

client = OpenAI(base_url="https://runanything.ai/v1", api_key="YOUR_API_KEY")

# Output stream matching the pcm format: signed 16-bit, mono, 24 kHz
pa = pyaudio.PyAudio()
player = pa.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)

try:
    with client.audio.speech.with_streaming_response.create(
        model="kokoro-82m",
        voice="af_heart",
        input="Audio starts playing before this sentence finishes generating.",
        response_format="pcm",
    ) as response:
        # Fixed 4096-byte chunks (an even size) keep writes on whole samples
        for chunk in response.iter_bytes(chunk_size=4096):
            player.write(chunk)
finally:
    # Release the audio device even if the request fails midway
    player.stop_stream()
    player.close()
    pa.terminate()
```

```javascript
const res = await fetch("https://runanything.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "kokoro-82m",
    voice: "af_heart",
    input: "Audio starts playing before this sentence finishes generating.",
    response_format: "pcm",
  }),
});
if (!res.ok) {
  throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
}

// res.body is a ReadableStream of raw s16le mono PCM at 24 kHz
const reader = res.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // value is a Uint8Array of arbitrary length; a chunk boundary can split a sample.
  // feedToAudioPipeline is a placeholder for your own sink
  // (e.g. an AudioWorklet or speaker stream).
  feedToAudioPipeline(value);
}
```

The other formats (mp3, wav, aac) are delivered as complete files, which is fine for clips you save or play whole. If you need low time-to-first-audio, use pcm.