# runanything.ai

OpenAI-compatible speech APIs: text-to-speech (Kokoro-82M, 28 voices) and
speech-to-text (distil-whisper large-v3). Drop-in for OpenAI's audio
endpoints — change the base URL and key, keep the SDK. Private beta: request
an API key by emailing help@runanything.ai (include your name, project, and
expected requests/day).

Base URL: https://runanything.ai/v1
Auth: `Authorization: Bearer sk-ra-...` on every request.
Errors: OpenAI wire format `{"error": {"message", "type", "param", "code"}}`.
401 => invalid_api_key, 429 => rate_limit_exceeded (honor Retry-After),
413 => file_too_large, 502/504 => upstream failure/timeout (retryable).
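
The retry guidance above can be sketched as a small decision helper. This is a minimal sketch, not the service's official client logic: the function name `retry_delay`, the exponential backoff parameters, and the choice to treat only 429, 502, and 504 as retryable are assumptions made for illustration.

```python
# Status codes the section above marks as retryable (429 via Retry-After,
# 502/504 as upstream failure/timeout). Everything else is treated as final.
RETRYABLE_STATUSES = {429, 502, 504}


def retry_delay(status, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is 0 for the first retry. `base` and `cap` are illustrative
    backoff settings, not values published by the API.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a number of seconds; fall back to backoff below.
    return min(cap, base * (2 ** attempt))


def error_fields(body):
    """Pull the four documented fields out of a parsed error body."""
    err = body.get("error", {})
    return {k: err.get(k) for k in ("message", "type", "param", "code")}
```

A caller would sleep for the returned number of seconds and give up when the result is `None` or after a fixed number of attempts.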

## POST /v1/audio/speech — text to speech

JSON body:
- input (string, required): text to speak, 1–4096 chars
- voice (string, required): a Kokoro id (af_heart, af_bella, af_nicole,
  af_aoede, af_kore, af_sarah, af_nova, af_sky, af_alloy, af_jessica,
  af_river, am_michael, am_fenrir, am_puck, am_echo, am_eric, am_liam,
  am_onyx, am_santa, am_adam, bf_emma, bf_isabella, bf_alice, bf_lily,
  bm_george, bm_fable, bm_lewis, bm_daniel) or an OpenAI alias
  (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse)
- model (string, optional): kokoro-82m (default); aliases tts-1, tts-1-hd,
  gpt-4o-mini-tts
- response_format (string, optional): mp3 (default) | wav | aac | pcm.
  opus and flac are unsupported (400). pcm streams raw s16le mono 24 kHz audio
  progressively (header X-Sample-Rate: 24000); use it when time-to-first-audio
  matters.
- speed (number, optional): 0.25–4.0, default 1.0

Response: audio bytes (audio/mpeg, audio/wav, audio/aac, or audio/pcm).
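
Because the pcm format is headerless, most players need the samples wrapped in a container before they can open them. A minimal sketch using Python's standard `wave` module, assuming the body is exactly the s16le mono stream described above; the helper names are hypothetical, and the sample rate would normally be read from the X-Sample-Rate response header.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000):
    """Wrap raw signed 16-bit little-endian mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def pcm_duration_seconds(num_bytes, sample_rate=24000):
    """Duration of a mono s16le buffer: 2 bytes per sample."""
    return num_bytes / (2 * sample_rate)
```

For example, `pcm_to_wav(body)` yields bytes that can be written straight to a `.wav` file.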

Example:
curl https://runanything.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"kokoro-82m","input":"Hello world.","voice":"af_heart"}' \
  --output speech.mp3
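
The documented limits can be checked client-side before sending a request. The sketch below builds the same JSON body as the curl example; `build_speech_request` is a hypothetical helper, and it assumes the 4096-character limit counts Unicode code points the way Python's `len` does, which the server may or may not match.

```python
import json

SUPPORTED_FORMATS = {"mp3", "wav", "aac", "pcm"}  # opus/flac return 400


def build_speech_request(text, voice, response_format="mp3", speed=1.0,
                         model="kokoro-82m"):
    """Validate parameters against the documented limits and return the body."""
    if not 1 <= len(text) <= 4096:
        raise ValueError("input must be 1 to 4096 characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


# JSON string suitable for the POST body.
body = json.dumps(build_speech_request("Hello world.", "af_heart"))
```

Voice ids are not validated here; the server remains the authority on which ids and aliases it accepts.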

## POST /v1/audio/transcriptions — speech to text

multipart/form-data body:
- file (required): webm, mp4, ogg, wav, or mp3; max 4 MB (beta limit)
- model (optional): distil-whisper-large-v3 (default); aliases whisper-1,
  gpt-4o-transcribe, gpt-4o-mini-transcribe
- language (optional): ISO-639-1 hint; defaults to English (the model is
  English-optimized)
- response_format (optional): json (default, {"text": "..."}) | text
  (plain text) | verbose_json ({task, language, duration, text, segments}).
  srt/vtt unsupported (400).
- prompt, temperature: accepted but ignored

Silence returns 200 with empty text (not an error).
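
Since the response shape depends on response_format and silence is a successful empty result, a caller may want one helper that handles both. A minimal sketch, assuming the raw response body is available as a string; `transcript_text` and `is_silence` are hypothetical names.

```python
import json


def transcript_text(raw_body, response_format="json"):
    """Extract the transcript from a response body.

    json and verbose_json both carry the transcript under "text";
    the text format is the transcript itself.
    """
    if response_format == "text":
        return raw_body.strip()
    return json.loads(raw_body).get("text", "")


def is_silence(text):
    """Silence comes back as a 200 with empty text, so test for emptiness."""
    return text.strip() == ""
```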

Example:
curl https://runanything.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.wav \
  -F model=distil-whisper-large-v3
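
A pre-flight check can avoid an upload that would be rejected with 413. The sketch below is illustrative only: `upload_problems` is a hypothetical helper, it judges the container by file extension rather than by content, and it assumes "4 MB" means 4,000,000 bytes (the more conservative reading).

```python
import os

ALLOWED_EXTENSIONS = {".webm", ".mp4", ".ogg", ".wav", ".mp3"}
MAX_UPLOAD_BYTES = 4 * 1000 * 1000  # assumption: decimal megabytes


def upload_problems(filename, size_bytes):
    """Return a list of reasons the file would likely be rejected."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

In practice the size would come from `os.path.getsize(path)`; an empty list means the file passes both checks.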

## GET /v1/models

Lists available model ids (canonical ids plus accepted aliases) in the OpenAI
list shape.
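
A short sketch of reading that response and mapping aliases back to canonical ids. The alias table is taken from the endpoint descriptions above; the sample payload is illustrative, assumes the usual OpenAI list layout (`{"object": "list", "data": [{"id": ...}]}`), and is not captured from the live API.

```python
# Alias -> canonical id, as listed in the speech and transcription sections.
ALIASES = {
    "tts-1": "kokoro-82m",
    "tts-1-hd": "kokoro-82m",
    "gpt-4o-mini-tts": "kokoro-82m",
    "whisper-1": "distil-whisper-large-v3",
    "gpt-4o-transcribe": "distil-whisper-large-v3",
    "gpt-4o-mini-transcribe": "distil-whisper-large-v3",
}


def model_ids(payload):
    """Collect the ids from an OpenAI-style model list."""
    return sorted(item["id"] for item in payload.get("data", []))


def canonical_model(name):
    """Resolve an alias to its canonical id; canonical ids pass through."""
    return ALIASES.get(name, name)


# Illustrative payload only; real responses may carry more fields.
sample = {"object": "list", "data": [{"id": "kokoro-82m", "object": "model"},
                                     {"id": "tts-1", "object": "model"}]}
```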

## OpenAI SDK usage

Python:  OpenAI(base_url="https://runanything.ai/v1", api_key="sk-ra-...")
JS:      new OpenAI({ baseURL: "https://runanything.ai/v1", apiKey: "sk-ra-..." })
Then client.audio.speech.create(...) / client.audio.transcriptions.create(...)
work as with OpenAI.
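
Spelled out with the official `openai` Python package, the calls might look like the sketch below. It assumes a 1.x release of the SDK is installed; the helper names and the default voice are illustrative choices, and the import is deferred so the module loads even where the package is absent.

```python
def make_client(api_key):
    """Create an OpenAI SDK client pointed at the runanything.ai base URL."""
    from openai import OpenAI  # pip install openai
    return OpenAI(base_url="https://runanything.ai/v1", api_key=api_key)


def speak_to_file(client, text, path, voice="af_heart"):
    """Synthesize `text` and stream the audio bytes to `path`."""
    with client.audio.speech.with_streaming_response.create(
        model="kokoro-82m", voice=voice, input=text
    ) as response:
        response.stream_to_file(path)


def transcribe_file(client, path):
    """Upload an audio file and return the transcript text."""
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="distil-whisper-large-v3", file=audio
        )
    return result.text
```

Usage would be `client = make_client("sk-ra-...")`, then `speak_to_file(client, "Hello world.", "speech.mp3")` or `transcribe_file(client, "recording.wav")`.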

## Docs

- https://runanything.ai/docs (quickstart)
- https://runanything.ai/docs/text-to-speech
- https://runanything.ai/docs/speech-to-text
- https://runanything.ai/docs/voices
- https://runanything.ai/docs/errors-and-limits

Contact: help@runanything.ai