# runanything.ai

OpenAI-compatible speech APIs: text-to-speech (Kokoro-82M, 28 voices) and speech-to-text (distil-whisper large-v3). Drop-in for OpenAI's audio endpoints: change the base URL and key, keep the SDK.

Private beta: request an API key by emailing help@runanything.ai (include your name, project, and expected requests/day).

Base URL: https://runanything.ai/v1

Auth: `Authorization: Bearer sk-ra-...` on every request.

Errors: OpenAI wire format `{"error": {"message", "type", "param", "code"}}`.

- 401 => invalid_api_key
- 429 => rate_limit_exceeded (honor Retry-After)
- 413 => file_too_large
- 502/504 => upstream failure/timeout (retryable)

## POST /v1/audio/speech: text to speech

JSON body:

- input (string, required): text to speak, 1–4096 chars
- voice (string, required): a Kokoro id (af_heart, af_bella, af_nicole, af_aoede, af_kore, af_sarah, af_nova, af_sky, af_alloy, af_jessica, af_river, am_michael, am_fenrir, am_puck, am_echo, am_eric, am_liam, am_onyx, am_santa, am_adam, bf_emma, bf_isabella, bf_alice, bf_lily, bm_george, bm_fable, bm_lewis, bm_daniel) or an OpenAI alias (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse)
- model (string, optional): kokoro-82m (default); aliases tts-1, tts-1-hd, gpt-4o-mini-tts
- response_format (string, optional): mp3 (default) | wav | aac | pcm. opus/flac unsupported (400). pcm streams raw s16le mono 24 kHz audio progressively (header X-Sample-Rate: 24000); it is the format to use for time-to-first-audio.
- speed (number, optional): 0.25–4.0, default 1.0

Response: audio bytes (audio/mpeg, audio/wav, audio/aac, or audio/pcm).
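The parameter limits and retry rules above can be checked on the client before and after a request. What follows is a minimal sketch using only the Python standard library, not an official client: the helper names (`build_speech_payload`, `retry_delay`, `synthesize`), the exponential backoff with its 30-second cap, and the 60-second timeout are assumptions made for illustration. Only the endpoint, the field limits, and the status codes come from this document.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "https://runanything.ai/v1"          # from this document
SUPPORTED_FORMATS = {"mp3", "wav", "aac", "pcm"}  # opus/flac return 400
RETRYABLE_STATUSES = {429, 502, 504}            # documented as retryable


def build_speech_payload(text, voice, response_format="mp3", speed=1.0,
                         model="kokoro-82m"):
    """Validate against the documented limits and return the JSON body as a dict."""
    if not 1 <= len(text) <= 4096:
        raise ValueError("input must be 1-4096 characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format, "speed": speed}


def retry_delay(status, headers, attempt):
    """Return seconds to wait before retrying, or None if the status is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Assumption: exponential backoff capped at 30 s (not specified by the API).
    return float(min(2 ** attempt, 30))


def synthesize(api_key, text, voice, max_attempts=3, **options):
    """POST to /audio/speech and return the audio bytes, retrying retryable errors."""
    body = json.dumps(build_speech_payload(text, voice, **options)).encode("utf-8")
    for attempt in range(max_attempts):
        request = urllib.request.Request(
            f"{BASE_URL}/audio/speech", data=body, method="POST",
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            delay = retry_delay(err.code, err.headers, attempt)
            if delay is None or attempt == max_attempts - 1:
                raise  # non-retryable (e.g. 401, 413) or out of attempts
            time.sleep(delay)
```

With a valid key, `synthesize("sk-ra-...", "Hello world.", "af_heart")` would return MP3 bytes that can be written to a file. Validating locally avoids spending a rate-limited request on input the server would reject anyway.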
Example:

    curl https://runanything.ai/v1/audio/speech \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"kokoro-82m","input":"Hello world.","voice":"af_heart"}' \
      --output speech.mp3

## POST /v1/audio/transcriptions: speech to text

multipart/form-data body:

- file (required): webm, mp4, ogg, wav, or mp3; max 4 MB (beta limit)
- model (optional): distil-whisper-large-v3 (default); aliases whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe
- language (optional): ISO-639-1 hint; defaults to English (the model is English-optimized)
- response_format (optional): json (default, {"text": "..."}) | text (plain text) | verbose_json ({task, language, duration, text, segments}). srt/vtt unsupported (400).
- prompt, temperature: accepted, ignored

Silence returns 200 with empty text (not an error).

Example:

    curl https://runanything.ai/v1/audio/transcriptions \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F file=@recording.wav \
      -F model=distil-whisper-large-v3

## GET /v1/models

Lists available model ids (canonical + accepted aliases), OpenAI list shape.

## OpenAI SDK usage

- Python: `OpenAI(base_url="https://runanything.ai/v1", api_key="sk-ra-...")`
- JS: `new OpenAI({ baseURL: "https://runanything.ai/v1", apiKey: "sk-ra-..." })`

Then `client.audio.speech.create(...)` / `client.audio.transcriptions.create(...)` work as with OpenAI.

## Docs

- https://runanything.ai/docs (quickstart)
- https://runanything.ai/docs/text-to-speech
- https://runanything.ai/docs/speech-to-text
- https://runanything.ai/docs/voices
- https://runanything.ai/docs/errors-and-limits

Contact: help@runanything.ai
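As an illustration of the SDK usage described above, the following sketch streams `pcm` speech into a WAV file and sends that file back for transcription. It assumes the `openai` Python package (v1 or later) is installed. The function names (`make_client`, `write_wav`, `speak_to_wav`, `transcribe`) are hypothetical and not part of any official client. The sample format (s16le, mono, 24 kHz) is taken from the `response_format` description, and wrapping the stream in a WAV header is one possible way to make raw PCM playable and uploadable.

```python
import wave

BASE_URL = "https://runanything.ai/v1"  # from this document
PCM_SAMPLE_RATE = 24000                 # documented: raw s16le mono at 24 kHz


def make_client(api_key):
    """Create an OpenAI SDK client pointed at the runanything.ai base URL."""
    # Imported lazily so write_wav() below works without the SDK installed.
    from openai import OpenAI
    return OpenAI(base_url=BASE_URL, api_key=api_key)


def write_wav(target, pcm_chunks):
    """Wrap raw s16le mono PCM chunks in a WAV container.

    `target` is a file path or a writable, seekable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)                # mono
        wav.setsampwidth(2)                # 16-bit samples
        wav.setframerate(PCM_SAMPLE_RATE)
        for chunk in pcm_chunks:
            wav.writeframesraw(chunk)      # header sizes are patched on close


def speak_to_wav(client, text, target, voice="af_heart"):
    """Stream pcm audio from /audio/speech and save it as a WAV file."""
    with client.audio.speech.with_streaming_response.create(
        model="kokoro-82m", voice=voice, input=text, response_format="pcm",
    ) as response:
        write_wav(target, response.iter_bytes())


def transcribe(client, path):
    """Upload an audio file to /audio/transcriptions and return the text."""
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="distil-whisper-large-v3", file=audio)
    return result.text
```

A round trip would look like `client = make_client("sk-ra-...")`, then `speak_to_wav(client, "Hello world.", "hello.wav")`, then `transcribe(client, "hello.wav")`. Audio in this format takes 48,000 bytes per second, so a WAV produced this way would reach the 4 MB upload limit after roughly 85 seconds of speech.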