Fish Audio
Freemium · Hanabi AI · Audio & Voice
Expressive TTS and 15-second voice cloning with emotion control
Overview
Fish Audio (by Hanabi AI) is an AI voice platform built around expressive, real-time text-to-speech models. Emotion tags such as [angry], [sad] and [whispering] control how a line is delivered, voice cloning needs just 15 seconds of audio, and speech-to-text includes multispeaker detection. A community library offers 2,000,000+ voices across 30+ languages, including Arabic. The S1 and S2 research models are open-sourced, and a low-latency streaming API serves developers.
Plans: Free (8,000 credits ≈ 7 minutes/month, personal use only), Plus at $15/month (250K credits ≈ 200 minutes, commercial use allowed), Pro at $100/month (2M credits, 3 team seats) and Max at $999/month. Going by the Free and Plus allowances, each generated minute costs roughly 1,150–1,250 credits.
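The per-minute credit rate implied by the quoted allowances can be checked with a little arithmetic. The sketch below uses only the Free and Plus figures from the overview. The Pro estimate assumes the Plus rate carries over to larger plans, which this listing does not state, and the function names are illustrative.

```python
# Plan allowances as quoted in the overview: (monthly credits, approximate minutes).
PLANS = {
    "Free": (8_000, 7),
    "Plus": (250_000, 200),
}

def credits_per_minute(credits: int, minutes: float) -> float:
    """Implied credits consumed per generated minute."""
    return credits / minutes

def estimated_minutes(credits: int, rate: float) -> float:
    """Minutes of audio a credit balance buys at a given per-minute rate."""
    return credits / rate

rates = {name: credits_per_minute(c, m) for name, (c, m) in PLANS.items()}
for name, rate in rates.items():
    print(f"{name}: ~{rate:.0f} credits/minute")

# Assumption: the Plus rate also applies to the Pro plan's 2M credits.
pro_minutes = estimated_minutes(2_000_000, rates["Plus"])
print(f"Pro: ~{pro_minutes:.0f} minutes (estimate)")
```

Because the quoted minutes are rounded, treat the resulting rates as approximate.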
Features & specs
- Free plan: Yes, 8,000 credits (≈7 min) per month
- API: Yes (pay-as-you-go)
- Context size: Up to 15,000 characters per generation (Plus)
- Languages: 30+ languages, including Arabic
- Mobile app: Web platform
- Plugins: REST & streaming API
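A script longer than the 15,000-character per-generation limit has to be split before it is submitted. The sketch below is one possible client-side approach, not Fish Audio's own tooling: it packs whole sentences into chunks under the limit and offers a helper that prefixes text with a bracketed emotion tag. The helper names, the tag placement at the start of the text, and the sentence-splitting rule are all assumptions made for illustration. No API call is shown because the endpoint details are not given here.

```python
import re

MAX_CHARS = 15_000  # per-generation limit quoted for the Plus plan

def tag(text: str, emotion: str) -> str:
    """Prefix text with a bracketed emotion tag (placement is an assumption)."""
    return f"[{emotion}] {text}"

def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the limit")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Usage with a small limit so the splitting is visible.
parts = split_script("One. Two! Three?", limit=9)
print(parts)
print(tag("I can't believe it.", "angry"))
```

If tags count toward the character limit, which seems likely but is not confirmed here, apply `tag` before splitting or leave headroom in `limit`.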
Pros
- Emotion tags make narration genuinely expressive
- Voice cloning from just 15 seconds of audio
- 2,000,000+ community voice library
- Open-source S1/S2 models
- 30+ languages, including Arabic
- Low-latency streaming API for developers
Cons
- Free tier is personal-use only (≈7 minutes/month)
- Unused credits don't roll over
- Younger ecosystem than ElevenLabs
Pricing plans
Free: 8,000 credits ≈ 7 min/month, personal use only