Fish Audio

Freemium

Hanabi AI · Audio & Voice

Expressive TTS and 15-second voice cloning with emotion control

4.5 · $15/mo
Supports Arabic · API

Overview

Fish Audio (by Hanabi AI) is an AI voice platform built around expressive, real-time text-to-speech models. Emotion tags such as [angry], [sad] and [whispering] control how narration is delivered, voice cloning needs just 15 seconds of audio, and speech-to-text includes multi-speaker detection. A community library offers 2,000,000+ voices across 30+ languages, including Arabic. The S1 and S2 research models are open-sourced, and a low-latency streaming API serves developers. Plans: Free (8,000 credits ≈ 7 minutes/month, personal use only), Plus at $15/month (250K credits ≈ 200 minutes, commercial use allowed), Pro at $100/month (2M credits, 3 team seats) and Max at $999/month. Each generated minute costs roughly 600–625 credits.
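To make the credit arithmetic concrete, here is a minimal sketch that derives credits per minute and dollars per minute from the plan quotas quoted above. The figures are copied from this listing, not from an official rate card, and the helper names are hypothetical. The quotas work out to roughly 1,143–1,250 credits per minute, about double the 600–625 figure quoted in the same paragraph, so treat both as approximate and check the current pricing page before budgeting.

```python
# Plan figures as quoted in this listing (assumed, not an official rate card).
PLANS = {
    "Free": {"credits": 8_000, "minutes": 7, "usd_per_month": 0},
    "Plus": {"credits": 250_000, "minutes": 200, "usd_per_month": 15},
}


def credits_per_minute(plan: str) -> float:
    """Credits consumed per generated minute, implied by the plan quota."""
    p = PLANS[plan]
    return p["credits"] / p["minutes"]


def usd_per_minute(plan: str) -> float:
    """Effective price per generated minute if the whole quota is used."""
    p = PLANS[plan]
    return p["usd_per_month"] / p["minutes"]


def minutes_for(credits: int, rate: float) -> float:
    """Minutes of audio a credit balance buys at a given credits-per-minute rate."""
    return credits / rate


if __name__ == "__main__":
    for name in PLANS:
        print(name, round(credits_per_minute(name)), "credits/min,",
              round(usd_per_minute(name), 3), "USD/min")
```

Calling minutes_for(2_000_000, credits_per_minute("Plus")) gives an estimate for the Pro quota under the assumption that Pro consumes credits at the same rate as Plus, which the listing does not confirm.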

Features & specs

Free plan: Yes, 8,000 credits (≈7 min) per month
API: Yes (pay-as-you-go)
Context size: Up to 15,000 chars/generation (Plus)
Languages: 30+ languages incl. Arabic
Mobile app: Web platform
Plugins: REST & streaming API

Pros

  • Emotion tags make narration genuinely expressive
  • Voice cloning from just 15 seconds of audio
  • 2,000,000+ community voice library
  • Open-source S1/S2 models
  • 30+ languages including Arabic
  • Low-latency streaming API for developers

Cons

  • Free tier is personal-use only (≈7 minutes/month)
  • Unused credits don't roll over
  • Younger ecosystem than ElevenLabs

Pricing plans

Free: free (8,000 credits ≈ 7 min/month, personal use)
Plus: $15/mo
Pro: $100/mo
