Voice APIs

Speech to Text

Popular

Convert audio to text with punctuation and speaker diarization.

POST /api/v1/voice/speech-to-text
{ "audio_url": "https://...audio.mp3",
  "language": "en-US",
  "diarization": true }

// Response
{ "transcript": "Hello, how are you?",
  "speakers": [
    { "speaker": 1, "text": "Hello" },
    { "speaker": 2, "text": "how are you?" }
  ], "confidence": 0.96 }

Punctuation, Speaker Diarization, Multi-Language
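
As a rough illustration of the request and response above, the following Python sketch builds the request body and reads the per-speaker turns out of a response. The host in `BASE_URL` and the `Authorization: Bearer` header are assumptions (the page does not state either), and the helper names `build_stt_request`, `speaker_turns`, and `post_json` are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder host; the page does not give one for the REST endpoints.
BASE_URL = "https://api.example.com"


def build_stt_request(audio_url, language="en-US", diarization=True):
    """Return the JSON body for POST /api/v1/voice/speech-to-text."""
    return {"audio_url": audio_url, "language": language, "diarization": diarization}


def speaker_turns(response):
    """Return (speaker, text) pairs in the order they appear in the response."""
    return [(turn["speaker"], turn["text"]) for turn in response.get("speakers", [])]


def post_json(path, body, api_key):
    """POST a JSON body and decode the JSON reply.

    Assumption: the Bearer header is a guess at the auth scheme.
    """
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# The sample response shown above, used here without any network call.
sample_response = {
    "transcript": "Hello, how are you?",
    "speakers": [
        {"speaker": 1, "text": "Hello"},
        {"speaker": 2, "text": "how are you?"},
    ],
    "confidence": 0.96,
}

for speaker, text in speaker_turns(sample_response):
    print(f"Speaker {speaker}: {text}")
```

With a real host and key, the call would look like `post_json("/api/v1/voice/speech-to-text", build_stt_request(url), key)`.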

Text to Speech

Generate natural-sounding speech from text.

POST /api/v1/voice/text-to-speech
{ "text": "Welcome to our service",
  "voice": "en-US-Neural2-F",
  "format": "mp3",
  "speed": 1.0 }

// Response
{ "audio_url": "https://...output.mp3",
  "duration_seconds": 2.4,
  "characters_used": 22 }

Neural Voices, Multiple Formats, SSML Support
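
A minimal Python sketch of assembling the text-to-speech request body shown above. The validation rules (non-empty text, positive speed) are assumptions rather than documented limits, and `build_tts_request` is a hypothetical helper name.

```python
def build_tts_request(text, voice="en-US-Neural2-F", audio_format="mp3", speed=1.0):
    """Return the JSON body for POST /api/v1/voice/text-to-speech.

    Assumption: the checks below are client-side sanity checks only;
    the service's real limits are not stated on this page.
    """
    if not text:
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {"text": text, "voice": voice, "format": audio_format, "speed": speed}


body = build_tts_request("Welcome to our service")
print(body)
```

Keeping the body construction in one function makes it easy to reject obviously bad input before a request is sent.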

Voice Cloning

Advanced

Clone voices for personalized TTS.

POST /api/v1/voice/clone
{ "sample_audio": "https://...sample.mp3",
  "voice_name": "custom-voice-1" }

// Response
{ "voice_id": "vc_abc123",
  "status": "ready",
  "quality_score": 0.92 }

// Use cloned voice
POST /api/v1/voice/text-to-speech
{ "text": "Hello",
  "voice_id": "vc_abc123" }

Custom Voices, High Fidelity, Quick Training
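
The two-step flow above (clone a voice, then synthesize with the returned `voice_id`) could be sketched in Python as follows. Treating any status other than "ready" as unusable is an assumption, since the page only shows the "ready" value; the helper names are hypothetical.

```python
def build_clone_request(sample_audio, voice_name):
    """Return the JSON body for POST /api/v1/voice/clone."""
    return {"sample_audio": sample_audio, "voice_name": voice_name}


def tts_request_for_clone(clone_response, text):
    """Build a text-to-speech body that uses the cloned voice.

    Assumption: only status == "ready" means the voice can be used.
    """
    status = clone_response.get("status")
    if status != "ready":
        raise RuntimeError(f"voice not ready: {status!r}")
    return {"text": text, "voice_id": clone_response["voice_id"]}


# The sample clone response shown above.
clone_response = {"voice_id": "vc_abc123", "status": "ready", "quality_score": 0.92}
print(tts_request_for_clone(clone_response, "Hello"))
```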

Audio Transcription

Transcribe audio files with timestamps.

POST /api/v1/voice/transcribe
{ "audio_url": "https://...meeting.mp3",
  "timestamps": true,
  "format": "srt" }

// Response
{ "segments": [
  { "start": 0.0, "end": 2.5,
    "text": "Welcome everyone" },
  { "start": 2.8, "end": 5.1,
    "text": "Let's begin the meeting" }
], "duration": 3600 }

Timestamps, SRT/VTT Export, Long Audio
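
The sample response above returns JSON segments even though the request asks for "srt", so it is not clear from this page what the exported file looks like. As one possible client-side approach, this Python sketch converts the `segments` array into SubRip (SRT) text, which uses numbered cues and `HH:MM:SS,mmm` timestamps. The function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments with start, end, and text keys as SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text']}")
    return "\n\n".join(cues) + "\n"


# The segments from the sample response shown above.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome everyone"},
    {"start": 2.8, "end": 5.1, "text": "Let's begin the meeting"},
]
print(segments_to_srt(sample_segments))
```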

Voice Analysis

Analyze a voice recording for emotion, gender, and age range.

POST /api/v1/voice/analyze
{ "audio_url": "https://...voice.mp3" }

// Response
{ "emotion": {
    "primary": "happy",
    "confidence": 0.87 },
  "demographics": {
    "gender": "female",
    "age_range": "25-35" },
  "audio_quality": 0.94 }

Emotion Detection, Demographics, Quality Score
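
A short Python sketch of consuming the analysis response above. The 0.8 confidence threshold is an arbitrary illustrative choice, and the "low-high" shape of `age_range` is inferred from the single sample value; both helper names are hypothetical.

```python
def confident_emotion(response, threshold=0.8):
    """Return the primary emotion, or None if its confidence is below threshold."""
    emotion = response["emotion"]
    if emotion["confidence"] >= threshold:
        return emotion["primary"]
    return None


def parse_age_range(age_range):
    """Split a range such as "25-35" into a (low, high) tuple of ints.

    Assumption: the value always has the form "<low>-<high>".
    """
    low, high = age_range.split("-")
    return int(low), int(high)


# The sample response shown above.
sample_analysis = {
    "emotion": {"primary": "happy", "confidence": 0.87},
    "demographics": {"gender": "female", "age_range": "25-35"},
    "audio_quality": 0.94,
}

print(confident_emotion(sample_analysis))
print(parse_age_range(sample_analysis["demographics"]["age_range"]))
```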

Real-time STT

WebSocket

Streaming speech recognition via WebSocket.

// Connect to WebSocket
ws://api.webfunctions.net/v1/voice/stream
{ "api_key": "your_key",
  "language": "en-US",
  "interim_results": true }

// Stream audio chunks...

// Receive transcripts
{ "type": "interim",
  "text": "Hello" }
{ "type": "final",
  "text": "Hello, world!",
  "confidence": 0.98 }