Speech-to-text, TTS, voice analysis
Voice and audio APIs for speech recognition, text-to-speech, and audio analysis.
Convert audio to text with punctuation and speaker diarization.
POST /api/v1/voice/speech-to-text
{ "audio_url": "https://...audio.mp3",
"language": "en-US",
"diarization": true }
// Response
{ "transcript": "Hello, how are you?",
"speakers": [
{ "speaker": 1, "text": "Hello" },
{ "speaker": 2, "text": "how are you?" }
], "confidence": 0.96 }
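From Python, the request and response above might be handled roughly as follows. This is a minimal sketch, not an official client: the REST host (borrowed from the WebSocket URL in the streaming section), the bearer-token header, the example audio URL, and the helper names `build_stt_request` and `speaker_lines` are all assumptions for illustration.

```python
import json
import urllib.request

# Assumption: the REST host and the auth header are not stated in this
# document; both are placeholders.
BASE_URL = "https://api.webfunctions.net"


def build_stt_request(audio_url, api_key, language="en-US", diarization=True):
    """Build (but do not send) a POST request for /api/v1/voice/speech-to-text."""
    body = {
        "audio_url": audio_url,
        "language": language,
        "diarization": diarization,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/voice/speech-to-text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # hypothetical auth scheme
        },
        method="POST",
    )


def speaker_lines(response):
    """Turn the `speakers` array of a response into 'Speaker N: text' lines."""
    return [
        "Speaker {}: {}".format(entry["speaker"], entry["text"])
        for entry in response.get("speakers", [])
    ]
```

The request object can be passed to `urllib.request.urlopen` once the real host and authentication scheme are known. `speaker_lines` returns an empty list when diarization was not requested and the `speakers` key is absent.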
Generate natural-sounding speech from text.
POST /api/v1/voice/text-to-speech
{ "text": "Welcome to our service",
"voice": "en-US-Neural2-F",
"format": "mp3",
"speed": 1.0 }
// Response
{ "audio_url": "https://...output.mp3",
"duration_seconds": 2.4,
"characters_used": 22 }
Clone voices for personalized TTS.
POST /api/v1/voice/clone
{ "sample_audio": "https://...sample.mp3",
"voice_name": "custom-voice-1" }
// Response
{ "voice_id": "vc_abc123",
"status": "ready",
"quality_score": 0.92 }
// Use cloned voice
POST /api/v1/voice/text-to-speech
{ "text": "Hello",
"voice_id": "vc_abc123" }
Transcribe audio files with timestamps.
POST /api/v1/voice/transcribe
{ "audio_url": "https://...meeting.mp3",
"timestamps": true,
"format": "srt" }
// Response
{ "segments": [
{ "start": 0.0, "end": 2.5,
"text": "Welcome everyone" },
{ "start": 2.8, "end": 5.1,
"text": "Let's begin the meeting" }
], "duration": 3600 }
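The sample request asks for `"format": "srt"`, while the sample response shows JSON segments. If segments in that JSON shape are what you receive, they can be converted to SubRip text on the client. The sketch below assumes `start` and `end` are in seconds; the function names are illustrative.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            "{}\n{} --> {}\n{}".format(
                index,
                srt_timestamp(segment["start"]),
                srt_timestamp(segment["end"]),
                segment["text"],
            )
        )
    # SRT cues are separated by a blank line; an empty input yields an empty string.
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Timestamps are rounded to the nearest millisecond before being split into fields, which avoids floating-point artifacts such as `2.8 * 1000` not being exactly 2800.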
Analyze a voice recording for emotion, gender, and age range.
POST /api/v1/voice/analyze
{ "audio_url": "https://...voice.mp3" }
// Response
{ "emotion": {
"primary": "happy",
"confidence": 0.87 },
"demographics": {
"gender": "female",
"age_range": "25-35" },
"audio_quality": 0.94 }
Recognize speech from streamed audio over a WebSocket connection, with optional interim results.
// Connect to WebSocket
ws://api.webfunctions.net/v1/voice/stream
{ "api_key": "your_key",
"language": "en-US",
"interim_results": true }
// Stream audio chunks...
// Receive transcripts
{ "type": "interim",
"text": "Hello" }
{ "type": "final",
"text": "Hello, world!",
"confidence": 0.98 }
Get your API key and start using Voice APIs in minutes.