Studio · Two-Speaker Conversation (Colombian Spanish)
Spontaneous two-speaker Colombian Spanish recorded in studio. Stereo, channel-separated (one speaker per channel), 16-bit / 44.1 kHz — the format diarization and ASR pipelines expect.
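For teams wiring a file like this into a diarization or ASR pipeline, the following is a minimal sketch of checking the stated format (stereo, 16-bit) and splitting the two channels into per-speaker sample arrays. It uses only the Python standard library. The function names `split_stereo` and `write_mono` are illustrative, and which speaker sits on the left or right channel is an assumption to confirm against the delivery notes.

```python
import array
import sys
import wave


def split_stereo(path):
    """Read a 16-bit stereo WAV and return (sample_rate, left, right).

    left and right are array('h') objects holding one channel each.
    Which speaker is on which channel is NOT known here (assumption:
    the delivery notes specify it).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) file")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian

    # Stereo frames are interleaved: L0, R0, L1, R1, ...
    return rate, samples[0::2], samples[1::2]


def write_mono(path, rate, samples):
    """Write one channel's samples as a 16-bit mono WAV."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(data.tobytes())
```

A caller would typically compare the returned rate against 44100 before passing each channel on, then use `write_mono` to save one file per speaker.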
Each clip below is roughly two minutes of consented, naturally spoken audio captured for AI training. Together they cover studio capture, remote multispeaker sessions, domain dialogue, code-switching, and accent variation. Below the samples is a snapshot of our active locale coverage. More samples are available on request.
Two audition clips showing how we vet voice talent before booking — a scripted line-read plus a contextual read. Pulled from C8C's PII-redacted auditions library covering New York, Bogotá, and Medellín pools.
A solo-speaker reference: one talent, recorded in a booth or via headset-quality remote capture, demonstrating the per-speaker signal we deliver as channel-separated tracks in every multi-speaker session.
Native Scottish English between two speakers — strong regional accent, naturally paced, the kind of accent diversity off-the-shelf datasets rarely cover.
Live Spanish session with three or more speakers, featuring natural overlap, back-channelling, and interruption. Each speaker is captured on a separate track; the preview here is mixed, and the channel-separated source is available on request.
Natural mid-utterance code-switching from a bilingual speaker — how multilingual users actually talk, and the kind of input ASR/NLU systems must handle if they're going to ship to global markets.
Welsh English call-center scenario — agent-and-customer dialogue with realistic turn-taking, clarification requests, and natural emotion. Stereo channel-separated for clean speaker separation.
Irish English healthcare scenario — a clinician-and-patient exchange in the telemedicine register. Two-speaker, stereo, channel-separated.
Highly emotive, performance-driven speech — wider pitch and intensity range than conversational data, useful for emotion classifiers, expressive TTS, and prosody-aware models.
Studio-quality scripted read — clean signal, controlled pacing, and consistent loudness. The format we deliver for TTS training, prompt banks, and QA reference recordings.
Short, isolated utterances — wake words, commands, prompt phrases — delivered as clip-per-utterance with metadata. Used for keyword spotters, voice-control models, and Lombard / noise-robustness sets.
en-US American English
en-IE Irish English
en-WL Welsh English
en-AB Scottish English
en-SG Singapore English
es-SV Salvadoran Spanish
es-CO Colombian Spanish
es-MX Mexican Spanish
es-US US Spanish ↔ EN
es-ES European Spanish ↔ EN
pt-BR Brazilian Portuguese ↔ EN
pt-PT European Portuguese ↔ EN
fr-CA Canadian French ↔ EN
fr-FR European French ↔ EN
de-DE Standard German ↔ EN
it-IT Italian ↔ EN
hi-IN Hindi ↔ EN
he-IL Hebrew ↔ EN
tl-PH Tagalog (code-switch)
ar-AE Arabic (UAE) ↔ EN
ru-RU Russian ↔ EN
nl-NL Dutch ↔ EN
no-NO Norwegian ↔ EN
da-DK Danish ↔ EN
fi-FI Finnish ↔ EN
sv-SE Swedish ↔ EN
ta-IN Tamil ↔ EN
th-TH Thai ↔ EN
vi-VN Vietnamese ↔ EN
tr-TR Turkish ↔ EN
ko-KR Korean ↔ EN
ja-JP Japanese ↔ EN
zh-CN Chinese (Simplified) ↔ EN
zh-HK Chinese (Hong Kong) ↔ EN
zh-TW Chinese (Taiwan) ↔ EN