We run speech collection projects for teams that need exact formats, speaker targeting and controlled delivery. Projects can be remote, on-site or hybrid and can include scripted prompts, natural dialogues, dual-channel conversations, separate speaker tracks, or video with audio. Recruitment, recording, transcription, review and file delivery are managed in one workflow.
Collection can be designed around technical requirements, speaker criteria and recording conditions. We support both large-volume recruitment and smaller controlled studies where the brief is narrow and the acceptance criteria are strict.
Final setup depends on the project. We can work to fixed collection standards or define the specification together before launch.
| Area | Details |
| --- | --- |
| Audio formats | WAV delivery at 16 kHz, 24 kHz or 48 kHz depending on the brief. Mono, stereo or dual-channel capture can be used. Bit depth can be prepared to match project requirements, including 16-bit, 24-bit and 32-bit workflows where required. |
| Channel setup | Single-speaker recordings, shared-track conversations, or dual-channel capture with one speaker per channel at source. Separate speaker tracks can be delivered directly instead of relying on post-session diarization alone. |
| Video with audio | Projects can include synchronized video and audio when facial movement, screen interaction or other visual context is needed alongside speech. |
| Metadata | Metadata can include speaker ID, language, dialect, age band, gender, region, device, environment, topic, session ID, consent link and project-specific labels. Delivery can be structured as CSV, JSON or manifest-based packaging. |
| Transcripts | Transcription can be included as part of the delivery package, with optional speaker split, timestamps, utterance segmentation and review workflow. Acceptance thresholds and review depth can be defined per project. |
| Delivery | Files can be delivered in batches through secure bucket transfer, API handover or other agreed delivery methods. Naming conventions, manifests and checksums can be included to support ingestion into existing pipelines. |
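On the receiving side, a batch can be checked against the agreed format before ingestion. The sketch below is illustrative only: it assumes PCM WAV files readable by Python's standard `wave` module, and the function names (`describe_wav`, `check_entry`, `build_manifest`) and manifest field names are hypothetical, not a fixed delivery schema.

```python
import hashlib
import json
import wave
from pathlib import Path

# Values taken from the audio formats row; adjust to the agreed brief.
ALLOWED_RATES_HZ = {16000, 24000, 48000}
ALLOWED_BIT_DEPTHS = {16, 24, 32}


def describe_wav(path):
    """Read header fields and a SHA-256 checksum for one PCM WAV file."""
    path = Path(path)
    with wave.open(str(path), "rb") as wav:
        entry = {
            "file": path.name,
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "frames": wav.getnframes(),
        }
    entry["sha256"] = hashlib.sha256(path.read_bytes()).hexdigest()
    return entry


def check_entry(entry):
    """Return a list of human-readable problems; empty means the file passes."""
    problems = []
    if entry["sample_rate_hz"] not in ALLOWED_RATES_HZ:
        problems.append(f"unexpected sample rate: {entry['sample_rate_hz']}")
    if entry["bit_depth"] not in ALLOWED_BIT_DEPTHS:
        problems.append(f"unexpected bit depth: {entry['bit_depth']}")
    return problems


def build_manifest(paths):
    """Serialize one entry per file as a JSON manifest (hypothetical layout)."""
    return json.dumps([describe_wav(p) for p in sorted(paths)], indent=2)
```

Comparing the `sha256` values in a manifest like this against checksums computed after transfer is one way to confirm that a batch arrived intact.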
Projects vary from tightly scripted collections to large dialogue datasets with domain split, speaker balance and multi-country recruitment.
Wake words, short commands, follow-up prompts and in-car voice scenarios can be collected across languages, accents and age groups. This setup is suitable when the goal is consistent phrasing, controlled prompt coverage and clean file structure.
Natural two-speaker conversations or guided dialogues can be recorded in one or more markets, with speaker pairing, scenario control, topic distribution and separate speaker channels where required.
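When a conversation is captured with one speaker per channel, the two tracks can be separated without diarization. The following is a minimal sketch, assuming an interleaved two-channel PCM WAV file; `split_dual_channel` is a hypothetical helper name and uses only Python's standard `wave` module.

```python
import wave


def split_dual_channel(src, left_out, right_out):
    """Write each channel of a two-channel PCM WAV file to its own mono file."""
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = wav.getsampwidth()      # bytes per sample
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # Each frame holds one sample per channel, stored back to back.
    frame_size = 2 * width
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    for out_path, data in ((left_out, left), (right_out, right)):
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Because the slicing works on whole samples of any width, the same routine applies to 16-bit, 24-bit or 32-bit PCM input.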
Recruitment can be narrowed to specific dialect groups or geographic areas with local validation, making it possible to build balanced collections for markets where regional variation matters.
Projects can be split by topic and domain such as finance, customer support, navigation, healthcare screening or daily conversation, with separate quotas and metadata tags for each category.
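Per-domain quotas of this kind can be tracked with a simple count over accepted recordings. This is only a sketch: the `domain` metadata key, the quota numbers and the `remaining_quota` helper are assumptions for illustration, not part of any fixed schema.

```python
from collections import Counter


def remaining_quota(quotas, accepted_records):
    """Return how many recordings each domain still needs (never below zero)."""
    counts = Counter(record["domain"] for record in accepted_records)
    return {domain: max(0, target - counts[domain])
            for domain, target in quotas.items()}


# Hypothetical quotas and accepted recordings.
quotas = {"finance": 3, "navigation": 2}
accepted = [
    {"domain": "finance", "speaker_id": "spk_001"},
    {"domain": "finance", "speaker_id": "spk_002"},
    {"domain": "navigation", "speaker_id": "spk_003"},
]
print(remaining_quota(quotas, accepted))  # {'finance': 1, 'navigation': 1}
```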
For projects that need both facial and vocal information, contributors can record video and audio in the same session with synchronized delivery and structured metadata.
For briefs that require tighter control over room acoustics, equipment or supervision, we can run on-site collections with local staff and pre-defined quality checks.
The production flow is kept simple: define the brief, recruit the right speakers, monitor quality during collection and deliver a package that is ready to use.
We define format, channel setup, languages, dialects, speaker quotas, recording mode, transcript needs and delivery structure before launch.
Contributors are sourced against the brief through our existing crowd and screened on metadata, device requirements and availability.
Sessions are monitored with automated checks and reviewer sampling so issues can be identified during production rather than after it ends.
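As an illustration of what an automated check might look like, the sketch below flags clipped or near-silent recordings. It assumes 16-bit PCM WAV input, and the thresholds and function names (`load_int16`, `quick_checks`) are hypothetical examples rather than the checks used in production.

```python
import array
import sys
import wave


def load_int16(path):
    """Load all samples from a 16-bit PCM WAV file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def quick_checks(samples, clip_level=32700, max_clipped=0.001, min_peak=500):
    """Return a list of issues found; an empty list means no flag was raised."""
    if not samples:
        return ["empty recording"]
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= clip_level) / len(samples)
    issues = []
    if clipped > max_clipped:
        issues.append("clipping")
    if peak < min_peak:
        issues.append("near-silent")
    return issues
```

Flagged files would still go to a reviewer; a check like this only narrows down which sessions need attention while collection is running.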
Final batches are prepared with the agreed naming, metadata, transcripts and handover format so integration is straightforward on the client side.
Send over the target languages, speaker criteria, file format and delivery requirements. We will map the collection setup around the actual specification.