Speech data collection

Speech collection built to specification

We run speech collection projects for teams that need exact formats, speaker targeting and controlled delivery. Projects can be remote, on-site or hybrid and can include scripted prompts, natural dialogues, dual-channel conversations, separate speaker tracks, or video with audio. Recruitment, recording, transcription, review and file delivery are managed in one workflow.

10,000+: available recorders and transcribers across multiple markets

EU-wide: coverage across languages, dialects, regions and age groups

Remote + on-site: flexible setup for home recording, field recording or controlled environments

Structured delivery: audio, transcripts, metadata and manifests prepared for direct handover

What we collect

Collection can be designed around technical requirements, speaker criteria and recording conditions. We support both large-volume recruitment and smaller controlled studies where the brief is narrow and the acceptance criteria are strict.

Recording modes

Scripted prompts and sentence reading
Wake words, command phrases and voice assistant scenarios
Free speech and prompted responses
Natural dialogues between two speakers
Role-play conversations for domain-specific speech
On-site recording with controlled acoustics

Speaker targeting

Language, country, region and dialect
Age bands and gender split
Native or near-native fluency requirements
Device type and recording environment
Recruitment by topic familiarity or profession where relevant
Local review and dialect verification when needed

Quality control

Hardware and microphone checks before recording starts
Monitoring for clipping, low signal, silence and background noise
Prompt completion and format validation
Metadata checks against the project brief
Reviewer sampling during production
Batch approval before final delivery
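
As an illustration of the automated monitoring listed above, the following is a minimal sketch of a per-file signal check for 16-bit PCM audio. The function names and the thresholds (`clip_ratio_max`, `min_rms_dbfs`) are assumptions made for this example and are not project defaults; real thresholds would come from the brief. Background-noise estimation is not covered here.

```python
import math
import sys
import wave
from array import array

FULL_SCALE = 32768  # magnitude of the largest 16-bit PCM sample


def load_pcm16(path):
    """Read all samples from a 16-bit PCM WAV file (channels stay interleaved)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        samples = array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def check_signal(samples, clip_ratio_max=0.001, min_rms_dbfs=-40.0):
    """Return issue labels for one recording: clipping, silence or low_signal."""
    if not samples:
        return ["empty"]
    issues = []
    # Count samples at or next to full scale as clipped.
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE - 1)
    if clipped / len(samples) > clip_ratio_max:
        issues.append("clipping")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        issues.append("silence")
    elif 20 * math.log10(rms / FULL_SCALE) < min_rms_dbfs:
        issues.append("low_signal")
    return issues
```

A file that returns an empty list passes these particular checks; anything else can be flagged for reviewer sampling or re-recording.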

Technical specifications

Final setup depends on the project. We can work to fixed collection standards or define the specification together before launch.

Audio formats	WAV delivery at 16 kHz, 24 kHz or 48 kHz, depending on the brief, with mono, stereo or dual-channel capture. Bit depth is set per project and can be 16-bit, 24-bit or 32-bit.
Channel setup	Single-speaker recordings, shared-track conversations, or dual-channel capture with one speaker per channel at source. Separate speaker tracks can be delivered directly instead of relying on post-session diarization alone.
Video with audio	Projects can include synchronized video and audio when facial movement, screen interaction or other visual context is needed alongside speech.
Metadata	Metadata can include speaker ID, language, dialect, age band, gender, region, device, environment, topic, session ID, consent link and project-specific labels. Delivery can be structured as CSV, JSON or manifest-based packaging.
Transcripts	Transcription can be included as part of the delivery package, with optional speaker split, timestamps, utterance segmentation and review workflow. Acceptance thresholds and review depth can be defined per project.
Delivery	Files can be delivered in batches through secure bucket transfer, API handover or other agreed delivery methods. Naming conventions, manifests and checksums can be included to support ingestion into existing pipelines.
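
To make the delivery row above concrete, here is a hedged sketch of how a batch manifest with format fields and checksums could be built. The field names, the `build_manifest` helper and the choice of SHA-256 are assumptions for illustration; the actual manifest layout is agreed per project. Python's standard `wave` module reads integer PCM only, so 32-bit float files would need a different reader.

```python
import hashlib
import json
import wave
from pathlib import Path

# Sample rates named in the specification table; adjust to the brief.
ALLOWED_RATES = {16000, 24000, 48000}


def describe_wav(path):
    """Collect basic format fields from a PCM WAV header."""
    with wave.open(str(path), "rb") as wf:
        return {
            "file": path.name,
            "sample_rate_hz": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bit_depth": wf.getsampwidth() * 8,
            "duration_s": round(wf.getnframes() / wf.getframerate(), 3),
        }


def sha256_of(path):
    """Hash the file in chunks so large recordings do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(batch_dir):
    """Return one manifest entry per WAV file in the batch folder."""
    entries = []
    for path in sorted(Path(batch_dir).glob("*.wav")):
        entry = describe_wav(path)
        entry["sha256"] = sha256_of(path)
        entry["rate_ok"] = entry["sample_rate_hz"] in ALLOWED_RATES
        entries.append(entry)
    return entries


def write_manifest(batch_dir, out_path):
    """Serialize the manifest as JSON next to the batch."""
    Path(out_path).write_text(json.dumps(build_manifest(batch_dir), indent=2))
```

On the receiving side, recomputing the checksum of each file and comparing it with the manifest entry confirms that the transfer was complete before ingestion starts.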

Typical use cases

Projects range from tightly scripted collections to large dialogue datasets with domain splits, speaker balancing and multi-country recruitment.

In-car and device commands

Wake words, short commands, follow-up prompts and in-car voice scenarios collected across languages, accents and age groups. Suitable when the goal is consistent phrasing, controlled prompt coverage and clean file structure.

Multilingual dialogue projects

Natural two-speaker conversations or guided dialogues recorded in one or more markets, with speaker pairing, scenario control, topic distribution and separate speaker channels where required.
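
When a dialogue is captured with one speaker per channel, separate speaker tracks can be derived by de-interleaving the samples. The sketch below assumes 16-bit PCM samples already loaded into memory; `split_channels` is a hypothetical helper for this example, not part of any delivery tooling.

```python
from array import array


def split_channels(interleaved, channels=2):
    """Split interleaved 16-bit PCM samples into one track per channel."""
    if len(interleaved) % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel i owns every `channels`-th sample, starting at offset i.
    return [array("h", interleaved[i::channels]) for i in range(channels)]
```

For a stereo file, the first returned track holds the left-channel speaker and the second the right-channel speaker, which avoids relying on diarization to tell them apart.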

Dialect and regional coverage

Recruitment can be narrowed to specific dialect groups or geographic areas with local validation, making it possible to build balanced collections for markets where regional variation matters.

Domain-specific speech

Projects can be split by topic or domain, such as finance, customer support, navigation, healthcare screening or daily conversation, with separate quotas and metadata tags for each category.
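
Per-category quotas of this kind can be tracked from session metadata. The following is a small sketch under assumed names: the `domain` tag, the quota figures and the `remaining_quota` helper are illustrative only.

```python
from collections import Counter


def remaining_quota(targets, sessions, key="domain"):
    """Return how many sessions each category still needs to reach its target."""
    collected = Counter(session[key] for session in sessions)
    return {name: max(target - collected[name], 0)
            for name, target in targets.items()}


# Hypothetical targets and accepted sessions for a two-domain project.
targets = {"finance": 3, "navigation": 2}
sessions = [
    {"session_id": "s1", "domain": "finance"},
    {"session_id": "s2", "domain": "finance"},
    {"session_id": "s3", "domain": "navigation"},
    {"session_id": "s4", "domain": "navigation"},
]
open_slots = remaining_quota(targets, sessions)
```

Here `open_slots` shows one finance session still outstanding and the navigation quota filled, so recruitment can be steered toward the open category.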

Video + speech capture

For projects that need both facial and vocal information, contributors can record video and audio in the same session with synchronized delivery and structured metadata.

On-site controlled recording

For briefs that require tighter control over room acoustics, equipment or supervision, we can run on-site collections with local staff and pre-defined quality checks.

How projects are run

The production flow is kept simple: define the brief, recruit the right speakers, monitor quality during collection and deliver a package that is ready to use.

Specification

We define format, channel setup, languages, dialects, speaker quotas, recording mode, transcript needs and delivery structure before launch.

Recruitment

Contributors are sourced against the brief through our existing crowd and screened on metadata, device requirements and availability.

Collection and review

Sessions are monitored with automated checks and reviewer sampling so issues can be identified during production rather than after it ends.

Delivery

Final batches are prepared with the agreed naming, metadata, transcripts and handover format so integration is straightforward on the client side.

Need a collection brief reviewed?

Send over the target languages, speaker criteria, file format and delivery requirements. We will plan the collection setup around your specification.

Book a meeting