Datasets
Each dataset we release is shaped by a research thesis about where model capabilities are heading — not assembled opportunistically.
One of the largest available first-person industrial datasets. Recorded across real manufacturing, logistics, and field service environments — the kind of data that teaches models how people actually move through and manipulate the world.
Annotated with task boundaries, hand-object interaction labels, environment metadata, and gaze proxies. Designed for training embodied agents, world models, and long-horizon planning systems.
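To make the annotation layers concrete, here is a minimal sketch of what one annotated clip record could look like. Every field name here is an illustrative assumption, not the dataset's actual schema.

```python
# Hypothetical record for one annotated first-person clip.
# All field names are illustrative assumptions, not the real schema.
clip = {
    "clip_id": "example-000",
    "environment": {                 # environment metadata (assumed fields)
        "site_type": "manufacturing",
        "lighting": "indoor-artificial",
    },
    "tasks": [                       # task boundaries as [start_s, end_s] spans
        {"label": "pick_part", "span_s": [0.0, 12.4]},
        {"label": "fasten_bolt", "span_s": [12.4, 31.0]},
    ],
    "hand_object_interactions": [    # per-event hand-object labels
        {"t_s": 3.2, "hand": "right", "object": "bracket", "contact": "grasp"},
    ],
    "gaze_proxy": [                  # coarse gaze estimate per timestamp
        {"t_s": 3.2, "xy_norm": [0.51, 0.44]},
    ],
}

# Sanity check a long-horizon planner's loader might run:
# task spans should tile the clip in order, without gaps.
spans = [t["span_s"] for t in clip["tasks"]]
assert all(a[1] == b[0] for a, b in zip(spans, spans[1:]))
```

The span-tiling check matters for long-horizon training: a gap between task boundaries would leave unlabeled frames inside an episode.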
A large-scale collection of natural, multi-speaker conversational audio spanning languages that are chronically underrepresented in frontier model training. Every recording is paired with dialect metadata, speaker diarization, and transcription.
Languages include Hindi, Arabic, Finnish, and others. Structured for conversational AI training, ASR fine-tuning, TTS voice modeling, and dialogue system evaluation.
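As a sketch of how the pairing of audio, dialect metadata, diarization, and transcription might fit together in a single manifest entry (field names are assumptions for illustration, not the shipped schema):

```python
# Hypothetical manifest entry for one conversational recording.
# Field names are illustrative assumptions, not the actual schema.
recording = {
    "audio_path": "recordings/example.wav",
    "language": "hi",                # language code (BCP 47 style, assumed)
    "dialect": "Hindi (Delhi)",      # dialect metadata
    "speakers": ["spk0", "spk1"],
    "diarization": [                 # who spoke when, in seconds
        {"speaker": "spk0", "start_s": 0.00, "end_s": 4.15},
        {"speaker": "spk1", "start_s": 4.15, "end_s": 9.80},
    ],
    "transcript": [                  # transcription aligned to speakers
        {"speaker": "spk0", "text": "namaste"},
        {"speaker": "spk1", "text": "namaste, kaise hain?"},
    ],
}

# Check an ASR fine-tuning loader might run: diarization turns
# should be time-ordered and non-overlapping.
turns = recording["diarization"]
assert all(a["end_s"] <= b["start_s"] for a, b in zip(turns, turns[1:]))
```

Keeping diarization and transcription as parallel, speaker-keyed lists is one common layout; it lets the same manifest drive ASR fine-tuning, TTS voice modeling, and dialogue evaluation without reprocessing the audio.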
Licensed in partnership with several leading AAA game studios: physically consistent 3D worlds with complete ground truth, at a scale real-world capture cannot match. Designed for training spatial reasoning, physics intuition, and interaction priors for embodied agents and world models.

A structured collection of dexterous manipulation demonstrations across tabletop, assembly, and unstructured environments. Paired teleoperation and autonomous rollouts with proprioceptive, visual, and force-torque streams. Built for imitation learning, sim-to-real transfer, and manipulation policy evaluation.
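One way to picture the paired rollouts and multi-stream sensor data is a per-episode index entry like the sketch below. Names, rates, and dimensions are illustrative assumptions only.

```python
# Hypothetical index entry for one manipulation episode.
# Names, rates, and dimensions are illustrative assumptions.
episode = {
    "episode_id": "tabletop-0001",
    "setting": "tabletop",           # tabletop | assembly | unstructured
    "source": "teleoperation",       # teleoperation | autonomous_rollout
    "streams": {
        "proprioception": {"hz": 100, "dims": 14},        # e.g. joint pos + vel
        "vision": {"hz": 30, "cameras": ["wrist", "overhead"]},
        "force_torque": {"hz": 500, "dims": 6},           # 3-axis force + torque
    },
    "duration_s": 22.5,
}

# Sanity check an imitation-learning loader might run:
# every episode must carry all three sensor modalities.
required = {"proprioception", "vision", "force_torque"}
assert required <= set(episode["streams"])
```

Pairing teleoperated and autonomous episodes under one schema, distinguished only by the `source` field, is what makes policy evaluation straightforward: the same loader replays demonstrations and scores rollouts.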
Currently in collection. If you have specific requirements, share them now: we are gathering input from prospective partners.
If none of the above fits what you are building, we design and collect custom datasets. Tell us what capability you are trying to develop.
Get in touch