Introducing DI-0

Robotics and embodied AI training—multimodal sensing, manipulation, and operator workflows for VLA-scale datasets

DI-0 is our first robotics data library: paired human demonstrations and teleoperation traces with synchronized vision, language, and action, built so teams can train and fine-tune vision-language-action (VLA) foundation models for manipulation, mobile robots, and humanoids without stitching together ad hoc exports.

What is DI-0?

A curated multimodal dataset—not a weights release. Episodes combine what cameras see, what human demonstrators or teleoperators do, and how robot state evolves over time. That alignment is what modern VLAs need to connect natural-language intent to contact-rich behavior in the real world.

What the library includes

DI-0 is structured for embodied pretraining and fine-tuning; a sketch of one episode record follows the list:

  • Human demonstration segments: synchronized video or depth, proprioception, gripper or hand state, and time-aligned language intent where available.
  • Teleoperation data: expert pilots driving arms, mobile manipulators, or humanoids—including recoveries and corrections you rarely capture in scripted simulation-only corpora.
  • Embodiment-aware tracks: calibrations and action formats that map across industrial arms, AMRs with manipulators, and bipedal stacks sharing common training interfaces.
  • Operational metadata: scene and skill tags, success and failure boundaries, consent and retention flags—so teams know which slices belong in public research mixes versus locked-down fine-tunes.
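
To make that structure concrete, here is a minimal sketch of what a single episode record could look like. The class and field names below (EpisodeRecord, LanguageSpan, consent) are illustrative assumptions, not the published DI-0 schema.

    from dataclasses import dataclass, field

    @dataclass
    class LanguageSpan:
        text: str       # e.g. "regrasp the handle"
        t_start: float  # seconds from episode start
        t_end: float

    @dataclass
    class EpisodeRecord:
        episode_id: str
        embodiment: str             # "arm" | "mobile_manipulator" | "humanoid"
        rgb_path: str               # synchronized RGB video
        depth_path: str | None      # depth stream, when collected
        proprio_path: str           # joint positions/velocities over time
        action_path: str            # embodiment-aware action track
        gripper_path: str | None    # gripper or hand state, when present
        language: list[LanguageSpan] = field(default_factory=list)  # sparse spans
        tags: dict = field(default_factory=dict)  # scene/skill tags, success bounds
        consent: str = "research"   # consent/retention flag for slice routing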

How teams use DI-0

Typical workflows look like this:

  • Pretraining mix-ins: add DI-0 slices to large VLA pools to improve contact, clutter, and language grounding before site-specific tuning (a sampling sketch follows this list).
  • Private fine-tuning: combine DI-0 with your own Ground-Log exports to close the sim-to-real gap without leaking sensitive facility detail.
  • Humanoid bootstrapping: lean on teleoperation-heavy trajectories to teach whole-body coordination before closed-loop policies run on hardware.
  • Shared benchmarks: evaluate checkpoints on held-out DI-0 splits so labs compare architectures in one embodied metric space—not only internet video baselines.
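
As a rough illustration of the mix-in workflow, the sketch below interleaves DI-0 episodes into a larger pretraining stream at a fixed sampling rate. The pool names and the 10% default weight are assumptions for illustration, not tuning guidance.

    import random

    def mixed_episode_stream(di0_pool, base_pool, di0_weight=0.10, seed=0):
        """Yield training episodes, drawing from DI-0 with probability di0_weight."""
        rng = random.Random(seed)
        di0_iter, base_iter = iter(di0_pool), iter(base_pool)
        while True:
            source = di0_iter if rng.random() < di0_weight else base_iter
            try:
                yield next(source)
            except StopIteration:
                return  # stop once either pool is exhausted

In practice the weight would be swept per model family; the point is that DI-0 enters the mix as a weighted slice rather than a wholesale replacement.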

Curation and access

Useful physical AI data depends as much on filtering as on collection:

  • Quality gates: remove degenerate episodes, sensor dropouts, and redundant traversals so every hour on disk teaches something new.
  • Deduplication: embedding- and trajectory-space dedup to prevent overfitting to the same aisle or workcell (a greedy sketch follows this list).
  • Label contracts: sparse language spans tied to temporal segments instead of vague file-level captions.
  • Licensed access: partner and academic programs with audit trails, aligned with how we expand into larger DI-1 releases.
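
For a feel of the dedup step, here is a minimal greedy sketch over per-episode embeddings. The 0.95 threshold and the assumption of pre-computed, L2-normalized embeddings are illustrative; trajectory-space dedup would substitute a downsampled action trace for the embedding.

    import numpy as np

    def dedup_by_embedding(embeddings, threshold=0.95):
        """Return indices of episodes to keep.

        embeddings: (N, D) array of L2-normalized per-episode embeddings.
        threshold: cosine similarity above which two episodes count as duplicates.
        """
        kept = []
        for i, emb in enumerate(embeddings):
            # keep an episode only if it is not a near-duplicate of anything kept
            if all(float(emb @ embeddings[j]) < threshold for j in kept):
                kept.append(i)
        return kept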

Who it is for

DI-0 supports teams who need robotics-native corpora:

  • Research labs training open or proprietary VLAs for dexterous manipulation and human-scale robots.
  • Manufacturers and 3PL pilot programs validating policies before fleet-wide OTA rollouts.
  • Partners benchmarking new model families against shared embodied data—not only scraped web video.
  • Humanoid programs that require rich teleoperation and demonstration signal beyond static pose datasets.

Roadmap

DI-0 is the first slice of a growing program: more sites, embodiments, languages, and edge cases—and tighter coupling to Bench-Fabric regression suites so library improvements show up directly in promotion gates.

Pair DI-0 with Dynamic intelligence capture products—Ground-Log for governed multimodal truth, Fleet-Tape for production replay, and Bench-Fabric for scenario gates—so fine-tunes stay anchored to the floors where robots actually work.

Frequently Asked Questions

What is DI-0?
A curated library of human demonstration and teleoperation data for training and evaluating VLAs that power foundational robotics and humanoid models.

Does DI-0 replace collecting our own data?
No. DI-0 accelerates research, pretraining, and public benchmarks; your site-specific Ground-Log data still wins for final sim-to-real closure and safety evidence.

What modalities does DI-0 include?
Typically synchronized RGB or depth video, robot state and actions, and aligned instructions or operator commentary; exact sensor stacks vary by collection tranche.

Is DI-0 tied to a single embodiment?
No. The format supports arms, mobile manipulators, and humanoids; many teams share representations so policies transfer with fine-tuning.

How do I get access?
DI-0 is available to select research and industry partners under license. Contact Dynamic intelligence or join the waitlist for academic, evaluation, or commercial programs.