Embodied AI evaluation & sim-to-real

Robot learning is converging on large multimodal policies—vision-language-action models, memory over long horizons, and online RL after deployment. Public research writeups show how quickly the policy side is moving.
Dynamic Intelligence’s research program asks the complementary question: how do we know those policies are improving in *your* building? Project Alpha-Index is our umbrella for benchmarks that stress manipulation, navigation, latency, and recovery, not leaderboard scores alone.
We publish scenario suites that pair photorealistic simulation with slices of real telemetry, so teams can measure transfer before shipping a new checkpoint. Metrics emphasize success under clutter, human proximity, sensor dropout, and multi-hour tasks.
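As a rough illustration of what "measuring transfer" could look like, the sketch below computes per-scenario success rates in simulation and on real telemetry, then reports the gap between them. The `Episode` record, the scenario labels, and the sample data are all hypothetical; this is not the Alpha-Index schema, only one plausible shape for it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    scenario: str  # hypothetical label, e.g. "clutter" or "sensor_dropout"
    domain: str    # "sim" or "real"
    success: bool


def success_rates(episodes):
    """Return {scenario: {domain: success rate}} for the given episodes."""
    # Each tally is [successes, total] for one (scenario, domain) pair.
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for ep in episodes:
        tally = counts[ep.scenario][ep.domain]
        tally[0] += int(ep.success)
        tally[1] += 1
    return {
        scenario: {domain: s / n for domain, (s, n) in by_domain.items()}
        for scenario, by_domain in counts.items()
    }


def transfer_gaps(episodes):
    """Return {scenario: sim rate minus real rate} where both domains have data."""
    gaps = {}
    for scenario, rates in success_rates(episodes).items():
        if "sim" in rates and "real" in rates:
            gaps[scenario] = rates["sim"] - rates["real"]
    return gaps


# Made-up episodes, for illustration only.
episodes = [
    Episode("clutter", "sim", True),
    Episode("clutter", "sim", True),
    Episode("clutter", "real", True),
    Episode("clutter", "real", False),
    Episode("sensor_dropout", "sim", True),
    Episode("sensor_dropout", "sim", False),
    Episode("sensor_dropout", "real", False),
    Episode("sensor_dropout", "real", False),
]
gaps = transfer_gaps(episodes)
```

A large positive gap for a scenario suggests the simulated version overstates how well the checkpoint will do on site, which is the signal a team would want before shipping.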
The goal is an open-style reporting surface for partners: compare runs, replay failures, and attach evidence to promotion decisions. Model groups iterate on nets; we iterate on the data and evaluation contracts that make iteration safe.
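One way to picture attaching evidence to a promotion decision is a small gate that compares a candidate run against a baseline, scenario by scenario, and records what it saw. Everything here is an assumption made for illustration: the function name, the `max_regression` tolerance, and the sample numbers are hypothetical rather than part of any published contract.

```python
from dataclasses import dataclass, field


@dataclass
class PromotionDecision:
    promote: bool
    evidence: list = field(default_factory=list)  # human-readable comparison lines


def decide_promotion(baseline, candidate, max_regression=0.02):
    """Compare per-scenario success rates and block promotion on regressions.

    baseline and candidate map scenario name to success rate in [0, 1].
    max_regression is a hypothetical tolerance for run-to-run noise.
    """
    evidence = []
    promote = True
    for scenario in sorted(baseline):
        if scenario not in candidate:
            # A scenario the candidate never ran cannot support promotion.
            evidence.append(f"{scenario}: missing from candidate run")
            promote = False
            continue
        delta = candidate[scenario] - baseline[scenario]
        evidence.append(
            f"{scenario}: {baseline[scenario]:.2f} -> "
            f"{candidate[scenario]:.2f} ({delta:+.2f})"
        )
        if delta < -max_regression:
            promote = False
    return PromotionDecision(promote, evidence)


# Made-up success rates, for illustration only.
baseline = {"clutter": 0.80, "human_proximity": 0.90}
candidate = {"clutter": 0.85, "human_proximity": 0.70}
decision = decide_promotion(baseline, candidate)
```

In this example the candidate improves under clutter but regresses near people, so the gate declines promotion and the evidence lines say why.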