Eight stops. One argument.
Use skim mode to collapse the long-form body into the thesis, signals, and key numbers for each chapter. Open any section when you want the full argument.
The $10 Trillion Opportunity Nobody Can Reach
Capital, hardware progress, and broad pretraining are real. The missing high-signal layer is failure-rich deployment data from real workplaces.
The Capital Is Here
Investor conviction is not the blocker. Billions have already poured into robotics, and round sizes are still rising.
The Data Gap That's Holding Everything Back
Robotics lacks the raw material that made language AI possible: abundant, structured real-world interaction data.
Not All Data Is Equal
Cheap data is becoming abundant. Recovery data is the apex layer, not the whole pyramid.
The Deployment Gap
Lab performance collapses under real-world variance. Deployment, not more demo polish, is what closes the reliability gap.
Capturing Failure Is the Method
The loop is simple: deploy, fail, correct, retrain. The hard part is collecting those corrections with taste and consistency.
Why Ethiopia. Why Now.
Ethiopia is the wedge because it delivers more environment diversity per dollar, not because it is a cheap-labor story.
Ashmyr: The Infrastructure Layer That's Missing
Ashmyr is the shared rollout layer inside a broader robotics stack: pre-contracted sites, trained operators, recovery capture, and reusable field evaluation.
The $10 Trillion Opportunity Nobody Can Reach
Capital, hardware progress, and broad pretraining are real. The missing high-signal layer is failure-rich deployment data from real workplaces.
- Language models inherited the internet. Robotics has no equivalent corpus of physical interaction.
- The constraint is operational: repeated deployment, observed failures, and captured recoveries outside the lab.
There are roughly 3 billion people on earth doing physical labor for a living. Manufacturing, warehousing, agriculture, food service, construction, logistics. Jensen Huang calls it "likely the next multitrillion-dollar industry."5 Goldman Sachs revised their humanoid robot market projection upward by 6x.3 The money is following the thesis: $13.8 billion in robotics venture funding in 2025 alone.1
But here's the thing. The robots aren't ready.
It's not the hardware, and it's not the AI models; the models are actually remarkably good in the lab. The problem is what happens when you take them out of the lab and put them in a real setting. They break. They get confused by lighting they've never seen. They fail on objects that weren't in the training set. They compound small errors into big ones.
The gap between "works in the lab" and "works in the real world" is the single largest obstacle standing between where robotics is today and the multitrillion-dollar market everyone is betting on. And the only thing that closes that gap is data — a very specific kind of data that almost nobody is producing.
This paper explains what that data is, why it matters, and why the infrastructure to produce it at scale doesn't exist yet.
The Capital Is Here
Investor conviction is not the blocker. Billions have already poured into robotics, and round sizes are still rising.
- Robotics funding jumped from $7.8B in 2024 to $13.8B in 2025.
- Humanoid funding went from niche to institutional scale in three years.
- Sovereign wealth, big tech, and top-tier VC are all in.
In 2024, robotics startups raised $7.8 billion. In 2025, that number jumped to $13.8 billion — blowing past the previous high of $13.1B set during the 2021 peak.1 Q1 2026 alone saw $2.26 billion flow in, over 70% of it going straight to warehouse and industrial automation.1
Humanoid robotics funding went from $239M in 2022 to $3.7 billion in 2025. Average round size nearly tripled — from $49M to $144M.2 These aren't seed checks. This is institutional money making conviction-driven bets.
Total robotics capex spend — hardware, installation, deployment — sits at roughly $54 billion, derived from Salesforce Ventures and IFR market data across industrial ($34B), service ($42B), and consumer ($14B) segments, with capex representing approximately 60% of total market value.34 It's growing at 16% annually.
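The $54B figure is simple arithmetic on the segment data cited in note 34. A quick sketch makes the assumption explicit; the 60% capex share is that note's estimate, not a measured figure:

```python
# Segment sizes in $B, per the Salesforce Ventures / IFR figures cited above.
segments = {"industrial": 34, "service": 42, "consumer": 14}

total_market = sum(segments.values())   # $90B total robotics market
capex_share = 0.60                      # estimated capex fraction (note 34)
capex = total_market * capex_share      # ~$54B: hardware, installation, deployment

assert round(capex) == 54
```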
NVIDIA, Bezos, SoftBank, Google, Samsung, Mercedes-Benz, and the Qatar Investment Authority are all on recent cap tables. Sovereign wealth, big tech, and top-tier VC are all in. Nobody is questioning whether robots will automate physical labor. The question is how fast.
The Data Gap That's Holding Everything Back
Robotics lacks the raw material that made language AI possible: abundant, structured real-world interaction data.
- Leading labs already consume tens to hundreds of thousands of hours of robot experience.
- Controlled-lab collection misses the messy failure modes that matter commercially.
- Most current suppliers still collect in constrained settings, not live workplaces.
Think about what happened with language AI. Large language models were scalable because the internet had trillions of words of text sitting there — books, articles, conversations, code — ready to be scraped and trained on. The raw material existed and only needed some organization.
Now think about physical tasks. How many recorded hours exist of a robot picking up a dish, folding a shirt, or assembling a component in a real factory? With full sensor data, force feedback, and error logs? Basically none. The internet is overflowing with text and images. It has essentially zero recorded physical interaction data.
That's the gap. Brett Adcock, CEO of Figure AI — the most valuable humanoid company on earth at $39B — put it plainly: "We have a data problem and a robot problem."37 Karol Hausman from Physical Intelligence said it even more bluntly on the Sequoia Training Data Podcast: "There is no data of robots actually operating in the real world."38
And it's why the most well-funded robotics companies in the world are spending enormous resources just trying to get enough data to make their models work outside the lab.
Physical Intelligence consumed 10,000+ hours of robot experience just to train one model — across 7 robot platforms and 68 different tasks.15 Generalist AI trained GEN-0 on 270,000 hours of real-world data and is still adding 10,000+ hours every single week.16 Covariant needed seven years — hundreds of robots, four continents — before their system was reliable enough for commercial use.17
"There is no substitute for robots experiencing real-world physical variance at production scale."
— Covariant Research Team, RFM-1 (March 2024)
People are trying to fill this gap, with little success so far. The demand is massive and growing. The supply side is short and fragmented. And the specific data that matters most barely exists.
Not All Data Is Equal
Cheap data is becoming abundant. Recovery data is the apex layer, not the whole pyramid.
- Internet video, synthetic data, and open models push the lower tiers of the stack toward commodity.
- Simulation is useful, but it breaks when real textures, lighting, and objects diverge.
- Recovery trajectories are the training signal that teaches robots how to recover in deployment.
Not all robot training data is the same. There are several types, each serving a different purpose in how robots learn. Here's the hierarchy, from most available to most valuable:
| Data Type | What It Is | Learning Phase | Key Example |
|---|---|---|---|
| Internet video | YouTube clips of humans doing tasks | Pretraining (general understanding) | LAPA19 — trains models 30x more efficiently than traditional approaches |
| Egocentric video | First-person recordings of human hands working | Pretraining (hand/object understanding) | Scale AI, Asimov1214 |
| Simulated trajectories | Robot practice runs in virtual environments | Skill building (controlled repetition) | NVIDIA Isaac Sim, MuJoCo |
| Teleoperated demonstrations | A human physically guiding a robot through a task | Fine-tuning (task-specific learning) | DROID25 — 76K demos across 52 buildings |
| Autonomous rollout data | A robot running on its own in a real environment | Evaluation + edge case discovery | 1X, Figure AI factory pilots2130 |
| Recovery trajectories | A human correcting a robot at the exact moment it fails | Closing the deployment gap | Physical Intelligence pi0.627 |
The bottom rows are where all the value is — and where almost no supply exists.
Researchers proved it. Data scaling laws published at ICLR 2025 established that policy generalization follows a power law with the number of distinct environments in the training set — not the number of demonstrations.39 Their central finding: "Diversity of environments and objects is far more important than the absolute number of demonstrations." Four data collectors working for a single afternoon across diverse settings produced policies that hit 90% success in novel environments. Thousands of demonstrations from a single lab didn't. If you want robots that generalize, you don't need more data from the same place. You need data from more places.
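The shape of that finding can be illustrated with a toy power-law model. The exponents and constants below are invented for illustration, chosen only so that environment count dominates demonstration count, which is the qualitative claim of the ICLR 2025 result:

```python
def generalization_score(num_envs, demos_per_env,
                         alpha=0.35, beta=0.05, c=0.3):
    # Toy power law: success approaches 1.0 as coverage grows, and the
    # environment-count exponent (alpha) dominates the per-environment
    # demonstration exponent (beta). All constants are illustrative.
    return 1.0 - c * (num_envs ** -alpha) * (demos_per_env ** -beta)

# Same total budget of 1,000 demonstrations, split two ways:
one_lab = generalization_score(num_envs=1, demos_per_env=1000)
many_sites = generalization_score(num_envs=50, demos_per_env=20)
assert many_sites > one_lab  # diversity beats volume under this model
```

Under this toy model, fifty sites at twenty demos each beat a single lab with a thousand demos, even though the total data budget is identical.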
Further: when you separate "the bottleneck is deployment data" from "the bottleneck is just more data" as distinct hypotheses, the evidence splits dramatically. Practitioners who've actually deployed robots — 93% identify deployment-specific data as the binding constraint. Academic literature still frames it as "more data" 32% of the time.40 The people closest to the problem know the bottleneck is type-specific, not volume-specific.
The cheap data is getting cheaper. Researchers can now train robot models on YouTube videos of humans doing tasks, and recent work shows this outperforms more expensive approaches by a wide margin.19 Open-source robot models are also emerging that anyone can download and fine-tune.20 This bottom of the pyramid is becoming commodity.
Simulation is powerful, but has hard limits. Simulation engines like NVIDIA's Isaac Sim and MuJoCo let you build virtual factories and train a robot in them. The appeal is obvious: practice a task thousands of times per hour at zero marginal cost. And simulation is getting better. Robot Utility Models achieved 90% zero-shot success via simulation transfer.41 X-Sim achieved a 30% improvement without any teleoperation data. Real2Render2Real matched 150 teleoperated demos from a single human demonstration through simulation amplification.
But here's what simulation can't do: it provides task-level diversity — more angles on the same object in the same room — not environment-level diversity. A simulated warehouse is one warehouse with varied parameters. Ten real warehouses are ten genuinely different environments with genuinely different failure modes. The scaling laws39 show that environment diversity is the binding variable. Simulation gets you to the starting line. Deployment data runs the race.
1X Technologies learned this firsthand: they trained a model entirely in simulation, deployed it in real homes, and watched it degrade over 50 days as real-world lighting, textures, and objects deviated from anything the sim had prepared it for.21
Sergey Levine — the most cited name in robot learning — framed it at CoRL 2024: real-world cross-embodiment data will become "the easiest and cheapest option," and models trained on deployment data outperform lab-specific models by 50%. But generating it requires deployed robot fleets at scale, creating a chicken-and-egg problem that only shared infrastructure resolves.42
Recovery trajectories are the most valuable data in robotics. This is what happens when a robot gets stuck in the real world and a skilled human steps in to show it the right move. It's not just sensor data — it's a recorded correction at the exact point of failure, capturing what went wrong and what the correct recovery looks like. This is the data that teaches a robot not just how to do a task, but how to handle the thousands of things that can go wrong while doing it. It can only be produced by running real robots in real workplaces, with trained humans watching.
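Concretely, a recovery trajectory is a structured record, not just a sensor stream. A minimal sketch of what one might contain; every field name and value here is illustrative, not any lab's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RecoveryTrajectory:
    """One logged correction at the moment a policy fails (illustrative schema)."""
    task_id: str            # e.g. "fold_shirt"
    site_id: str            # which real-world deployment site
    failure_mode: str       # operator-tagged, e.g. "grasp_slip"
    pre_failure_obs: list   # sensor frames leading up to the failure
    human_correction: list  # teleoperated actions showing the recovery
    post_recovery_obs: list # frames confirming the task resumed
    notes: str = ""         # the operator's annotation

# Hypothetical example record:
traj = RecoveryTrajectory(
    task_id="fold_shirt",
    site_id="addis_textile_03",
    failure_mode="grasp_slip",
    pre_failure_obs=[],
    human_correction=[],
    post_recovery_obs=[],
    notes="slipped on damp fabric; regrasped at the seam",
)
```

The point of the structure is that the correction is indexed to the failure: what the robot saw going in, what the human did, and what resuming looked like.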
And it's painfully slow to collect. Kyle Vedder surveyed the state of data collection in December 2025: even the best method — having a human physically guide the robot — runs 10x slower than a person doing the task themselves and takes weeks to train operators.22
The cheap data is getting cheaper. The simulated data has hard limits. And the data that gets robots to commercial reliability — recovery trajectories from real-world failures — is the scarcest and most valuable data in the entire stack.
The Deployment Gap
Lab performance collapses under real-world variance. Deployment, not more demo polish, is what closes the reliability gap.
- A 95% success rate still implies dozens of failures per day at production volume.
- Autonomous vehicles solved the same class of problem with real-world miles, not just simulation.
- Covariant needed seven years of live operations to reach commercial precision.
A robot that hits 95% success in the lab drops to 60% the moment you put it in a real workplace. Commercial deployment demands 99.9%+. At 95%, a robot doing a few thousand tasks a day still fails 50+ times, requiring a human supervisor standing by at all times.23
The gap from 90% to 99.9% is a 100x reduction in failure rate. And the best humanoids in the world currently run for about two hours before they need help.43 Current demos mask this by staging environments and pre-positioning objects. The jump from two-hour demos to eight-hour shifts requires a fundamentally different data regime.
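The failure arithmetic in this section is easy to verify directly:

```python
def daily_failures(success_rate, tasks_per_day):
    # Expected failures per day at a given per-task success rate.
    return tasks_per_day * (1.0 - success_rate)

# At 95% success, a thousand tasks a day means about 50 failures:
assert round(daily_failures(0.95, 1000)) == 50

# Going from 90% to 99.9% success is a 100x cut in failure rate:
reduction = (1.0 - 0.90) / (1.0 - 0.999)
assert round(reduction) == 100
```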
This isn't a new problem. It's the same challenge autonomous vehicles faced. Tesla has logged billions of miles of real-world driving data. Waymo has driven 30+ million autonomous miles on public roads. Why? Because you can't simulate your way to reliability. You have to log real miles in real conditions, encounter real failures, and feed those corrections back into training. The entire autonomous vehicle industry is built on this principle, and robotics is arriving at the same conclusion.
Covariant is the clearest example. They built one of the most commercially successful robot systems in the world: warehouse bin-picking at 1,000 cycles per hour, 99%+ precision. It took them seven years of continuous deployment across hundreds of robots on four continents to get there.17 Seven years of encountering failures that didn't exist in any simulation or lab. Seven years of collecting the exact kind of real-world recovery data we've been talking about. And when Covariant was acquired, the acquirer didn't buy the model; they bought the deployment data the model was trained on. The data was worth more than the model. Peter Chen, Covariant's co-founder, stated it directly: "Factory-level deployment is the unlock."49
What closes the deployment gap isn't more data from labs. It's not more simulation. It's deployment data — collected in real workplaces, from real failures, with real corrections. That's the only path from 60% to 99.9%. And right now, there is no scalable way to produce it.
Lastly: a trained model is a snapshot, but the real world is a stream. Stanford researchers identified three mechanisms by which deployed policies degrade over time: open-ended variability in real environments, distribution shift between training and deployment conditions, and compounding errors where small inaccuracies cascade into task failure.44 Their data flywheel demonstration improved a vision-language model from 32% to 71.8% through iterative deployment cycles that fed real-world experience back into training.45 Commercial viability requires a data pipeline: continuous, diverse, and failure-rich.
Capturing Failure Is the Method
The loop is simple: deploy, fail, correct, retrain. The hard part is collecting those corrections with taste and consistency.
- DAgger-style correction at the exact moment of failure is what materially improves performance.
- Recovery data quality matters more than raw demonstration count.
- The scarce operator is the person who can curate the right correction, not just log an intervention.
There's a well-established method for closing this gap. Researchers call it DAgger (Dataset Aggregation). You deploy a robot, let it work until it gets stuck, have a human show it the right move to recover, and feed that correction back into training. Repeat thousands of times:
Deploy → Robot Fails → Human Corrects → Retrain → Deploy Better Version
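The loop above can be sketched with a toy example. Everything here is a stand-in (a one-dimensional "robot", a synthetic expert, a training step that just shrinks noise); it only shows the mechanics of collect-at-failure and aggregate-then-retrain, not any lab's implementation:

```python
import random

class ToyExpert:
    """Stand-in expert: flags a failure when the action drifts too far."""
    def flags_failure(self, obs, action):
        return abs(action - obs) > 0.5
    def correct(self, obs):
        return obs  # in this toy world, the right move is to match obs

class ToyPolicy:
    """Stand-in policy: acts with noise; 'retraining' tightens the noise."""
    def __init__(self):
        self.noise = 1.0
    def act(self, obs):
        return obs + random.uniform(-self.noise, self.noise)
    def train(self, dataset):
        self.noise *= 0.5  # each aggregated round improves behavior

def dagger_round(policy, expert, dataset, steps=200):
    """One round: run the policy, log an expert correction at each failure,
    then retrain on the aggregated dataset (the DAgger aggregation step)."""
    random.seed(0)  # deterministic, purely for the example
    interventions = 0
    for t in range(steps):
        obs = float(t % 10)
        action = policy.act(obs)
        if expert.flags_failure(obs, action):
            dataset.append((obs, expert.correct(obs)))  # recovery sample
            interventions += 1
    policy.train(dataset)
    return interventions

policy, expert, data = ToyPolicy(), ToyExpert(), []
first = dagger_round(policy, expert, data)
second = dagger_round(policy, expert, data)
assert second < first  # fewer interventions after retraining
```

The structure is the whole point: corrections are captured at the exact step where the policy fails, and each retraining pass uses the aggregate of everything collected so far, not just the newest batch.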
Researchers tested the components empirically and found that human corrections during real-world operation drive policy improvement more effectively than any other data type.46 The correction at the point of failure is maximally informative — it targets exactly where the policy's training distribution breaks down.
Physical Intelligence — arguably the most credible robotics AI lab in the world — proved this works at scale. They published results from a system that does exactly this: let the robot run in the real world, have an expert step in at the exact moment of failure, record the correction, and feed it back into training.27
Task success went from 50–70% to over 90%. Their robot pulled espresso drinks for 13 hours straight. Folded laundry items it had never seen before for two hours without stopping. Assembled boxes for actual factory packaging. All of it driven by recovery data.27
"The DAgger cycle is an art. It requires many hours of very smart and discerning human time — hustle and taste, not just hours."
— Kyle Vedder, Physical Intelligence (December 2025)22
Vedder coined a term for the person who makes this loop work: the "data sommelier." Someone who watches the robot fail, understands why, curates the right recovery demo, tags the failure mode, and decides what the model needs to see next. That person, in tandem with the hardware and the environment, is the real bottleneck. Most training datasets don't have one. They're full of clean, successful runs. Models trained on that kind of data crumble when something goes wrong.28
"A dataset of one million noisy demonstrations is not an asset — it is a liability."29 Quality has to be designed into the collection process from day one.
Why Ethiopia. Why Now.
Ethiopia is the wedge because it delivers more environment diversity per dollar, not because it is a cheap-labor story.
- Assembly, packaging, warehousing, food processing, and light manufacturing sit in one reachable network.
- Capital goes further when the goal is many distinct environments, not one flagship site.
- Geographic diversity itself increases dataset value and reduces western-environment bias.
The ICLR 2025 data scaling laws proved what practitioners already suspected: policy generalization follows a power law with the number of distinct environments in the training set — not the number of demonstrations.39 The question isn't "who can collect the most data." It's who can build the most diverse network of deployment sites, the fastest, on a given amount of capital.
Ethiopia delivers more diversity per dollar than anywhere else on earth.
Assembly lines, packaging plants, warehouses, food processing, agriculture, and light manufacturing are concentrated in a single geography. Running continuous deployment operations costs real money: site contracts, on-site engineers, safety protocols, task resets. In Ethiopia, the same $500K that buys you one or two sites in the US covers a network of diverse factories with longer operational runs. Ethiopia applies capital efficiency to the exact variable that matters most: environment diversity. That makes it an ideal place to collect deployment data at internet scale.
Lastly, because of the delta in cost of living, we can sustainably pay workplaces to host the robots in a way that aligns incentives more than anywhere else in the world.
The timing is now
Over 50 funded robotics companies are transitioning from lab research to real-world deployment simultaneously. Figure, 1X, Apptronik, Sunday, and dozens more are all looking for places to put their robots — and all discovering that deployment infrastructure doesn't exist. The window to build the shared layer is right now, before each lab locks into its own bespoke solution. Two years from now, the early contracts will be signed and the relationships will be set. The infrastructure provider that moves first sets the standard.
Ashmyr: The Infrastructure Layer That's Missing
Ashmyr is the shared rollout layer inside a broader robotics stack: pre-contracted sites, trained operators, recovery capture, and reusable field evaluation.
- Labs send model plus task definitions, not a field team.
- Ashmyr returns training-ready recovery trajectories generated in real factories.
- The moat compounds as operations expertise, site diversity, and evaluation coverage accumulate together.
Every section of this paper points to the same gap. The field needs millions of deployment hours across diverse real-world environments, with skilled humans catching every failure and turning it into training data. No single lab can build that. Their job is to build better models. Someone else needs to build the deployment infrastructure.
That's Ashmyr.
A lab sends its model and task definitions. Ashmyr deploys the robot at a pre-contracted factory from its network. The robot runs autonomously until it gets stuck. An Ashmyr engineer — a trained "data sommelier" — steps in, recovers it, and logs the correction with full sensor capture. That recovery trajectory goes back to the lab. They retrain their model, send a better version, and the cycle continues. Every hour of operation produces the exact data type that the research says matters most.
Think of it as Scale AI for robot deployment. Scale didn't invent data labeling; it made labeling a shared service. Every ML team went from "hire annotators and build tools" to "get a Scale API key." Scale AI is now worth $14B+ because the infrastructure layer turned out to be more durable than any individual model. Ashmyr is the same play for physical deployment data. Labs don't build their own data centers; they use shared infrastructure. Ashmyr provides the same thing for the physical world: a pre-contracted network of real workplaces where any lab can deploy robots, collect failure data, and iterate on their models without building field ops from scratch.
The flywheel compounds:
More deployments → Better ops expertise → More diverse sites → Higher-value data → More lab customers
Ashmyr starts in Ethiopia, where the founder's existing factory network provides immediate access to diverse industrial environments at a cost structure that makes the unit economics work from day one. The first deployment sites are already contracted. The plan is to prove the model with one design partner, then open the network to every lab that needs real-world deployment data.
The robotics breakthrough probably won't look like the iPhone — one product that changes everything. It'll look more like AWS: shared infrastructure that makes the whole ecosystem possible.23
Ashmyr is building it now.
Sources and notes
1.
2. NewMarketPitch. "Humanoid Robotics Funding Trends." 2025. Humanoid-specific funding grew from $239M (2022) to $3.7B (2025); average round size rose from $49M to $144M.
3. Goldman Sachs. "The Global Market for Robots Could Reach $38 Billion by 2035." Goldman Sachs Research, 2025. Up 6x from prior $6B estimate. Analyst: Jacqueline Du.
5. Jensen Huang, CEO of NVIDIA. VivaTech 2025 and CES 2026. "One billion humanoid robots by 2035." "Humanoid Robotics is going to potentially be one of the largest industries ever."
6. Figure AI. Series C announcement, September 2025. ~$1.9B total raised, $39B valuation. Investors include NVIDIA, Intel Capital, Microsoft, OpenAI, Bezos Expeditions.
7. BusinessWire. "Skild AI Raises $1.4B, Now Valued Over $14B." January 2026. Led by SoftBank. Investors include NVentures, Bezos, Samsung, LG, Sequoia, Coatue.
8. Bloomberg / The Robot Report. Physical Intelligence raises $600M Series B, November 2025. ~$1B total raised, $5.6B valuation. Led by CapitalG, with Lux Capital, Thrive Capital, Jeff Bezos, Index Ventures, T. Rowe Price.
9. Apptronik. Series A extension announcement, February 2026. $935M+ total raised, $5.5B valuation. Investors include Google, Mercedes-Benz, AT&T Ventures, John Deere, Qatar Investment Authority.
10. TechCrunch. "1X Struck a Deal to Send Its Home Humanoids to Factories and Warehouses." December 2025. Up to 10,000 NEO robots across EQT's 300+ portfolio companies, 2026–2030. Also: TechStartups, September 2025, 1X seeking $1B at $10B valuation.
11. TechCrunch. "A Key DeepMind Robotics Researcher Left Google and NVIDIA Has Already Backed His Stealth Startup." March 2025. Generalist AI raised $140M seed. Investors: Bezos Expeditions, NVentures.
12. Scale AI. "Physical AI Data Platform." 2025. scale.com/physical-ai. 100,000+ production hours; customers include Physical Intelligence and Generalist AI.
13. Y Combinator. Cortex AI company page (F25 batch). Building real-workplace robot + egocentric data marketplace with human-in-the-loop rollouts and recovery collection.
14. Y Combinator. Asimov company page (2026). Internet-scale marketplace for robot training data; 5,000+ contributors across households, restaurants, hotels, factories.
15. Black, K., Brown, N., Driess, D., Finn, C., Hausman, K., Levine, S., et al. "pi0: A Vision-Language-Action Flow Model for General Robot Control." arXiv + Physical Intelligence Blog, October 31, 2024. 10,000+ hours, 903M timesteps, 7 platforms, 68 tasks.
16. Generalist AI. "GEN-0: Embodied Foundation Models That Scale with Physical Interaction." Company blog, November 4, 2025. 270,000 hours of real-world data; scaling laws verified for real data, not simulation.
17.
18. Mandlekar, J., et al. "GR00T N1: Open Foundation Model for Generalist Humanoid Robots." NVIDIA Research, March 2025. Formalizes three-tier data pyramid: internet video → synthetic → real proprioceptive (apex).
19.
20. UC Berkeley, Stanford, CMU, Google DeepMind. "Octo: An Open-Source Generalist Robot Policy." RSS 2024. First fully open-source generalist policy; trained on 800K OXE trajectories; fine-tunable on consumer GPUs.
21.
22.
23.
25. Khazatsky, A., Pertsch, K., Nair, S., et al. "DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset." RSS 2024. 76,000 demos, 564 scenes, 52 buildings, 3 continents. 20% SOTA improvement from scene diversity.
27.
28. SentientX. "Egocentric Data Bias in Robot Training." January 2025. Most datasets biased toward clean trajectories; models fail disproportionately on recovery.
29. Aveva Robotics. "When 'More Data' Isn't Enough." Substack, 2024. Quality, accuracy, relevance, and consistency define useful training data.
30. Figure AI. "Production at BMW." figure.ai/news, 2025. Figure 02 at Spartanburg: 30,000+ X3s, 1,250 hours, 1.2M steps, 90,000+ components.
31. The AI Insider. "Sunday Emerges from Stealth with $35M for Household Robot Called Memo." November 20, 2025. 50-household beta planned for late 2026.
32. Xiao, T. "What's Missing for Robot Foundation Models?" Action-driven Intelligence (Substack), December 19, 2024. Scaling laws 6/10, context bandwidth 6/10, evaluation infrastructure 3/10.
33. Jasti, S. "On Evals at Scale." October 2025. Model-level vs. system-level evaluation problem; fuzzy metrics; "evals are one of the hardest things for labs to scale."
34. Salesforce Ventures / IFR. Market data: Industrial robotics ($34B), Service robotics ($42B), Consumer robotics ($14B). Total ~$90B; capex estimated at ~60% of total market value.
35. Kim, M.J., et al. "OpenVLA: An Open-Source Vision-Language-Action Model." CoRL 2024. 7B open-source model outperforms RT-2 (55B closed) by 16.5% on standard manipulation benchmarks.
36. Fan, J. (NVIDIA Senior Research Director). "Robotics is NLP in 2018." Public remarks, 2025.
37. Adcock, B. (Figure AI CEO). Interview, New Atlas, 2024. "We have a data problem and a robot problem."
38. Hausman, K. & Springenberg, T. "Training General Robots for Any Task." Sequoia Training Data Podcast, 2026. "There is no data of robots actually operating in the real world."
39. Lin, F., et al. "Data Scaling Laws in Imitation Learning for Robotic Manipulation." ICLR 2025 Oral; Best Paper at CoRL 2024 Cross-Embodiment Workshop. Power-law relationship with environment count, not trajectory count. "Diversity of environments and objects is far more important than the absolute number of demonstrations."
40. Samuel, A. "The Deployment Data Gap: A Systematic Review." Ashmyr, March 2026. 443-source mixed-methods review; H1 (deployment data) supported by 77% of corpus; 93% of practitioner evidence vs. 32% of academic literature identifies deployment-specific data as the binding constraint.
41. "Robot Utility Models." 2025. 90% zero-shot success via simulation transfer.
42. Levine, S. "How Real-World Cross-Embodiment Data Will Lead to Robotic Foundation Models." Invited talk, CoRL 2024. Cross-embodiment models trained on deployment data outperform lab-specific models by 50%.
43. "Humanoid Robots: From Demos to Deployment." Humanoids Daily, 2025. Most humanoids operate approximately 2 hours before requiring intervention; current demos mask operational constraints.
44. Ren, A., et al. "Deployment-Time Reliability for Learned Robot Policies." Stanford University, 2026. Identifies open-ended variability, distribution shift, and compounding errors as degradation mechanisms.
45. Kim, J., et al. "Robot-Powered Data Flywheels." Stanford University, 2025. Deploy-collect-improve-redeploy cycle; VLM improved from 32% to 71.8% through iterative deployment.
46. Hoque, R., et al. "What Matters in DAgger?" 2025. Human corrections during real-world operation drive policy improvement more effectively than any other data type.
47. Humanoids Daily. "Figure Retires the F.02: Battle-Scarred Robots, 30,000 BMWs, and Hard Lessons Learned." 2025. Forearm identified as #1 failure point due to tight packaging, dexterity requirements, and thermal constraints, driving F.03 wrist re-architecture.
48. Agility Robotics. Deployment reports, 2025. Digit moved 100,000+ totes; real-world navigation in tightly constrained spaces required complete navigation stack redesign.
49. Chen, P. (formerly Covariant). Interview, 2024. "Physical world grounding requires deployment data; factory-level deployment is the unlock."
50. Hsu, O. (a16z). 2025. "The robotics equivalent of DevOps practices doesn't exist yet."
51. Garg, A. (Georgia Tech, formerly NVIDIA). X/Twitter, 2026. "This is a familiar and recurring cycle seen every 5–7 years. 90% success = a breakthrough in research, but 90% success = a liability in production."
Samuel, A. "The Deployment Gap: Why Real-World Failure Data Is the Binding Constraint in Commercial Robotics." Ashmyr Research, March 2026. ashmyr.com/research