# Building in the Open #8: Why General AI Agents Are Failing in Production

By Banu · 5 min read

95% of enterprise GenAI pilots are producing zero P&L impact. The era of the general-purpose 'do-everything' agent is ending — the future belongs to domain-specific systems and multi-agent orchestration.

If a large language model can pass the bar exam, it should be able to handle a catering quote or an electrical contractor's invoice with minimal prompting, right?

That is the assumption many companies made in 2024. And it helps explain why, according to MIT's NANDA Initiative, 95% of enterprise generative AI pilots are currently producing zero measurable P&L impact.

The industry is waking up to a hard truth: demo fluency is not operational reliability. The era of the general-purpose, "do-everything" AI agent is ending. The future belongs to domain-specific systems and multi-agent orchestration.

Here is why the generalist approach is breaking down in production — and what it means for the future of service operations.

## The Smarter Model Paradox

When an AI agent fails in a high-stakes workflow, the failure is rarely a visible, absurd hallucination.

The real danger is what some engineers call "context debt": the gap between what the agent thinks your data means and what it actually means to your business. When a general agent lacks the specific domain vocabulary, workflow rules, and historical context of your business, it doesn't just stop working. It extrapolates.

This creates a dangerous paradox: smarter models make the context problem worse, not better.

A weak model operating on incomplete context produces obvious errors that are easy to catch. A strong model operating on that same incomplete context produces outputs that are coherent, well-reasoned, and operationally wrong. It will confidently approve a refund that violates a regional policy, or misinterpret a line item in an ERP record, because those specific constraints were never wired into the system.
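
What does it mean for a constraint to be "wired in"? Here is a minimal Python sketch, under the assumption that the agent is only allowed to propose a refund and a deterministic policy check, built from the business's own rules, decides whether it executes. `RefundProposal`, `check_refund`, and the regional limits are hypothetical names and illustrative numbers, not any product's API or real policy.

```python
from dataclasses import dataclass

# Hypothetical per-region refund policy (illustrative values, not real policy).
# Region -> maximum refund an agent may approve without human review, in cents.
REGIONAL_REFUND_LIMITS: dict[str, int] = {
    "us-west": 50_000,
    "us-east": 25_000,
}


@dataclass(frozen=True)
class RefundProposal:
    """A refund the model wants to issue; its rationale alone is not trusted."""
    order_id: str
    region: str
    amount_cents: int
    rationale: str


def check_refund(proposal: RefundProposal) -> tuple[bool, str]:
    """Deterministic policy gate applied after the model proposes, before anything executes."""
    limit = REGIONAL_REFUND_LIMITS.get(proposal.region)
    if limit is None:
        # No policy on file for this region: fail closed instead of extrapolating.
        return False, f"no refund policy on file for region {proposal.region!r}"
    if proposal.amount_cents > limit:
        return False, f"{proposal.amount_cents} exceeds {proposal.region} limit of {limit}"
    return True, "within policy"


if __name__ == "__main__":
    proposal = RefundProposal("ORD-1042", "us-east", 40_000, "Customer reported late delivery.")
    approved, reason = check_refund(proposal)
    print(approved, reason)  # False 40000 exceeds us-east limit of 25000
```

The design choice: however coherent the model's rationale sounds, a proposal outside policy is stopped by a table lookup rather than by a better prompt, and an unknown region is rejected instead of guessed at.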

## The Cashier Problem: Why Point Solutions Paralyze Operators

To solve this, many software vendors are rushing to build specialized "micro-agents" — one for drafting emails, one for reading PDFs, one for logging CRM notes.

But for the operator running a complex service business, this creates a new nightmare.

Imagine you are at the grocery store checkout. The cashier is scanning items at lightning speed and throwing them at you. But you are still struggling to open the plastic bag. The cashier is highly efficient at their specific task, but the overall system is broken, and you are paralyzed by the volume.

That is what it feels like to buy 15 different AI point solutions. You don't need faster cashiers throwing more data at you. You need a system that bags the groceries. You need orchestration.

## The Orchestration Frontier

The biggest players in AI have realized that single agents are not enough. This week, Anthropic launched Claude Managed Agents, introducing "multi-agent orchestration," in which a lead agent breaks a job into pieces and delegates them to specialist sub-agents working in parallel. They even introduced a "dreaming" feature that allows agents to review past sessions and self-improve overnight.
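
The lead-agent pattern can be sketched in a few lines of Python with `asyncio`. This illustrates the general idea only and is not the Claude Managed Agents API: the specialist functions are hypothetical stubs standing in for model calls, and the lead agent's plan is a fixed split rather than something a model decides.

```python
import asyncio
from collections.abc import Awaitable, Callable

# A "specialist" is an async function from a task description to a result.
# In a real system each would wrap a model or tool call; these are hypothetical stubs.
Specialist = Callable[[str], Awaitable[str]]


async def menu_agent(task: str) -> str:
    await asyncio.sleep(0)  # placeholder for a model or tool call
    return f"drafted: {task}"


async def pricing_agent(task: str) -> str:
    await asyncio.sleep(0)
    return f"priced: {task}"


async def staffing_agent(task: str) -> str:
    await asyncio.sleep(0)
    return f"scheduled: {task}"


SPECIALISTS: dict[str, Specialist] = {
    "menu": menu_agent,
    "pricing": pricing_agent,
    "staffing": staffing_agent,
}


def plan(job: str) -> dict[str, str]:
    """Lead agent, step 1: break the job into one piece per specialist.
    A real lead agent would plan with a model; this fixed split is an assumption."""
    return {
        "menu": f"propose a menu for {job}",
        "pricing": f"price the quote for {job}",
        "staffing": f"schedule staff for {job}",
    }


async def orchestrate(job: str) -> dict[str, str]:
    """Lead agent, step 2: run the pieces in parallel and collect results by specialist."""
    subtasks = plan(job)
    names = list(subtasks)
    results = await asyncio.gather(*(SPECIALISTS[n](subtasks[n]) for n in names))
    return dict(zip(names, results))


if __name__ == "__main__":
    for name, result in asyncio.run(orchestrate("a 120-guest wedding")).items():
        print(f"{name}: {result}")
```

`asyncio.gather` returns results in the same order the coroutines were passed in, which is what lets the lead agent zip each result back to the specialist that produced it.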

Simultaneously, IBM released a study surveying 2,000 C-suite executives, which found that 60% of organizations are planning to adopt structures where AI agents coordinate workflows across finance, supply chain, and operations.

The emerging consensus is clear: the workflow, not the department, is the new primary unit of value creation.

To actually impact margins, an agentic system must possess three structural attributes:

- **Orchestration**: coordinating specialized agents across the full workflow, not just one task at a time
- **Organization**: connecting scattered data and creating structure where there was none
- **Domain grounding**: operating from the canonical data model of the specific business it serves

## Stop Managing Data. Start Managing Margins.

At Toozy.ai, we saw this failure mode coming a year ago. That is why we didn't build a general chatbot.

We started in catering, one of the most complex, high-stakes, zero-margin-for-error service environments that exist. We built a canonical data model that understands the deep, specific reality of custom orders, fragmented workflows, and heavy customer coordination.

We didn't just build an agent that can talk. We built an intelligence layer that can orchestrate operations.
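
This post doesn't describe Toozy's actual schema, so the following is only a hypothetical Python sketch of what a canonical data model for catering could look like: one typed order record that every agent reads and writes, with domain rules attached to the record itself rather than left to each agent's interpretation. All class names, fields, and rules here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class OrderStatus(Enum):
    INQUIRY = "inquiry"
    QUOTED = "quoted"
    CONFIRMED = "confirmed"
    FULFILLED = "fulfilled"


@dataclass
class LineItem:
    name: str
    quantity: int
    unit_price_cents: int
    dietary_tags: set[str] = field(default_factory=set)  # e.g. {"vegan", "nut-free"}

    @property
    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents


@dataclass
class CateringOrder:
    """Single shared record: every agent works from this, not from its own reading of raw text."""
    order_id: str
    event_date: date
    guest_count: int
    status: OrderStatus = OrderStatus.INQUIRY
    items: list[LineItem] = field(default_factory=list)

    def subtotal_cents(self) -> int:
        return sum(item.total_cents for item in self.items)

    def validate(self) -> list[str]:
        """Hypothetical domain rules an agent's changes must satisfy before the order advances."""
        problems = []
        if self.guest_count <= 0:
            problems.append("guest count must be positive")
        if self.status is not OrderStatus.INQUIRY and not self.items:
            problems.append(f"order cannot be {self.status.value} with no line items")
        return problems


if __name__ == "__main__":
    order = CateringOrder("ORD-7", date(2026, 6, 14), guest_count=120)
    order.items.append(LineItem("Grilled chicken entree", 120, 2_450, {"gluten-free"}))
    order.status = OrderStatus.QUOTED
    print(order.subtotal_cents(), order.validate())  # 294000 []
```

Keeping the rules on the record means a quoting agent and a scheduling agent cannot disagree about what a valid order is; they either pass the same `validate()` or they don't.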

The market is shifting. Anthropic just formed a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs specifically to build custom, domain-grounded Claude deployments for mid-sized companies. The frontier of AI is no longer about building a smarter brain; it is about wiring that brain into the specific operational reality of a business.

Platforms like Claude are building the orchestration engine. Companies like Toozy are building the domain intelligence that makes that engine actually useful for operators who don't have engineering teams.

If your business runs on complex estimates, custom workflows, and a team that needs to stay coordinated under pressure, stop buying general AI tools that just create text. Start investing in systems that can actually operate.

## References

- MIT NANDA Initiative (2025). The GenAI Divide: State of AI in Business 2025.
- Atlan (2026). Why AI Agents Fail in Production: 5 Root Causes.
- Anthropic (2026). New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration.
- IBM Institute for Business Value (2026). Agentic AI workflows and enterprise operations.
- Anthropic (2026). Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs.

