The Tool Abstraction Problem: Key Insights and Tips

Agentic AI Foundation

Sam Partee, Head of Engineering at Arcade.dev, speaking at MCP Dev Summit North America 2026

TLDR: Most MCP tools are designed for programmers. Agents are not programmers. Building tools around tasks and intents, then obsessing over description quality, produces dramatically better selection accuracy and fewer failures at runtime.

Before MCP existed, Sam Partee and his team at Arcade had already made over 10,000 tools.

They had their own protocol, the Open Execution Protocol, and were focused on one question: how to take the JSON output of a language model, at a time when JSON output wasn’t even guaranteed, and reliably get the right function called with the right arguments. They called this the machine experience: akin to user experience, but for the model.

That problem is what Sam calls the “tool abstraction problem”, and most MCP builders run straight into it today.

The API generator trap

Auto-generating MCP tools from API endpoints is a common pattern. Unfortunately, it is also the fastest path to a broken agent.

An API is designed for programmers. A programmer knows to call getUserId before getCalendarEvents. A programmer sequences calls and holds state. An agent does not do those things reliably, and the benchmarks are clear about what happens when you ask it to try.

Multiple benchmarks show that six or more sequential tool calls produce a failure rate of more than 50%. This clearly indicates that chaining is hard for agents.
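A rough back-of-the-envelope model helps show why long chains fail so often. The sketch below assumes every call succeeds independently with the same probability, which is a simplification and not something the benchmarks themselves claim. The function name and the 89% figure are illustrative assumptions.

```python
def chain_success_rate(per_call_success: float, calls: int) -> float:
    """Probability that every call in a chain succeeds, assuming the
    calls succeed or fail independently (a simplifying assumption)."""
    if not 0.0 <= per_call_success <= 1.0:
        raise ValueError("per_call_success must be between 0 and 1")
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return per_call_success ** calls


if __name__ == "__main__":
    # Illustrative per-call success rate, not a measured value.
    for calls in (1, 3, 6):
        rate = chain_success_rate(0.89, calls)
        print(f"{calls} call(s): {rate:.1%} chance the whole chain succeeds")
```

Under this assumption, a per-call success rate of 89% leaves a six-call chain succeeding slightly less than half the time, so small per-step error rates compound quickly.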

Instead of better chaining, perhaps the answer is moving composition logic inside the tool itself.

A task like “find the customer who complained last week and schedule a follow-up” might touch five API endpoints. As five separate tools, that’s a chaining problem with odds worse than a coin flip. As one tool oriented around that specific task, selection accuracy goes up and the deterministic parts stay deterministic.
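One way to picture moving composition inside the tool is a single function that performs the whole sequence in ordinary code. The sketch below is hypothetical throughout: the `CrmClient` protocol, its four methods, and `schedule_complaint_follow_up` are made-up names standing in for whatever endpoints a real system exposes, not Arcade’s API or any real SDK.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Protocol


class CrmClient(Protocol):
    """Hypothetical backend; each method stands in for one API endpoint."""

    def list_complaints(self, since: date) -> list[dict]: ...
    def get_customer(self, customer_id: str) -> dict: ...
    def find_open_slot(self, after: date) -> date: ...
    def create_meeting(self, customer_id: str, day: date, topic: str) -> str: ...


@dataclass
class FollowUp:
    customer_name: str
    meeting_id: str
    day: date


def schedule_complaint_follow_up(client: CrmClient, today: date) -> Optional[FollowUp]:
    """Schedule a follow-up with the most recent customer who complained
    in the last seven days. Returns None if nobody complained."""
    complaints = client.list_complaints(since=today - timedelta(days=7))
    if not complaints:
        return None
    # The sequencing an agent would otherwise have to chain happens here,
    # in deterministic code.
    latest = max(complaints, key=lambda c: c["created"])
    customer = client.get_customer(latest["customer_id"])
    day = client.find_open_slot(after=today)
    meeting_id = client.create_meeting(customer["id"], day, "Complaint follow-up")
    return FollowUp(customer["name"], meeting_id, day)
```

The agent now has one selection decision to make instead of several, and the order of the underlying calls is fixed in code rather than left to the model.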

Build tools around tasks, not capabilities

Agents build to-do lists. The Gherkin-style format that Anthropic introduced into the ecosystem is a useful mental model here: a to-do list entry reads as a task, not a capability. “Track this order and return a report” is a task. “Get order ID” is a capability. Tools named and scoped to match how agents decompose work get selected more reliably.

The Redis and Block/Square papers both show measurable accuracy gains from this approach. GitHub Copilot’s research points the same direction. Arcade’s own tool patterns library (built from years of production eval data) organizes patterns by domain and task type precisely because this framing works.

Using more than ~20 tools in a single agent context degrades selection. Dynamic tool selection and progressive discovery help, but don’t fully solve it. The more durable answer is keeping agent scope narrow enough that 20 tools is sufficient. If enumeration pushes past 40, the agent is probably doing too much and should be two agents.
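These thresholds can be turned into a simple check that runs whenever an agent’s tool list changes. The sketch below is only an illustration: the constant names, the function name, and the three return labels are assumptions, and the cutoffs simply restate the rough numbers above.

```python
SOFT_LIMIT = 20   # selection reportedly degrades beyond roughly this many tools
SPLIT_LIMIT = 40  # past this, the agent probably should be two agents


def review_tool_count(tool_names: list[str]) -> str:
    """Return a coarse recommendation based on how many tools an agent exposes."""
    count = len(tool_names)
    if count > SPLIT_LIMIT:
        return "split"   # agent scope is probably too broad
    if count > SOFT_LIMIT:
        return "narrow"  # trim scope or add dynamic tool selection
    return "ok"
```

A check like this fits naturally in a test suite or CI step, so that scope creep shows up as a failing check before it shows up as degraded selection.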

Description quality is the 10x lever

Across more than 20,000 tool evaluations, Arcade’s eval data shows that description quality has the largest effect on selection accuracy: larger than the tool name, and larger than context.

Tool schemas land near the bottom of the context window in most agent frameworks. Per research on how models attend to context, that position means the description is the most recently processed information before the model makes its selection decision. It’s functionally the last thing the model thinks about before choosing.

In Arcade’s nightly evals, optimizing descriptions alone (no changes to tool logic or names) produced roughly a 10x reduction in errors. A Hesh et al. paper demonstrates the same effect independently, showing that a model could complete a much longer activity chain purely through description changes, with no structural changes to the tools.

Practically, that means:

Keep descriptions under 600 words
Start with an action verb
Write the rest as a short, task-intent-formed sentence that tells the model what the tool does and when to reach for it
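The first two rules are mechanical enough to check automatically. The sketch below is a minimal linter under stated assumptions: `lint_description` and the small `ACTION_VERBS` set are hypothetical, and a real verb list would need to cover your own tools. The third rule, whether the sentence conveys task intent, still needs a human or an eval to judge.

```python
MAX_WORDS = 600
# Hypothetical, deliberately small verb list; extend it for your own tools.
ACTION_VERBS = {"create", "find", "get", "list", "schedule", "send", "track", "update"}


def lint_description(description: str) -> list[str]:
    """Return a list of problems found; an empty list means no issues."""
    words = description.split()
    if not words:
        return ["description is empty"]
    problems = []
    if len(words) >= MAX_WORDS:
        problems.append(f"{len(words)} words; keep it under {MAX_WORDS}")
    first = words[0].strip(".,:;").lower()
    if first not in ACTION_VERBS:
        problems.append(f"starts with {words[0]!r}, not a known action verb")
    return problems
```

For example, `lint_description("Track an order and return a status report.")` returns an empty list, while a description that opens with “This tool ...” is flagged for not leading with an action verb.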

Write the description. Iterate on it. Treat it like code. The teams that have spent the most time on this problem reach the same conclusion: the description matters more than almost anything else in tool design, yet most builders write it once and forget it.

Sam Partee is Head of Engineering at Arcade.dev. The Agentic AI Foundation is the home of open agentic standards and open source infrastructure. To learn more about MCP and connect with engineers thinking through these problems, visit aaif.io, join the conversation in the AAIF Discord, or join us at an upcoming AAIF event.

The Tool Abstraction Problem: Lessons Learned Building 1000+ MCP Tools
