A deep dive into the talk by Billy Hickman and Lilia Abaibourova at MCP Dev Summit North America 2026
Picture this: you get home after a long day and want to watch one show, but the app presents you with hundreds of options, trailers, and recommendations to scroll through. Twenty minutes later, you still haven’t picked anything.
“We created the same problem for AI agents,” Billy said.
Billy is a senior software engineer at Prime Video, working on core foundational services. He presented alongside Lilia Abaibourova, a principal product manager leading AI-native enablement across Prime Video. Their talk presented a working solution to a problem they’d already hit and solved in production.
More Tools, Less Signal
When Prime Video’s engineering teams started building with agents, they followed a pattern that made obvious sense. Give the agent access to the same tools the engineers have. Stand up MCP servers. Let the agent figure out what to call.
As more teams adopted this approach and contributed to a shared MCP server, the tool count grew fast. With hundreds of tools loaded into a single context window, things started to break down:
- Context hits the limit from tool bloat
- Tool selection becomes unreliable
- Performance degrades and hallucinations increase
From Lilia Abaibourova and Billy Hickman’s talk at MCP Dev Summit 2026.
More tools ≠ better agents
The core issue is mechanical. An agent loaded with 100 tools has to parse and reason over every one of them, even when it only needs three or four. That overhead compounds: context fills up, and the model’s ability to pick the right tool drops off.
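A rough way to see the overhead is to compare how much tool-definition text the model has to read in each case. The sketch below uses synthetic tool definitions and counts serialized characters as a crude stand-in for tokens; the tool names, descriptions, and schema are assumptions for illustration, not Prime Video's tools, and real schemas are often considerably larger.

```python
import json


def tool_definition(index):
    # Synthetic definition (assumption); shaped like an MCP tool entry.
    return {
        "name": f"tool_{index}",
        "description": f"Hypothetical tool number {index} that performs one narrow task.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


def payload_chars(tool_count):
    """Characters of JSON the model must read for tool_count definitions."""
    return len(json.dumps([tool_definition(i) for i in range(tool_count)]))


full = payload_chars(100)   # everything pre-loaded
scoped = payload_chars(4)   # only the handful the task needs
print(f"100 tools: {full} chars, 4 tools: {scoped} chars, ratio ~{full / scoped:.0f}x")
```

The cost scales roughly linearly with the number of definitions, and it is paid on every turn whether or not the tools are used.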
Prime Video needed a way to give agents access to the right tools for the problem at hand, without pre-loading everything else.
Find Tools: One Tool to Rule the Rest
Their solution works from a simple premise: at the start of every session, the agent knows about exactly one tool. That tool is called find_tools.
find_tools does two things. It describes the problem categories available in the MCP server (things like “operations,” “training,” or “results”) and tells the agent how to ask for more. The agent calls it with a problem category, and the server responds with the relevant toolset for that space.
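As a minimal sketch of that idea, the snippet below models a catalog keyed by problem category and a find_tools definition whose description advertises the categories. The catalog layout, descriptions, and schema are assumptions made for illustration; only the names find_tools, get_marathon_times, and create_training_plan come from the talk.

```python
# Hypothetical catalog: problem category -> tool definitions.
CATALOG = {
    "results": [
        {"name": "get_marathon_times", "description": "Look up marathon finish times."},
    ],
    "training": [
        {"name": "create_training_plan", "description": "Build a plan for a target finish time."},
    ],
}


def find_tools_definition():
    """The single tool definition the agent sees at session start."""
    categories = sorted(CATALOG)
    return {
        "name": "find_tools",
        "description": (
            "Load the tools for one problem category. Available categories: "
            + ", ".join(categories)
            + ". Call again with a different category to switch."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {"category": {"type": "string", "enum": categories}},
            "required": ["category"],
        },
    }


def find_tools(category):
    """Return the toolset for a category, or fail loudly on an unknown one."""
    if category not in CATALOG:
        raise ValueError(f"unknown category: {category!r}")
    return CATALOG[category]
```

Putting the category list in the tool description means the agent learns what it can ask for from the same place it learns how to ask.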
What makes this work at the protocol level is a feature already in the MCP spec. When the server sends a notifications/tools/list_changed notification, a compliant client calls back for an updated tool list. Prime Video’s server uses this to push tools into the agent’s context mid-session, after the agent has identified the problem category it’s working in.
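On the wire, this exchange is a pair of small JSON-RPC 2.0 messages. The sketch below builds the notification and shows how a compliant client might react to it; the helper function names are hypothetical, while the method strings and the listChanged capability flag come from the MCP spec.

```python
import json

LIST_CHANGED = "notifications/tools/list_changed"

# A server that intends to send this notification declares it during initialize.
SERVER_CAPABILITIES = {"tools": {"listChanged": True}}


def make_list_changed_notification():
    # Notifications carry no "id" and expect no response.
    return {"jsonrpc": "2.0", "method": LIST_CHANGED}


def make_tools_list_request(request_id):
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def client_on_message(raw, next_id):
    """Return the follow-up request a compliant client sends, or None."""
    message = json.loads(raw)
    if message.get("method") == LIST_CHANGED and "id" not in message:
        return make_tools_list_request(next_id)
    return None
```

The notification itself carries no tool data. It only tells the client that its cached list is stale, so the scoped list arrives in the response to the follow-up tools/list request.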
The server tracks session state using the Mcp-Session-Id header introduced with streamable HTTP transport, so it knows which agent is operating in which problem space. When the agent moves to a new category, the old tools are unloaded and the new ones come in.
The flow in practice: agent calls find_tools, server responds with a notification, and the agent fetches an updated, scoped tool list.
The context window at any moment reflects only what’s relevant to the current task.
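The per-session bookkeeping described above can be sketched as a small registry that maps a session id to its active category. This is an illustrative in-memory model, not Prime Video's implementation: the class and method names are hypothetical, and a real streamable HTTP server would return the id in the Mcp-Session-Id response header at initialization and read it back from later requests.

```python
import uuid

# Hypothetical catalog: category -> tool names.
CATALOG = {
    "results": ["get_marathon_times"],
    "training": ["create_training_plan"],
}


class SessionRegistry:
    def __init__(self):
        self._category = {}  # session id -> active category (None until chosen)

    def open_session(self):
        session_id = uuid.uuid4().hex
        self._category[session_id] = None
        return session_id

    def switch_category(self, session_id, category):
        """Set the active category; return True if the visible tools changed."""
        if session_id not in self._category:
            raise KeyError(f"unknown session: {session_id}")
        if category not in CATALOG:
            raise ValueError(f"unknown category: {category!r}")
        changed = self._category[session_id] != category
        self._category[session_id] = category
        return changed  # caller sends list_changed only when this is True

    def visible_tools(self, session_id):
        """find_tools is always present; scoped tools depend on the session."""
        category = self._category[session_id]
        return ["find_tools"] + CATALOG.get(category, [])
```

Because the state is keyed by session, two agents connected to the same server can sit in different problem spaces without seeing each other's tools.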
The Demo: A Marathon Agent
To show the pattern without exposing Prime Video’s internal tooling, Billy built a running agent and MCP server for his London Marathon training. The sequence made the mechanism concrete:
- Session starts with one tool visible: find_tools
- Agent is asked for the average finish time from last year’s London Marathon
- Agent calls find_tools, requests results tooling, and receives a new get_marathon_times tool
- Agent runs the tool, gets the data (an average of 4 hours, 28 minutes), and answers
- Next prompt asks for a training plan to match that finish time
- Agent calls find_tools again, this time requesting training tools
- Results tools disappear; create_training_plan appears in their place
- Agent builds the plan using a tool that hadn’t existed in its context 30 seconds earlier
By the end of the demo, the agent had cycled through two distinct toolsets across a single session, loading and unloading based on what each task actually required.
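The demo sequence can be replayed against a toy single-session server to make the load and unload behavior visible. Everything here is a simplified stand-in, assuming stubbed tool bodies and a hypothetical DemoServer class; the only details taken from the talk are the tool names, the two categories, and the 4 hours, 28 minutes figure, which the stub returns as a fixed string.

```python
class DemoServer:
    def __init__(self):
        self.active = None  # active category for this one demo session
        # Stubbed tool bodies (assumption); real tools would query real data.
        self.toolsets = {
            "results": {"get_marathon_times": lambda args: "average finish: 4:28:00"},
            "training": {"create_training_plan": lambda args: "plan for target " + args["target"]},
        }

    def list_tools(self):
        names = ["find_tools"]
        if self.active:
            names += sorted(self.toolsets[self.active])
        return names

    def call(self, name, args):
        """Return (result_text, list_changed)."""
        if name == "find_tools":
            category = args["category"]
            if category not in self.toolsets:
                raise ValueError(f"unknown category: {category!r}")
            changed = category != self.active
            self.active = category
            return "loaded " + category, changed
        if self.active is None or name not in self.toolsets[self.active]:
            raise LookupError(name + " is not loaded in this session")
        return self.toolsets[self.active][name](args), False


def run_demo():
    server = DemoServer()
    trace = [server.list_tools()]  # session start
    _, changed = server.call("find_tools", {"category": "results"})
    if changed:  # stands in for the list_changed notification and re-fetch
        trace.append(server.list_tools())
    answer, _ = server.call("get_marathon_times", {})
    _, changed = server.call("find_tools", {"category": "training"})
    if changed:
        trace.append(server.list_tools())
    plan, _ = server.call("create_training_plan", {"target": "4:28:00"})
    return trace, answer, plan
```

Calling run_demo() yields three tool lists in the trace: find_tools alone, then find_tools with get_marathon_times, then find_tools with create_training_plan. A call to get_marathon_times after the switch raises LookupError, which mirrors the results tools disappearing from context.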
What This Approach Gets Right
The benefits were specific. Only tools relevant to the current problem space appear in context, which helps reduce noise and keeps the agent focused. Teams contributing new tools to a shared server do not impose a cost on agents running unrelated tasks, so scalability improves without cluttering every workflow. Tools can also be added and removed mid-session, allowing an agent that exhausts one category to discover another without needing to restart the session.
The approach works for remote MCP servers using streamable HTTP or SSE, as well as for local stdio MCP implementations.
For centralized MCP servers shared across teams and organizational boundaries, the second benefit, isolating agents from tools they do not need, matters a lot. Without progressive discovery, every tool added to the shared catalog costs something for every agent running against it, relevant or not.
Tradeoffs Worth Knowing
Billy and Lilia were direct about where the approach has limits.
The extra round trip adds latency. Every discovery call costs time, and for tools an agent will reliably need in every session, loading them upfront probably makes more sense. The deterministic mapping from tool to category also needs governance. If tool definitions overlap or categories multiply, agents have trouble picking the right category. Progressive discovery changes where the human management problem lives; it doesn’t make it go away.
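Both tradeoffs lend themselves to simple mechanical checks. The sketch below shows one possible shape, assuming a catalog that maps categories to tool names: a lint that flags tools registered under more than one category, and a visible-tools helper that always includes a pinned set so frequently used tools skip the discovery round trip. The function names and the get_race_calendar tool are hypothetical, not something the talk described.

```python
from collections import defaultdict


def find_overlaps(catalog):
    """Return {tool_name: [categories]} for tools listed in several categories."""
    seen = defaultdict(list)
    for category, tools in catalog.items():
        for name in tools:
            seen[name].append(category)
    return {name: sorted(cats) for name, cats in seen.items() if len(cats) > 1}


def visible_tools(catalog, pinned, active_category):
    """find_tools and pinned tools are always loaded; the rest is scoped."""
    names = ["find_tools"] + list(pinned)
    for name in catalog.get(active_category, []):
        if name not in names:  # avoid listing a pinned tool twice
            names.append(name)
    return names


catalog = {
    "results": ["get_marathon_times", "get_race_calendar"],
    "training": ["create_training_plan", "get_race_calendar"],
}
print(find_overlaps(catalog))
```

A check like find_overlaps could run when a team contributes a tool to the shared server, which puts the governance decision at review time instead of leaving the agent to guess at run time.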
There’s also the compliance question. Prime Video built and controls its own internal clients, so the team could ensure full spec compliance. Not every client in the wild handles notifications/tools/list_changed correctly, which is a real constraint for anyone trying to deploy this pattern against external or third-party clients.
The approach depends on the Mcp-Session-Id header, which was under active discussion for potential removal from the spec at the time of the talk. They said the team is watching that closely.
The team’s next step is making discovery more dynamic. Rather than choosing from a fixed list of categories, an agent would describe the problem it’s trying to solve and receive a relevant toolset from a natural language search. For now, the deterministic approach has worked well enough to build on.
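The talk did not describe how that search would work, so the following is only an illustration of the interface: a free-text problem description goes in and a ranked list of tool names comes out. It scores tools by word overlap between the query and each tool's name and description, which is a deliberately naive stand-in; a production system would more likely use embeddings or a proper search index. The tool descriptions and the search_tools function are assumptions.

```python
import re

# Hypothetical descriptions for the two tools named in the demo.
TOOLS = {
    "get_marathon_times": "Look up finish times and results for a marathon race",
    "create_training_plan": "Build a weekly running training plan for a target finish time",
}


def tokenize(text):
    """Lowercase words only; underscores in tool names act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search_tools(query, tools, limit=3):
    """Rank tools by how many query words appear in their name or description."""
    query_words = tokenize(query)
    scored = []
    for name, description in tools.items():
        overlap = len(query_words & (tokenize(name) | tokenize(description)))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

For example, search_tools("I need a training plan for my next race", TOOLS) ranks create_training_plan first. The ranking is sensitive to filler words, which is one reason a fixed category list is the simpler thing to govern.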
Billy Hickman is a Senior Software Engineer and Lilia Abaibourova is a Principal Product Manager at Prime Video. The Agentic AI Foundation is the home of open agentic standards and open source infrastructure. To learn more about MCP and connect with engineers thinking through these problems, visit aaif.io, join the conversation in the AAIF Discord, or join us at an upcoming AAIF event.