Dynamic Agents: Addressing the Tool-Call Ceiling with MCP Code Mode
Every team building agentic systems on top of LLMs hits the same wall. It just takes a while to recognize it for what it is.
You start with a handful of MCP tools (fetchRecord, updateRecord, executeLogic) and the early demos look great. The agent chains a few calls together, returns a reasonable answer, and everyone gets excited about the possibilities. Then the questions get harder. The record sets get larger. Someone asks the agent to do something you didn’t anticipate, and the whole thing falls apart.
We ran into this on the Nextworld platform team, hitting what’s essentially a tool-call ceiling. Predefined tool calls stop scaling, not because the model isn’t smart enough, but because the architecture is working against it.
Here's how we rearchitected our MCP Server to eliminate it, and built what we’re calling Dynamic Agents in the process.
The problem with tool chaining
Nextworld’s MCP Server originally exposed a set of hard-coded tools: a fixed surface area that MCP clients and agents could call into. Fetch a record. Update a field. Execute a logic block. Each tool did one thing, and the agent’s job was to string them together to answer a question.
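To make this concrete, here’s a minimal sketch of what a fixed tool surface like that looks like. The tool names match the ones above; the data shapes and the in-memory store are illustrative stand-ins, not our actual implementation.

```typescript
// Illustrative sketch of a fixed MCP tool surface. Tool names match the
// post; the data shapes and in-memory store are stand-ins.
type Row = { id: string; [field: string]: unknown };
const db = new Map<string, Row>([
  ["cust-1", { id: "cust-1", name: "Acme", creditHold: true }],
]);

const tools = {
  // Each tool does one narrow thing; the agent must chain them in conversation.
  fetchRecord: async (id: string): Promise<string> =>
    JSON.stringify(db.get(id) ?? null), // the whole record re-enters context
  updateRecord: async (id: string, fields: { [field: string]: unknown }): Promise<string> => {
    db.set(id, { ...(db.get(id) ?? {}), ...fields, id });
    return `updated ${id}`;
  },
  executeLogic: async (block: string): Promise<string> =>
    `ran logic block "${block}"`, // logic a developer pre-built ahead of time
};

// Answering one question means several round trips like this, each returning
// JSON that the model must then carry in its context window.
console.log(await tools.fetchRecord("cust-1"));
```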

This approach has two failure modes that compound on each other.
Context window saturation. Every tool call returns data that lands in the LLM’s context window. A simple query (get metadata, build a filter, fetch records, drill into details) might take four or five calls, each pushing intermediate results into context. On longer conversations, the window fills up fast. The model starts losing its train of thought. Responses degrade. In business-critical workflows, “degrade” is a generous word for what happens. The agent begins hallucinating data that looks plausible but is subtly wrong.
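To see the mechanics, here’s a toy simulation. The payload sizes are invented, but the mechanism is real: every intermediate result is serialized back into the conversation, whether or not the final answer needs it.

```typescript
// Toy simulation of context saturation; payload sizes are invented.
const contextWindow: string[] = [];
function recordToolResult(tool: string, payloadBytes: number): void {
  // In a real agent loop, this would be the tool's actual JSON result.
  contextWindow.push(`${tool}: ${"x".repeat(payloadBytes)}`);
}

recordToolResult("getMetadata", 2_000);   // schema for the target table
recordToolResult("buildFilter", 500);     // the filter expression
recordToolResult("fetchRecords", 40_000); // one page of matching records
recordToolResult("fetchRecords", 40_000); // ...and the next page
recordToolResult("getDetails", 8_000);    // drill-down on a few rows

const consumed = contextWindow.join("\n").length;
console.log(`~${consumed} characters of intermediate data now sit in context`);
```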
This is a more fundamental limitation than most people appreciate. Paste an essay into an LLM and ask it to repeat the text back verbatim. It can't. It will paraphrase, drop a clause, change a number. That kind of drift is harmless when you're brainstorming, but it's catastrophic when the output is a financial report or an inventory count. An LLM cannot be your data pipeline. It can only be the operator turning the valves and choosing where the data goes.
An unsustainable dependency on predefined logic. When a user asks a question you didn’t anticipate (and they always do), the agent simply can’t answer it. The only path forward is for an application developer to go write custom logic for that specific use case. We found ourselves in a cycle where developers and product owners were trying to anticipate every query an end user might attempt and pre-build logic to handle it. That’s not scalable. It’s barely maintainable. And it means the agent’s capability is permanently bounded by what someone thought to build ahead of time.
These two problems feed each other. More predefined logic means more data flowing through tool calls, which means more context saturation, which means the agent needs even more hand-holding to stay on track. It’s a losing game.
The insight
The turning point was recognizing that we were solving the wrong problem. We kept asking, “What tools do we need to add?” The better question was, “What if the agent could build its own tools at runtime?” That shift pointed us toward Dynamic Agents: ones that compose their own capabilities on the fly rather than working from a fixed toolbox.
We didn’t need a bigger library of predefined operations. We needed to drop down a level of abstraction and give the agent access to the building blocks of the platform itself: the same SDKs that our own agentic logic block framework relies on under the hood.
That’s Code Mode.
What Code Mode actually is
Code Mode is a capability of Nextworld’s MCP Server that gives the LLM the ability to write and execute JavaScript at runtime, inside an isolated sandbox on our servers. Instead of chaining together sequential tool calls and reasoning over the results in context, the agent writes a script upfront, executes it server-side, and returns a precise answer.
Where our MCP Server used to expose a growing catalog of purpose-built tools, with Code Mode the entire surface area collapses to three:
- List Libraries returns a lightweight catalog of all available agentic libraries, with just enough information for the LLM to determine which ones are relevant to its task.
- Describe Libraries returns full documentation for a requested set of libraries, giving the LLM everything it needs to understand how to use them: inputs, outputs, and error handling.
- Execute Code runs the LLM-generated JavaScript in a secure, isolated sandbox on the server, with support for database interactions and structured output capture.

That’s it. The agent discovers what’s available, learns how to use it, and writes code to accomplish the task. No pre-built tool for every operation. No hand-crafted sequencing logic. Three tools and a runtime.
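Here’s a hedged sketch of what that collapsed surface can look like. The three tool names come from the list above; the request and response shapes are assumptions for illustration, not the actual API.

```typescript
// Sketch of the three-tool surface. Names are from the post; shapes are
// illustrative assumptions.
interface LibrarySummary { name: string; summary: string }
interface LibraryDoc extends LibrarySummary { inputs: string; outputs: string; errors: string }

const catalog: LibrarySummary[] = [
  { name: "data", summary: "Query and update platform records" },
  { name: "logic", summary: "Invoke logic blocks" },
  { name: "workflow", summary: "Orchestrate workflows" },
];

const codeModeTools = {
  // 1. Lightweight discovery: just enough for the LLM to pick libraries.
  listLibraries: async (): Promise<LibrarySummary[]> => catalog,

  // 2. Full documentation, fetched only for the libraries the task needs.
  describeLibraries: async (names: string[]): Promise<LibraryDoc[]> =>
    catalog
      .filter((lib) => names.includes(lib.name))
      .map((lib) => ({ ...lib, inputs: "(docs)", outputs: "(docs)", errors: "(docs)" })),

  // 3. Run LLM-generated JavaScript server-side; only the final, structured
  //    result returns to the model, never the intermediate data.
  executeCode: async (source: string): Promise<string> =>
    `executed ${source.length} bytes of agent-written code in the sandbox`,
};
```

The two discovery tools keep even capability discovery cheap: the agent pulls full documentation only for the libraries it actually intends to use, so the context cost scales with the task, not with the platform.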
Security by inheritance
The sandbox has access to a curated set of Nextworld SDKs: data access, logic invocation, workflow orchestration, and more. These are the same backend APIs and SDKs used throughout the platform, which means Code Mode inherits the platform’s full security model automatically. All code executes as the authenticated user, with the same role-based access controls, row-level security, and audit logging that govern every other operation on the platform. There is no separate security layer to build or maintain. And because we control both the sandbox and the available SDKs, this isn't an open runtime. It's a deliberately scoped environment where the agent can compose platform primitives into whatever logic a query demands, but nothing more.
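As a shape-only sketch of what a deliberately scoped runtime looks like, here’s the idea expressed with Node’s built-in node:vm module. To be clear: node:vm alone is not a hardened security boundary, and the `data` binding and `queryAsUser` helper are hypothetical stand-ins, not our actual sandbox.

```typescript
import * as vm from "node:vm";

// Shape-only sketch of a scoped runtime. node:vm alone is NOT a hardened
// security boundary; it is used here only to illustrate the idea. The
// `data` binding and `queryAsUser` helper are hypothetical stand-ins.
interface User { id: string; roles: string[] }

function queryAsUser(user: User, table: string, filter: string): unknown[] {
  // Stand-in for a user-scoped platform API: role-based access control,
  // row-level security, and audit logging apply here as for any caller.
  return [];
}

function runAgentScript(source: string, user: User): unknown {
  const sandbox = {
    // Only curated bindings are visible to the script: no require, no fs,
    // no network. Everything executes as the authenticated user.
    data: { query: (table: string, filter = "") => queryAsUser(user, table, filter) },
    result: undefined as unknown, // structured output capture
  };
  vm.runInContext(source, vm.createContext(sandbox), { timeout: 5_000 });
  return sandbox.result;
}

// The agent's generated script assigns its answer to `result`:
const answer = runAgentScript(
  `result = data.query("Customer", "creditHold = true").length;`,
  { id: "u-42", roles: ["ap-clerk"] },
);
console.log(answer); // 0 with the stub above; a real count in a real system
```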
What changes
The shift is structural, not incremental. A few things that were hard problems before Code Mode become non-issues after it.
Data stays on the server. When the agent writes a script that fetches, filters, and aggregates records, all of that happens in the sandbox. Intermediate results never enter the context window. This means the agent can operate over large record sets (thousands of records, complex joins, multi-step aggregations) without the model losing coherence. The context window constraint that previously imposed a hard ceiling on query complexity is simply gone.
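For instance, an agent-written script for “total open orders by region” might look like the sketch below. The `data.query` binding and the table and field names are hypothetical; the point is that the raw rows live and die inside the sandbox.

```typescript
// Hypothetical agent-generated script. The `data` binding, table, and
// field names are assumptions; only `result` ever reaches the model.
declare const data: {
  query(table: string, filter?: string): Promise<Array<{ [field: string]: unknown }>>;
};

const orders = await data.query("SalesOrder", "status = 'open'"); // possibly thousands of rows
const totalsByRegion = new Map<string, number>();
for (const order of orders) {
  const region = String(order.region);
  totalsByRegion.set(region, (totalsByRegion.get(region) ?? 0) + Number(order.total));
}

// Structured output: a handful of aggregates, not the record set itself.
const result = Object.fromEntries(totalsByRegion);
```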
Hallucination risk drops for data operations. When sums, counts, groupings, and filters run as executed code rather than LLM reasoning, the model isn't estimating or inferring results. It's reporting them. The class of errors where the agent fabricates records, miscounts, or produces subtly wrong numerical answers is addressed at the architecture level, not the prompt level. The LLM is no longer acting as the physical data pipeline. It's just building it.
Arbitrary query complexity becomes possible. Users can now ask questions that would have been impractical to express through a predefined tool surface: find customers whose names contain only vowels, identify vendors on credit hold with open purchase orders older than 30 days, surface records matching a pattern that no one anticipated when the application was designed. Because the agent writes the logic at runtime, there is no ceiling on query complexity imposed by what was pre-built.
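Here’s the vowels-only example as agent-composed code, using the same hypothetical `data` binding as the sketches above:

```typescript
// Agent-composed logic for a query nobody pre-built: customers whose
// names contain only vowels. The `data` binding is hypothetical.
declare const data: {
  query(table: string): Promise<Array<{ id: string; name: string }>>;
};

const customers = await data.query("Customer");
const result = customers.filter((c) => /^[aeiou]+$/i.test(c.name.replace(/\s+/g, "")));
```

No one would ever ship a fetchCustomersWithVowelOnlyNames tool, and with Code Mode no one has to.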
Why this matters beyond our platform
The tool-call ceiling isn’t a Nextworld-specific problem. It’s a structural limitation of the tool-chaining pattern that nearly every MCP server implementation relies on today. Any team building agentic systems over a non-trivial data model will eventually hit it.
The insight behind Code Mode is that for enterprise data operations, the right interface between an LLM and a platform isn’t a fixed set of high-level tools. It’s a programmable layer with access to well-designed, well-documented building blocks. You don’t try to enumerate every possible operation ahead of time. You give the agent the primitives and let it compose.
This also means the solution improves on its own. Because the agent’s capability is a function of the underlying model’s ability to write correct code, every improvement in LLM code generation translates directly into a more capable agent, without any changes to the platform, the tools, or the prompt. That’s a fundamentally different scaling curve than hand-crafting tools and skills for every new use case.
What we shipped
Code Mode is live across the platform. Every existing Nextworld MCP Server was automatically converted to use it, with no action required from customers. End users experience no change to their interface, just an agent that handles harder questions, works over larger data sets, and produces more accurate answers.
For application developers, the change is even more significant. Building an agent no longer means anticipating every possible query and pre-building logic to handle it. It means designing a good data model, exposing the right platform building blocks, and trusting that the agent can compose them at runtime. That’s a fundamentally lighter development model, and one that only gets better as the models improve.
We're betting Dynamic Agents are where the industry is headed: agents with a governed, sandboxed runtime and access to platform primitives, rather than a fixed library of pre-built tools. We’ll be writing more about what we’re learning as we push it further.
