Portia AI's blog

Building agents with Controlled Autonomy using our new PlanBuilder interface

September 10, 2025 · 9 min read

Robbie Heywood

AI Engineer

Balancing autonomy and reliability is a key challenge faced by teams building agents (and getting it right is notoriously difficult! (↗)). At Portia, we’ve built many production-ready agents with our design partners, and today we’re excited to share our solution: Controlled Autonomy. Controlled autonomy is the ability to control the level of autonomy of an agent at each step of an agentic plan. We implement this using our newly reshaped PlanBuilder interface for building agentic systems, which we’re releasing today in our open-source SDK. We believe it’s a simple, elegant interface (without the boilerplate of many agentic frameworks) and the best way to create powerful and reliable agentic systems. We can’t wait to see what you build with it!

Getting Involved

If you’re building agents, we’d love to hear from you! Check out our open-source SDK and let us know what you’re building on Discord. We also love to see people getting involved with contributions in the repo - if you’d like to get started with this, check out our open issues and let us know if you’d like to take one on.

Straight into an example

Our PlanBuilder interface is designed to feel intuitive and we find agents built with it are easy to follow, so let’s dive straight into an example:

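from portia import Input, PlanBuilderV2, StepOutput

# RefundDecision (a pydantic model) and record_refund_decision (a plain Python
# function) are assumed to be defined elsewhere, as in the full example.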

plan = (
    PlanBuilderV2("Run this plan to process a refund request.")
    .input(name="refund_info", description="Info of the customer refund request")
    .invoke_tool_step(
        step_name="read_refund_policy",
        tool="file_reader_tool",
        args={"filename": "./refund_policy.txt"},
    )
    .single_tool_agent_step(
        step_name="read_refund_request",
        # Refers to the "refund_info" input declared with .input() above
        task=f"Find the refund request email described by {Input('refund_info')}",
        tool="portia:google:gmail:search_email",
    )
    .llm_step(
        step_name="llm_refund_review",
        task="Review the refund request against the refund policy. "
             "Decide if the refund should be approved or rejected. "
             "Return the decision in the format: 'APPROVED' or 'REJECTED'.",
        inputs=[StepOutput("read_refund_policy"), StepOutput("read_refund_request")],
        output_schema=RefundDecision,
    )
    .function_step(
        function=record_refund_decision,
        args={"refund_decision": StepOutput("llm_refund_review")})
    .react_agent_step(
        task="Find the payment that the customer would like refunded.",
        tools=["portia:mcp:mcp.stripe.com:list_customers", "portia:mcp:mcp.stripe.com:list_payment_intents"],
        inputs=[StepOutput("read_refund_request")],
    )
    # Full example includes more steps to actually process the refund etc.
    .build()
)

The above is a modified extract from our Stripe refund agent (full example here (↗)), setting up an agent that acts as follows:

Read in our company’s refund policy: this uses a simple invoke_tool_step, which means that the tool is directly invoked with the args specified with no LLM involvement. These steps are great when you need to use a tool (often to retrieve data) but don’t need the flexibility of an LLM to call the tool because the args you want to use are fixed (this generally makes them very fast too!).
Read in the refund request from an email: for this step, we want to flexibly find the email in the inbox based on the refund info that is passed into the agent. To do this, we use a single_tool_agent_step, which is an LLM that calls a single tool once in order to achieve its task. In this case, the agent creates the inbox search query based on the refund info passed in to find the refund email.
Judge the refund request against the refund policy: the llm_step is relatively self-explanatory here - it uses your configured LLM to judge whether we should provide the refund based on the request and the policy. We use the StepOutput object to feed in the results from the previous steps, and the output_schema field allows us to return the decision as a pydantic (↗) object rather than as text.
Record the refund decision: we have a python function we use to record the decisions made - we can call this easily with a function_step which allows directly calling python functions as part of the plan run.
Find the payment in Stripe: finding a payment in Stripe requires using several tools from Stripe’s remote MCP server (which is easily enabled in your Portia account (↗)). Therefore, we set up a ReAct (↗) agent with the required tools and it can intelligently chain the required Stripe tools together in order to find the payment. As a bonus, Portia uses MCP Auth by default so these tool calls will be fully authenticated.
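
The extract refers to record_refund_decision without showing it. As a rough illustration of the kind of plain Python function a function_step can call, here is a minimal sketch. The body, the log file name and the assumption that the decision object exposes a decision attribute are all hypothetical and not taken from the full example:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical location for the decision log; not part of the Portia example.
DECISION_LOG = Path("refund_decisions.jsonl")


def record_refund_decision(refund_decision, log_path: Path = DECISION_LOG) -> dict:
    """Append one refund decision to a JSON Lines file and return the entry.

    Assumes `refund_decision` exposes a `decision` attribute holding
    "APPROVED" or "REJECTED", as the llm_step's task text describes.
    """
    entry = {
        "decision": refund_decision.decision,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append rather than overwrite so earlier decisions are preserved.
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry
```

Because the function takes a single required argument named refund_decision, it lines up with the args dictionary used in the function_step above, and it can be unit-tested on its own without running a plan.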

Controlled Autonomy

As demonstrated in the above example, the power of PlanBuilderV2 comes from the fact you can easily connect and combine different step types, depending on your situation and requirements. This allows you to control the amount of autonomy your system has at each point in its execution, with some steps (e.g. react_agent_step) making use of language models with high autonomy while others are carefully controlled and constrained (e.g. invoke_tool_step).

Figure: PlanBuilderV2 steps by autonomy

From our experience, it is this ‘controlled autonomy’ that is the key to getting agents to execute reliably, which allows us to move from exciting prototypes into real, production agents. Often, prototypes are built with ‘full autonomy’, giving something like a ReAct agent access to all tools and letting it loose on a task. This approach is possible with our plan builder and can work well in some situations, but in other situations (particularly for more complex tasks) it can lead to agents that are unreliable. We’ve found that tasks often need to be broken down and structured into manageable sub-tasks, with the autonomy for each sub-task controlled, for them to be done reliably. For example, we often see research and retrieval steps in a system being done with high autonomy ReAct agent steps because they generally use read-only tools that don’t affect other systems. Then, when it comes to the agent taking actions, these steps are done with zero or low autonomy so they can be done in a more controlled manner.

Simple Control structures

Extending the above example, our PlanBuilderV2 also provides familiar control structures that you can use when breaking down tasks for your agentic system. This gives you full control to ensure that the task is approached in a reliable way:

# Conditional steps (if, else if, else)
.if_(condition=lambda review: review.decision == REJECTED,
    # The args key matches the lambda's parameter name
    args={"review": StepOutput("llm_refund_review")})
.function_step(
    function=handle_rejected_refund,
    args={"proposed_refund": StepOutput("proposed_refund")})
.endif()

# Loops - here we use .loop(over=...), but there are also alternatives for
#         .loop(while_=...) and .loop(do_while_=...)
#         (trailing underscores, since `while` is a reserved keyword in Python)
.loop(over=StepOutput("Items"), step_name="Loop")
.function_step(
    function=lambda item: print(item),
    args={"item": StepOutput("Loop")})
.end_loop()
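
When a condition grows beyond a one-liner, it can be easier to write it as a named function and test it in isolation. The sketch below assumes that the builder passes the entries of args to the condition as keyword arguments (an assumption about the SDK, modelled here by the hypothetical call_condition helper) and that the decision is the string "REJECTED":

```python
from types import SimpleNamespace


def is_rejected(review) -> bool:
    """Condition for .if_(): True when the reviewed decision is a rejection."""
    # Assumes the decision is the string "REJECTED", per the llm_step task text.
    return review.decision == "REJECTED"


def call_condition(condition, args: dict) -> bool:
    """Hypothetical stand-in for the builder: pass args as keyword arguments."""
    return bool(condition(**args))


review = SimpleNamespace(decision="REJECTED")  # stand-in for the llm_step output

# The key matches the parameter name, so the call succeeds.
print(call_condition(is_rejected, {"review": review}))

# The key does not match, so Python raises TypeError before the condition runs.
try:
    call_condition(is_rejected, {"llm_review_decision": review})
except TypeError as error:
    print(f"Mismatched key: {error}")
```

Under that assumption, the keys in args need to match the condition's parameter names, which is worth checking whenever a step output is renamed.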

Fun fact

We went with .if_() rather than .if() (note the underscore) because if is a reserved keyword in Python.
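
This is easy to check with Python's standard keyword module. The helper name below is our own, introduced only for illustration:

```python
import keyword


def is_valid_method_name(name: str) -> bool:
    """Return True if `name` can be used as a method name in Python."""
    # A usable name must be a syntactically valid identifier and not reserved.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("if", "if_", "while", "endif"):
    verdict = "usable" if is_valid_method_name(candidate) else "reserved"
    print(f"{candidate}: {verdict}")
```

The same constraint applies to keyword arguments, which is why a name such as while cannot be used directly as a parameter name either.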

Human - Agent interface

Another aspect that is vital to getting an agent into production is the ability to seamlessly pass control between agents and humans. While we build trust in agentic systems, there are often key steps that require verification or input from humans. Our PlanBuilder interface allows both to be handled easily, using Portia’s clarification system (↗):

# Ensure a human approves any refunds our agent gives out
builder.user_verify(
    message=f"Are you happy to proceed with the following proposed refund: {StepOutput('proposed_refund')}?")

# Allow your end user to provide input into how the agent runs
builder.user_input(
    message="How would you like your refund?",
    options=["Return to purchase card", "gift card"],
)

Controlling your agent with code

The function_step demonstrated earlier is a key addition to PlanBuilderV2. In many agentic systems, all tool and function calls go through a language model, which can be slow and can also reduce reliability. With function_step, the function is called with the provided args at that point in the chain, with no LLM involved in the call. We’ve seen several use cases for this:

Guardrails: where deterministic, reliable code checks are used to verify agent behaviour (see example below)
Data manipulation: when you want to do a simple data transformation in order to link tools together, but you don’t want to pay the latency penalty of an extra LLM call to do the transformation, you can instead do the transformation in code.
Plug in existing functions: when you’ve already got the functionality you need in code, you can use a function_step to easily plug that into your agent.

# Add a guardrail to prevent our agent giving out large refunds
builder.function_step(
    step_name="reject_payments_above_limit",
    function=reject_payments_above_limit,
    args={"proposed_refund": StepOutput("proposed_refund"), "limit": Input("payment_limit")})
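
The guardrail function itself is not shown in this post. A minimal sketch of what it might look like follows; the ProposedRefund shape, the exception class and the choice to raise rather than return a flag are all assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class ProposedRefund:
    """Hypothetical shape of a proposed refund; the real model may differ."""

    payment_id: str
    amount: float


class RefundLimitExceededError(ValueError):
    """Raised when a proposed refund is above the configured limit."""


def reject_payments_above_limit(proposed_refund: ProposedRefund, limit: float) -> ProposedRefund:
    """Return the refund unchanged if it is within the limit, otherwise raise."""
    if proposed_refund.amount > limit:
        raise RefundLimitExceededError(
            f"Refund of {proposed_refund.amount} for {proposed_refund.payment_id} "
            f"exceeds the limit of {limit}"
        )
    return proposed_refund
```

The check is ordinary deterministic code, so it behaves the same way on every run and can be covered by regular unit tests. How the SDK surfaces an exception raised inside a function_step is not covered here.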

What’s next?

We’ve really enjoyed building agents with PlanBuilderV2 and are excited to share it more widely. We find that it complements our planning agent nicely: our planning agent can be used to dynamically create plans from natural language when that is needed for your use-case, while the plan builder can be used if you want to more carefully control the steps your agentic system takes with code.

We’ve also got more features coming up over the next few weeks that will continue to make the plan builder interface even more powerful:

Parallelism: run steps in parallel with .parallel().
Automatic caching: add cache=True to steps to automatically cache results - this is a game-changer when you want to iterate on later steps in a plan without having to fully re-run the plan.
Step error handling: specify .on_error() after a step to attach an error handler to it, use .retry() to allow retries of a step, or use .exit_step() to gracefully exit a plan.
Linked plans: link plans together by referring to outputs from previous plan runs.

plan = (
    PlanBuilderV2("Run this plan to process a refund request.")
    # 1. Run subsequent steps in parallel
    .parallel()
    .invoke_tool_step(
        tool="file_reader_tool",
        args={"filename": "./refund_policy.txt"},
        # 2. Add automatic caching to a step
        cache=True
    )
    # 3. Add error handling to a step
    .on_error()
    .react_agent_step(
        # 4. Link plans together by referring to outputs from a previous run
        # Here, we could have a previous agent that determines which customer refunds to process
        task=f"Read the refund request from my inbox from {PlanRunOutput(previous_run)}.",
        tools=["portia:google:gmail:search_email"],
    )
    # Resume series execution
    .series()
)

Open Source Shout Out

Shout out to gaurava05 for adding ExitStep as an open-source contribution in this PR.

So give our new PlanBuilder a try and let us know how you get on - we can’t wait to see what you build! 🚀

For more details on PlanBuilderV2, check out our docs (↗), our example plan (↗) or the full Stripe refund example (↗). You can also join our Discord (↗) to hear future updates.

How Portia ensures reliable agents with evals and our in-house framework

June 4, 2025 · 8 min read

Tom Stuart

Backend Engineer

At Portia we spend a lot of time thinking about what it means to make agents reliable and production worthy. Lots of people find it easy to make agents for a proof of concept but much harder to get them into production. It takes a real focus on production readiness and a suite of features to do so (lots of which are available in our SDK as we’ve talked about in previous blog posts):

User Led Learning for reliable planning
Agent Memory for large data sets
Human in the loop clarifications to let agents raise questions back to humans
Separate planning and execution phases for constrained execution

But today we want to focus on the meta question of how we know that these features help improve the reliability of agents built on top of them by talking about evals.

Design Highlight: Handling data at scale with Portia multi-agent systems

May 22, 2025 · 11 min read

Robbie Heywood

AI Engineer

At Portia, we love building in public. Our agent framework is open-source (↗) and we want to involve our community in key design decisions. Recently, we’ve been focussing on improving how agents handle production data at scale in Portia. This has sparked some exciting design discussions that we wanted to share in this blog post. If you find these discussions interesting, we’d love you to be involved in future discussions! Just get in contact (details in block below) - we can’t wait to hear from you.

Calling All Devs

We’d love to hear from you on the design decisions we’re making 💪 Check out the discussion thread (↗) for this blog post to have your say. If you want to join our wider community too (or just fancy saying hi!), head on over to our Discord (↗), our Reddit community (↗), or our repo on GitHub (↗) (give us a ⭐ while you’re there!).

Visualise your Obsidian notes with Qwen3

May 8, 2025 · 9 min read

Omar ElMohandes

Software Engineer

Mark Smith

Developer Relations

Many users with stringent security, privacy or latency requirements have told us they prefer to run their own LLM instances locally. We recently added support for interfacing with Ollama models running locally.

To explore how we might use a local LLM practically, we decided to build an app that could turn an Obsidian note into a concept map – a visual diagram that shows how different ideas in the note are related. As an early-stage startup, we've actually been building our internal apps on top of local LLMs to keep our costs low: we use the Obsidian app in this post to visualise notes coming out of our weekly engineering design meetings!

A unified framework for browser and API authentication

May 1, 2025 · 5 min read

Emma Burrows

Co-founder and CTO

The core of the Portia authorization framework is the ability for an agent to pause itself to solicit a user's authorization for an action it wants to perform. With delegated OAuth, we do this by creating an OAuth link that the user clicks on to grant Portia a token that can be used for the API requests made by the agent. We generally like API based agents for reliability reasons – they're fast, predictable and the rise of MCP means integration is getting easier.

However, some actions are not easily accessible by API (surprisingly, my supermarket doesn't have a delegated OAuth flow!), so there is huge power in being able to switch seamlessly between browser-based and API-based tasks. The question was how to do this consistently and securely with our authorization framework.

A deep dive into our “User Led Learning” feature

April 17, 2025 · 11 min read

Mounir Mouawad

Co-founder and CEO

Mark Smith

Developer Relations

At Portia, we believe building agents for production means balancing AI autonomy with human control – something we call the ‘spectrum of autonomy’. We have previously seen how clarifications can be used during plan runs to handle the human:agent interface. With our new User Led Learning feature, we’re bringing this level of feedback into the planning process as well. Developers now have a powerful way to shape the Planning agent’s behavior—without rewriting prompts or tweaking models. When you generate a plan using the Portia AI SDK, that plan can be stored in the Portia cloud where it can be highlighted as a preferred plan with a simple thumbs-up. Each “like” tells the Portia planning agent, this was a good plan for this type of user intent—and over time, those signals help planning agents make better decisions on their own. It’s a subtle but powerful shift along the spectrum of autonomy: agents become more capable and self-directed, while still staying grounded in what users actually want.

More features for your production agent … and a fundraising announcement

April 16, 2025 · 6 min read

Emma Burrows

Co-founder and CTO

Mounir Mouawad

Co-founder and CEO

We came out of stealth a few weeks ago. Since then we’ve been working with our first few design partners on developing their production agents and have been heads down building out our SDK to solve their problems. To equip us with enough runway to grow, we’ve also been lucky enough to raise £4.4 million from some of the best investors we could ever hope for: General Catalyst (lead), First Minute Capital, Stem AI and some outstanding angel investors 🚀

In this post we want to give you a sense of what’s coming over the next couple of months.

Agent-Agent interfaces and Google's new A2A protocol

April 14, 2025 · 9 min read

Robbie Heywood

AI Engineer

Sam Stephens

Backend Engineer

Mounir Mouawad

Co-founder and CEO

This week, Google announced (↗) their new Agent-to-Agent protocol, A2A, designed to standardise how AI agents collaborate, even when run by different organisations using different underlying models. Positioned as complementary to MCP – which standardises agent access to external tools – A2A aims to standardise direct agent-agent communication. Google even declared A2A ♥️ MCP (↗), highlighting their vision for synergy between these protocols.

At Portia, we’ve been thinking about how agents interact with external systems via tools and agents for some time. You may have even read our post two weeks ago, Software interfaces in the agent era (↗). We divided the topic of agent integration with external systems into five categories based on increasing complexity, and A2A sits firmly at the top, in the Agent-Agent interface level.

Beyond APIs: Software interfaces in the agent era

March 27, 2025 · 12 min read

Tom Stuart

Backend Engineer

Robbie Heywood

AI Engineer

Mounir Mouawad

Co-founder and CEO

For decades, APIs have been the standard for connecting software systems. Whether REST, gRPC, or GraphQL, APIs follow the same principle: well-structured interfaces that are defined ahead of time to expose data and functionality to third parties. But as AI Agents start taking on more autonomous operations this rigid model is limiting what they can do.

APIs work well when requirements are known in advance, but agents often lack full context at the start. They explore, iterate and adapt based on their goals and real-time learning. Relying solely on predefined API calls can restrict an agent’s ability to interact dynamically with software.

Like many in our industry, we have been dealing a lot with the challenges of agent to software interfaces. We think the future of these interfaces will move beyond static APIs toward more flexible, expressive, and adaptive mechanisms. More on our thinking below, we’d love to hear your thoughts!

Build a refund agent with Portia AI and Stripe's MCP server

March 20, 2025 · 9 min read

Sam Stephens

Backend Engineer

Mounir Mouawad

Co-founder and CEO

Anthropic open sourced its Model Context Protocol (↗), or MCP for short, at the end of last year. The protocol is picking up steam as the go-to way to standardise the interface between agent frameworks and apps / data sources, with the list of official MCP server implementations (↗) growing rapidly. Our early users have already asked for an easy way to expose tools from an MCP server to a Portia client so we just released support for MCP servers in our SDK ⭐️.

In this blog post we show how you can combine the power of Portia AI’s abstractions with any tool set from an MCP server to create unique agent workflows. The example we go over is accessible in our agent examples repository here (↗).
