How Portia ensures reliable agents with evals and our in-house framework

· 8 min read
Tom Stuart
Backend Engineer

At Portia we spend a lot of time thinking about what it means to make agents reliable and production-worthy. Many people find it easy to build an agent for a proof of concept but much harder to get it into production. Getting there takes a real focus on production readiness and a suite of features, many of which are available in our SDK and covered in previous blog posts:

  • User Led Learning for reliable planning
  • Agent Memory for large data sets
  • Human-in-the-loop clarifications to let agents raise questions back to humans
  • Separate planning and execution phases for constrained execution

Today, though, we want to focus on the meta question: how do we know that these features actually improve the reliability of agents built on top of them? To answer that, we need to talk about evals.

Design Highlight: Handling data at scale with Portia multi-agent systems

· 11 min read
Robbie Heywood
AI Engineer

At Portia, we love building in public. Our agent framework is open-source (↗) and we want to involve our community in key design decisions. Recently, we’ve been focussing on improving how agents handle production data at scale in Portia. This has sparked some exciting design discussions that we wanted to share in this blog post. If you find them interesting, we’d love you to be involved in future ones! Just get in contact (details in the block below). We can’t wait to hear from you.

Calling All Devs

We’d love to hear from you on the design decisions we’re making 💪 Check out the discussion thread (↗) for this blog post to have your say. If you want to join our wider community too (or just fancy saying hi!), head on over to our Discord (↗), our Reddit community (↗), or our repo on GitHub (↗) (give us a ⭐ while you’re there!).

Visualise your Obsidian notes with Qwen3

· 9 min read
Omar ElMohandes
Software Engineer
Mark Smith
Developer Relations

Many users with stringent security, privacy or latency requirements have told us they prefer to run their own LLM instances locally. We recently added support for interfacing with Ollama models running locally.

To explore how we might use a local LLM practically, we decided to build an app that could turn an Obsidian note into a concept map – a visual diagram that shows how different ideas in the note are related. As an early-stage startup, we've been building our internal apps on top of local LLMs to keep our costs low: we use the Obsidian app from this post to visualise the notes coming out of our weekly engineering design meetings!

A unified framework for browser and API authentication

· 5 min read
Emma Burrows
Co-founder and CTO

The core of the Portia authorization framework is the ability for an agent to pause itself to solicit a user's authorization for an action it wants to perform. With delegated OAuth, we do this by creating an OAuth link that the user clicks on to grant Portia a token that can be used for the API requests made by the agent. We generally like API-based agents for reliability reasons – they're fast and predictable, and the rise of MCP means integration is getting easier.
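As a rough illustration of the pause-and-resume pattern described above, here is a minimal sketch. It is not the Portia SDK's API: every name in it (`AuthorizationRequired`, `Run`, `call_tool`, `execute`, and the `auth.example.com` link) is hypothetical, and the "tools" are stand-ins that simply check whether a token is present.

```python
from dataclasses import dataclass, field
from typing import Optional


class AuthorizationRequired(Exception):
    """Raised by a (hypothetical) tool when it has no token for the user."""

    def __init__(self, tool_name: str, auth_url: str) -> None:
        super().__init__(f"{tool_name} needs authorization")
        self.tool_name = tool_name
        self.auth_url = auth_url


@dataclass
class Run:
    """State of one agent run; kept so a paused run can be resumed later."""

    steps: list  # tool names to call, in order
    tokens: dict = field(default_factory=dict)  # tool name -> OAuth token
    next_step: int = 0
    state: str = "IN_PROGRESS"
    pending_auth_url: Optional[str] = None
    outputs: list = field(default_factory=list)


def call_tool(tool_name: str, tokens: dict) -> str:
    # Stand-in for a real API call: it only succeeds if a token exists.
    if tool_name not in tokens:
        # Assumption: the link format is invented for this example.
        url = f"https://auth.example.com/oauth?tool={tool_name}"
        raise AuthorizationRequired(tool_name, url)
    return f"{tool_name}: ok"


def execute(run: Run) -> Run:
    """Run steps until done, or pause at the first step that needs a token."""
    run.state = "IN_PROGRESS"
    run.pending_auth_url = None
    while run.next_step < len(run.steps):
        tool_name = run.steps[run.next_step]
        try:
            run.outputs.append(call_tool(tool_name, run.tokens))
        except AuthorizationRequired as exc:
            # Pause: record the link for the user and stop without advancing.
            run.state = "NEEDS_AUTHORIZATION"
            run.pending_auth_url = exc.auth_url
            return run
        run.next_step += 1
    run.state = "COMPLETE"
    return run


if __name__ == "__main__":
    run = execute(Run(steps=["calendar", "email"], tokens={"calendar": "tok-1"}))
    print(run.state, run.pending_auth_url)  # paused before the email step
    run.tokens["email"] = "tok-2"  # the user clicked the link and granted a token
    run = execute(run)  # resumes from the paused step
    print(run.state, run.outputs)
```

Keeping `next_step` on the run object is what lets the second `execute` call pick up at the paused step rather than repeating the work that already succeeded.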

However, some actions are not easily accessible by API (surprisingly, my supermarket doesn't have a delegated OAuth flow!), so there is huge power in being able to switch seamlessly between browser-based and API-based tasks. The question was how to do this consistently and securely within our authorization framework.

A deep dive into our “User Led Learning” feature

· 11 min read
Mounir Mouawad
Co-founder and CEO
Mark Smith
Developer Relations

At Portia, we believe building agents for production means balancing AI autonomy with human control – something we call the ‘spectrum of autonomy’. We have previously seen how clarifications can be used during plan runs to handle the human-agent interface. With our new User Led Learning feature, we’re bringing this level of feedback into the planning process as well. Developers now have a powerful way to shape the planning agent’s behavior without rewriting prompts or tweaking models. When you generate a plan using the Portia AI SDK, that plan can be stored in the Portia cloud, where it can be marked as a preferred plan with a simple thumbs-up. Each “like” tells the Portia planning agent, “this was a good plan for this type of user intent”, and over time those signals help planning agents make better decisions on their own. It’s a subtle but powerful shift along the spectrum of autonomy: agents become more capable and self-directed, while still staying grounded in what users actually want.

More features for your production agent … and a fundraising announcement

· 6 min read
Emma Burrows
Co-founder and CTO
Mounir Mouawad
Co-founder and CEO

We came out of stealth a few weeks ago. Since then we’ve been working with our first few design partners on developing their production agents and have been heads down building out our SDK to solve their problems. To equip us with enough runway to grow, we’ve also been lucky enough to raise £4.4 million from some of the best investors we could ever hope for: General Catalyst (lead), First Minute Capital, Stem AI and some outstanding angel investors 🚀

In this post we want to give you a sense of what’s coming over the next couple of months.

Agent-Agent interfaces and Google's new A2A protocol

· 9 min read
Robbie Heywood
AI Engineer
Sam Stephens
Backend Engineer
Mounir Mouawad
Co-founder and CEO

This week, Google announced (↗) their new Agent-to-Agent protocol, A2A, designed to standardise how AI agents collaborate, even when run by different organisations using different underlying models. Positioned as complementary to MCP – which standardises agent access to external tools – A2A aims to standardise direct agent-agent communication. Google even declared A2A ♥️ MCP (↗), highlighting their vision for synergy between these protocols.

At Portia, we’ve been thinking about how agents interact with external systems via tools and agents for some time. You may even have read our post from two weeks ago, Software interfaces in the agent era (↗). We divided the topic of agent integration with external systems into five categories of increasing complexity, and A2A sits firmly at the top, at the Agent-Agent interface level.

Beyond APIs: Software interfaces in the agent era

· 12 min read
Tom Stuart
Backend Engineer
Robbie Heywood
AI Engineer
Mounir Mouawad
Co-founder and CEO

For decades, APIs have been the standard for connecting software systems. Whether REST, gRPC, or GraphQL, APIs follow the same principle: well-structured interfaces that are defined ahead of time to expose data and functionality to third parties. But as AI agents start taking on more autonomous operations, this rigid model is limiting what they can do.

APIs work well when requirements are known in advance, but agents often lack full context at the start. They explore, iterate and adapt based on their goals and real-time learning. Relying solely on predefined API calls can restrict an agent’s ability to interact dynamically with software.

Like many in our industry, we have been dealing a lot with the challenges of agent-to-software interfaces. We think the future of these interfaces will move beyond static APIs toward more flexible, expressive, and adaptive mechanisms. More on our thinking below. We’d love to hear your thoughts!

Build a refund agent with Portia AI and Stripe's MCP server

· 9 min read
Sam Stephens
Backend Engineer
Mounir Mouawad
Co-founder and CEO

Anthropic open sourced its Model Context Protocol (↗), or MCP for short, at the end of last year. The protocol is picking up steam as the go-to way to standardise the interface between agent frameworks and apps / data sources, with the list of official MCP server implementations (↗) growing rapidly. Our early users have already asked for an easy way to expose tools from an MCP server to a Portia client so we just released support for MCP servers in our SDK ⭐️.

In this blog post we show how you can combine the power of Portia AI’s abstractions with any tool set from an MCP server to create unique agent workflows. The example we go over is accessible in our agent examples repository here (↗).

Seamless human agent interactions with just-in-time authorization

· 6 min read
Emma Burrows
Co-founder and CTO
Mounir Mouawad
Co-founder and CEO

In part 1 of this series, we established why there is a need for a Just-In-Time (JIT) authorization system, whereby an agent requests authorization only at the point where it is very likely that 1/ it will need that authorization and 2/ it is clear what the authorization will be used for. In this section, we’ll look at how we have done this at Portia AI.

Update - June 2025

Since publishing this post, we've made further improvements to just-in-time authorization within the Portia SDK - see more details at the end of this post.