How Portia ensures reliable agents with evals and our in-house framework
At Portia we spend a lot of time thinking about what it takes to make agents reliable and production-worthy. Many people find it easy to build an agent for a proof of concept, but much harder to get it into production. Doing so takes a real focus on production readiness and a suite of supporting features, many of which are available in our SDK and which we've covered in previous blog posts:
- User Led Learning for reliable planning
- Agent Memory for large data sets
- Human-in-the-loop clarifications that let agents raise questions back to humans
- Separate planning and execution phases for constrained execution
But today we want to focus on the meta question: how do we know these features actually improve the reliability of agents built on top of them? The answer is evals.