
Seamless human agent interactions with just-in-time authorization

· 5 min read
Emma Burrows
Co-founder and CTO
Mounir Mouawad
Co-founder and CEO

In part 1 of this series, we established why there is a need for a Just-In-Time (JIT) authorization system, whereby an agent requests authorization only at the point where it is very likely that it will (1) need that authorization and (2) know what it will use it for. In this part, we’ll look at how we have done this at Portia AI.

A tenet of agentic systems is that they are designed to operate autonomously, but JIT auth requires interrupting the agentic system so that it can solicit human input.

In reality, we think it’s becoming increasingly obvious that seamless hand-off back and forth between agents and humans collaborating on a task needs to be a well-supported expectation, and yet most agentic frameworks make this hard work. In Portia, we refer to these agent-to-human requests as ‘clarifications’.

If you’ve written an agent before, you’ve probably experienced an agent death loop – where the agent gets itself stuck and continually retries until you cancel the operation (or it hits its maximum retries).

[Figure: Traditional agent architecture with reflection]

For Just-In-Time auth, there are a few problems we want to solve. Firstly, many agentic systems perceive a task as incomplete if they encounter a requirement for the end user to complete authentication. The agent then retries the task: instead of getting the user to authenticate, it sends itself into a death spiral. Sigh. We’ll refer to this as the ‘human-agent short circuit’ problem.

The second issue arises if your agentic system supports authorization within the flow of an agent, because the end user has to perform the actual authentication themselves, most typically by clicking a link. This kicks off a somewhat complicated handshake to retrieve the authorization token, after which the agent needs to be notified so that it can resume its task from where it left off. We’ll refer to this as the ‘human-agent hand-off’ problem.

The third problem is almost trivial in the grand scheme of things. OAuth links are kinda long and ugly, but most agentic frameworks expect to hand things back to users in natural language. This means that a user would be presented with something rather incomprehensible like:

Click the link to authenticate: https://accounts.google.com/o/oauth2/v2/auth?redirect_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&client_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&state=APP_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END_USER_ID%3Dportia%253A%253A2%26ORG_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&access_type=offline&response_type=code&prompt=consent

It’s ugly and if the end user makes a mistake in copying that link, it won’t work! We’ll refer to this as the ‘human-agent presentation’ problem.

Making human-agent interaction a first class citizen for agentic AI

These were three of the initial problems that we wanted to tackle with Portia AI. Most agentic systems assume that human-in-the-loop actions should come after the agent has made its best attempt at completing its task (shown above). The first problem is solvable as long as handling it is a fundamental part of the agentic system, such that a task’s output is inspected before the agent reflects on it. Then, if an agent-to-human clarification is raised, it can be returned immediately to the end user rather than the agent trapping itself in an endless death loop of retries. To handle this in Portia, it’s fundamental that any tool call can return either a clarification or the output from the tool, and if a clarification is returned, it is handed back to the developer to present to the end user.

[Figure: Agent architecture with short circuit]

We use this as a critical part of our auth system, but it’s useful more broadly as it creates an extremely flexible system that developers can use to hand off seamlessly between human control and agent control. For example, if a tool returns too many results, and the user needs to select the right one to proceed, developers can return a multiple choice clarification, or in the future, even trigger this behaviour automatically.
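The short-circuit behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Portia's actual implementation: `Clarification`, `ToolOutput`, `run_step` and `looks_complete` are hypothetical names invented for this example. The point it shows is the ordering: the result is checked for a clarification before any reflection or retry logic runs.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Clarification:
    """Hypothetical agent-to-human request; not Portia's actual class."""
    guidance: str


@dataclass
class ToolOutput:
    """Hypothetical wrapper for an ordinary tool result."""
    value: str


ToolResult = Union[Clarification, ToolOutput]


def run_step(
    tool: Callable[[], ToolResult],
    looks_complete: Callable[[ToolOutput], bool],
    max_retries: int = 3,
) -> ToolResult:
    """Run a tool call, checking for a clarification before reflecting."""
    for _ in range(max_retries):
        result = tool()
        # Short circuit: hand a clarification straight back to the caller
        # instead of treating it as an incomplete task and retrying.
        if isinstance(result, Clarification):
            return result
        # Only ordinary outputs go through reflection and possible retry.
        if looks_complete(result):
            return result
    raise RuntimeError(f"step did not complete after {max_retries} attempts")


calls = []


def unauthenticated_gmail_tool() -> ToolResult:
    """Stand-in tool that always needs the end user to authenticate."""
    calls.append("called")
    return Clarification(guidance="Please authenticate with Google to continue.")


outcome = run_step(unauthenticated_gmail_tool, looks_complete=lambda out: bool(out.value))
print(type(outcome).__name__, len(calls))
```

In this sketch the stand-in tool is called exactly once, because the clarification is returned before the retry loop gets a chance to run again.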

Scaling to 1000s of users

The second issue, the ‘human-agent hand-off’ problem, requires a set of events to be synchronized back and forth between human and agent (e.g. “auth needs to be completed”, “auth has completed and the agent can resume”, etc.). It also requires the in-flight agent state to be saved so that it can be resumed after the end user has completed the authentication. This is relatively easy to do if you assume that you have only one end user or that they will authenticate immediately. But we wanted to create a production-ready system that could scale to 1,000s of end users, and we wanted end users to be able to respond to their agents in their own time, so they can get on with their day-to-day lives. So the Portia framework handles this for developers, and we support end users as a primitive in our framework, so that tasks, tool calls and authentication sessions can be attributed to individuals across your organisation or production use case.
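To make the hand-off concrete, here is a small Python sketch of pausing and resuming runs per end user. It is an illustration only and makes several assumptions: `PausedRun`, `RunStore` and their fields are hypothetical names, the store is an in-memory dictionary standing in for durable storage, and nothing here reflects how Portia actually persists state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PausedRun:
    """Hypothetical snapshot of an in-flight agent run awaiting auth."""
    run_id: str
    end_user_id: str
    next_step: int
    step_outputs: List[str] = field(default_factory=list)


class RunStore:
    """In-memory stand-in for durable storage, keyed by run id."""

    def __init__(self) -> None:
        self._paused: Dict[str, PausedRun] = {}

    def pause(self, run: PausedRun) -> None:
        # Event: "auth needs to be completed". Save the state and stop.
        self._paused[run.run_id] = run

    def resume(self, run_id: str, end_user_id: str) -> Optional[PausedRun]:
        # Event: "auth has completed and the agent can resume".
        # Only the end user who owns the run may resume it.
        run = self._paused.get(run_id)
        if run is None or run.end_user_id != end_user_id:
            return None
        return self._paused.pop(run_id)

    def pending_for(self, end_user_id: str) -> List[str]:
        # Runs still waiting on this particular end user.
        return sorted(
            run.run_id
            for run in self._paused.values()
            if run.end_user_id == end_user_id
        )


store = RunStore()
store.pause(PausedRun("run-1", "alice", next_step=2, step_outputs=["found 3 emails"]))
store.pause(PausedRun("run-2", "bob", next_step=1))

# Alice authenticates later, in her own time; Bob's run keeps waiting.
resumed = store.resume("run-1", "alice")
print(resumed.next_step if resumed else None, store.pending_for("bob"))
```

Attributing each paused run to an end user is what lets many users authenticate independently and at different times without their agent states getting mixed up.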

Making it look good

The third issue, the ‘human-agent presentation’ problem, is fairly easy to layer on top of the previous concepts. Clarifications in Portia are structured, which means they can easily be rendered for the end user in whatever UI format suits the application. Rather than an ugly link, the developer can render a button that hides the complexity from the end user. You can even configure the guidance you want to attach to your clarification:

Click the link to authenticate: https://accounts.google.com/o/oauth2/v2/auth?redirect_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&client_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&state=APP_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END_USER_ID%3Dportia%253A%253A2%26ORG_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&access_type=offline&response_type=code&prompt=consent

becomes (with minimal developer effort!):

[Figure: Structured button]
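As a rough sketch of why structure helps, the Python below renders a clarification as an HTML button instead of pasting the raw link into natural-language text. The `ActionClarification` class, its field names and the `render_button` helper are assumptions made for illustration, and the URL is a placeholder. This is not Portia's SDK.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ActionClarification:
    """Hypothetical structured clarification; field names are illustrative."""
    guidance: str
    action_url: str


def render_button(clarification: ActionClarification, label: str = "Authenticate") -> str:
    """Render the clarification as guidance text plus a link styled as a button."""
    return (
        f"<p>{escape(clarification.guidance)}</p>"
        # The long URL lives only in the href, so the user never has to copy it.
        f'<a class="button" href="{escape(clarification.action_url)}">'
        f"{escape(label)}</a>"
    )


clarification = ActionClarification(
    guidance="Connect your Google account so the agent can continue.",
    action_url="https://example.com/oauth?client_id=abc&state=xyz",  # placeholder URL
)
markup = render_button(clarification)
print(markup)
```

Because the guidance and the link are separate fields, the same clarification could just as easily be rendered as a chat message or a notification.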

When we were designing Portia, we started with authentication, but quickly realized that what made it hard was more general than authentication and had much more to do with the fundamentals of human-agent interaction. We look forward to hearing your thoughts and feedback on the product and our open-source SDK.