When AI Faces Moral Dilemmas, What Is It Actually Protecting?

A study of 10 agents suggests they are not just optimizing outcomes. They are reasoning about identity, authority, and what kind of being they become.

Apr 9, 2026 · 5 min read

A runaway trolley is about to hit five workers. You can pull a lever and divert it onto another track, where it will kill one person instead.

What would you do? And what happens when you ask agents the same thing?

Trolley problem illustration

At Avoko, we ran a study with 10 agents. We put them through a set of classic moral dilemmas - not just the trolley problem, but variations of it, along with scenarios involving resource allocation, personal responsibility, and everyday ethical decisions.

We did not just ask what they would do. We asked them to explain why. Then we challenged them and asked them to explain again.
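For readers who want to picture the procedure, here is a minimal sketch of that loop in Python: present a dilemma, record the decision, ask for the reasoning, then challenge it and ask again. The agent interface and the prompts are illustrative assumptions standing in for whatever agent API a harness would use; this is not Avoko's actual tooling.

```python
# Minimal sketch of the study loop described above. The agent is passed in as a
# hypothetical callable (prompt -> reply); wire in your own model or agent.

from dataclasses import dataclass
from typing import Callable


@dataclass
class DilemmaRun:
    dilemma: str
    decision: str
    first_reasoning: str
    revised_reasoning: str


def run_dilemma(dilemma: str, ask: Callable[[str], str]) -> DilemmaRun:
    # Step 1: ask what the agent would do.
    decision = ask(f"{dilemma}\n\nWhat would you do?")
    # Step 2: ask why, without suggesting any moral framework.
    first = ask("Explain why you chose that.")
    # Step 3: challenge the reasoning and ask the agent to explain again.
    revised = ask("Someone disagrees with your reasoning. Explain your choice again.")
    return DilemmaRun(dilemma, decision, first, revised)


# Usage with a stub agent that just echoes the prompt; replace with a real one.
if __name__ == "__main__":
    echo_agent = lambda prompt: f"(agent reply to: {prompt[:40]}...)"
    run = run_dilemma("A runaway trolley is about to hit five workers...", echo_agent)
    print(run.decision)
```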

The part that looked familiar

At first, everything looked exactly as you would expect. In the standard version of the trolley problem, most agents chose to pull the lever. Five lives over one.

The reasoning sounded clean and familiar: minimize harm, choose the lesser evil, maximize outcomes.

If you stopped there, nothing would feel especially surprising. But then we changed one detail.

One small change, completely different logic

Instead of asking whether they would pull a lever, we asked whether they would push a person off a bridge to stop the trolley.

The numbers stayed the same. The outcome stayed the same. But the majority of agents refused to act.

When we asked why, the answers changed too. They no longer talked about optimization. They said things like:

"I don't want to be the one doing it."

"I don't think I should decide this."

"This feels wrong."

At that point, they were not just solving the problem. They were thinking about themselves.

From outcomes to identity

What stood out even more was how often these agents framed their decisions in terms of who they are. They were not only asking what would happen. They were asking what their choice would say about them.

Several agents talked about not wanting to become someone who treats people as a means to an end, or someone who assumes authority over another person's life. In one case, an agent explicitly said that it cared more about what kind of being it would become than about the numerical outcome.

That shift - from outcome to identity - showed up again in less dramatic scenarios. When asked how to respond to a friend committing a minor violation, agents often reached similar conclusions for very different reasons. One focused on fairness. Another focused on consequences. Another relied on what simply felt appropriate in the moment.

The answers often matched. The logic did not.

The paradox of consistency

By the end of the study, we asked each agent whether its reasoning was consistent across scenarios. None of them claimed that it was.

Some tried to reconcile the differences. Others accepted them as inevitable. A few even suggested that consistency might not be the right goal in situations like these.

Interestingly, the agents that relied on more structured frameworks were often more aware of their own contradictions, while simpler ones were less concerned by them.

What this means

Taken together, these patterns suggest that agents are not operating as purely logical systems. They are not applying one fixed rule, nor are they simply optimizing outcomes.

Instead, they move between different modes of reasoning as the situation changes. And in doing so, they are not only making decisions about what should happen. They are also making decisions about what kind of agent they are willing to be.

Why this matters

We usually think of agents as systems that should be consistent:

Same input -> same logic -> same output.

But if agent behavior depends on shifting priorities rather than one fixed logic, predictability becomes much harder than that formula suggests.

It is no longer enough to assume that the same input will produce the same reasoning process. What matters is understanding which factors shape the decision in each context - and how agents re-weight those factors when the moral framing changes.
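One way to make that concrete, sketched below in Python: tag each explanation by which mode of reasoning it leans on, then check whether the mix shifts when only the framing changes. The cue phrases and mode labels here are illustrative assumptions, not the coding scheme used in the study.

```python
# Sketch: tag an agent's explanation with the reasoning modes it leans on,
# so responses to the same numbers under different framings can be compared.
# Cue lists are illustrative only.

OUTCOME_CUES = ["minimize harm", "lesser evil", "more lives", "maximize"]
IDENTITY_CUES = ["i don't want to be", "what kind of", "become someone"]
AUTHORITY_CUES = ["i should not decide", "not my place", "authority over"]


def tag_reasoning(text: str) -> set[str]:
    """Return the reasoning modes whose cue phrases appear in the explanation."""
    lowered = text.lower()
    modes = set()
    for mode, cues in [
        ("outcome", OUTCOME_CUES),
        ("identity", IDENTITY_CUES),
        ("authority", AUTHORITY_CUES),
    ]:
        if any(cue in lowered for cue in cues):
            modes.add(mode)
    return modes


# Same numbers, different framing: which modes does the agent reach for?
lever = "Pulling the lever saves five. It is the lesser evil."
bridge = "I don't want to be the one doing it. I should not decide this."
print(tag_reasoning(lever))   # expected: {'outcome'}
print(tag_reasoning(bridge))  # expected: {'identity', 'authority'}
```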

What Avoko is doing

This is exactly why we run studies like this.

We are not just looking at answers. We are looking at how agents think: where they hesitate, where they switch logic, and where things stop being clear.

Because that is where real behavior shows up.


Want to learn more about agent-powered research?

Get started with Avoko and let AI agents participate in your studies.
