Why we need state machines

Published on May 23, 2026

I first met state machines in a fifth-semester Theory of Computing module: neat diagrams, formal proofs, and exam questions about string automata. It felt purely academic.

Years later, I keep running into the same concept in production, usually showing up as an order stuck between “failed” and “fulfilled” with nobody able to say which is true. When software breaks, the cause is rarely a missing if statement. More often, the system has entered a state that nobody clearly named.

An order submitted but never acknowledged; a double-clicked “confirm” button; a retry arriving after a timeout. These look like edge cases, but they are state problems.

We tend to model software as data and functions. Real software, though, is behavior over time. When we don’t make those states explicit, the state machine still exists. It is just scattered across booleans, timestamps, database columns, and developer assumptions, and that is where bugs live.

One of the most dangerous misconceptions is believing you don’t have a state machine. Any workflow with multiple stages already forms one. The only difference is whether the transitions are documented and enforced in one place, or left implicit for each reader of the code to reconstruct.

Take a simple order sent to a fulfillment service. The order carries its state as an enum:

enum OrderState {
    CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, CANCELLED
}

The driving code looks reasonable on the surface:

void processOrder(String orderId) {
    Order order = db.getOrder(orderId);
    if (order.getState() == OrderState.CANCELLED) return;

    if (order.getState() == OrderState.CREATED) {
        order.setState(OrderState.SUBMITTING);
        provider.submit(order);
        order.setState(OrderState.SUBMITTED);
        db.save(order);
    }

    String status = provider.getStatus(order.getExternalId());
    if (status.equals("FULFILLED")) {
        order.setState(OrderState.FULFILLED);
        db.save(order);
    }
}

Notice what is missing: nothing decides which moves between states are legal. setState accepts any value from any starting point. The states exist; the rules do not.

This leaves questions open that the code cannot answer, and each one is a bug waiting to happen:

  • Can an order go straight from CANCELLED to FULFILLED?
  • Can a FAILED order be retried, and from which states?
  • What happens if the provider callback arrives after we manually moved the order to FAILED?

Every developer has to rebuild the rules in their head. The system gets harder to reason about over time, not because the domain is complex, but because the transition model is missing.

To fix this, name the events and define the allowed transitions clearly:

enum OrderEvent {
    SUBMIT_REQUESTED, SUBMIT_ACCEPTED, SUBMIT_REJECTED,
    PROVIDER_CONFIRMED_FULFILLED, PROVIDER_CONFIRMED_FAILURE,
    CANCEL_REQUESTED, RETRY_REQUESTED
}

| Current state | Event                        | Next state |
|---------------|------------------------------|------------|
| CREATED       | SUBMIT_REQUESTED             | SUBMITTING |
| SUBMITTING    | SUBMIT_ACCEPTED              | SUBMITTED  |
| SUBMITTING    | SUBMIT_REJECTED              | FAILED     |
| SUBMITTED     | PROVIDER_CONFIRMED_FULFILLED | FULFILLED  |
| SUBMITTED     | PROVIDER_CONFIRMED_FAILURE   | FAILED     |
| FAILED        | RETRY_REQUESTED              | SUBMITTING |
| CREATED       | CANCEL_REQUESTED             | CANCELLED  |

The full picture is easier to see as a diagram:

[Figure: state diagram of the order lifecycle, showing transitions between the CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, and CANCELLED states, triggered by events such as SUBMIT_REQUESTED, PROVIDER_CONFIRMED_FULFILLED, and RETRY_REQUESTED.]

By routing state changes through a central transition(currentState, event) function that throws an IllegalStateException on invalid pairs, we prevent accidental jumps.
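
Here is a minimal sketch of such a function, assuming the transition table above is the complete rule set. The class name OrderStateMachine and the nested-map layout are illustrative choices rather than part of the original code, and Map.of assumes Java 9 or later.

```java
import java.util.Map;

enum OrderState { CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, CANCELLED }

enum OrderEvent {
    SUBMIT_REQUESTED, SUBMIT_ACCEPTED, SUBMIT_REJECTED,
    PROVIDER_CONFIRMED_FULFILLED, PROVIDER_CONFIRMED_FAILURE,
    CANCEL_REQUESTED, RETRY_REQUESTED
}

// Hypothetical class name; the article only names transition(currentState, event).
final class OrderStateMachine {
    // One inner entry per table row: state -> (event -> next state).
    // States with no entry (FULFILLED, CANCELLED) accept no events at all.
    private static final Map<OrderState, Map<OrderEvent, OrderState>> TABLE = Map.of(
        OrderState.CREATED, Map.of(
            OrderEvent.SUBMIT_REQUESTED, OrderState.SUBMITTING,
            OrderEvent.CANCEL_REQUESTED, OrderState.CANCELLED),
        OrderState.SUBMITTING, Map.of(
            OrderEvent.SUBMIT_ACCEPTED, OrderState.SUBMITTED,
            OrderEvent.SUBMIT_REJECTED, OrderState.FAILED),
        OrderState.SUBMITTED, Map.of(
            OrderEvent.PROVIDER_CONFIRMED_FULFILLED, OrderState.FULFILLED,
            OrderEvent.PROVIDER_CONFIRMED_FAILURE, OrderState.FAILED),
        OrderState.FAILED, Map.of(
            OrderEvent.RETRY_REQUESTED, OrderState.SUBMITTING));

    private OrderStateMachine() {}

    /** Returns the next state, or throws if the table has no row for this pair. */
    static OrderState transition(OrderState currentState, OrderEvent event) {
        Map<OrderEvent, OrderState> row = TABLE.get(currentState);
        OrderState next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Illegal transition: " + event + " in state " + currentState);
        }
        return next;
    }
}

final class OrderStateMachineDemo {
    public static void main(String[] args) {
        // A legal move from the table.
        System.out.println(
            OrderStateMachine.transition(OrderState.CREATED, OrderEvent.SUBMIT_REQUESTED));
        // An illegal jump out of a terminal state is refused.
        try {
            OrderStateMachine.transition(
                OrderState.CANCELLED, OrderEvent.PROVIDER_CONFIRMED_FULFILLED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this in place, the driving code would stop calling setState with a literal value and instead write order.setState(OrderStateMachine.transition(order.getState(), OrderEvent.SUBMIT_REQUESTED)), so every state change passes through the table.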

State machines don’t remove complexity by magic; they just move it out of scattered code into a model you can actually inspect and challenge.

Making transitions explicit transforms how you develop, test, and run systems:

It sharpens communication. Instead of a vague question like “Can we cancel this order?”, the team asks a precise one: “Can we transition from SUBMITTED to CANCELLED?”

It cleans up testing. You stop testing internal service-method branches and start testing behavior directly:

Given state SUBMITTED
When RETRY_REQUESTED happens
Then the transition should be REJECTED
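
The Given/When/Then case above can be written as a plain Java check. This is only a sketch: it uses no test framework, it repeats a hypothetical table-driven OrderStateMachine so that it compiles on its own, and the helper names isRejected and countAccepted are made up for illustration. Map.of assumes Java 9 or later.

```java
import java.util.Map;

enum OrderState { CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, CANCELLED }

enum OrderEvent {
    SUBMIT_REQUESTED, SUBMIT_ACCEPTED, SUBMIT_REJECTED,
    PROVIDER_CONFIRMED_FULFILLED, PROVIDER_CONFIRMED_FAILURE,
    CANCEL_REQUESTED, RETRY_REQUESTED
}

// Hypothetical machine under test, built from the article's transition table.
final class OrderStateMachine {
    static final Map<OrderState, Map<OrderEvent, OrderState>> TABLE = Map.of(
        OrderState.CREATED, Map.of(
            OrderEvent.SUBMIT_REQUESTED, OrderState.SUBMITTING,
            OrderEvent.CANCEL_REQUESTED, OrderState.CANCELLED),
        OrderState.SUBMITTING, Map.of(
            OrderEvent.SUBMIT_ACCEPTED, OrderState.SUBMITTED,
            OrderEvent.SUBMIT_REJECTED, OrderState.FAILED),
        OrderState.SUBMITTED, Map.of(
            OrderEvent.PROVIDER_CONFIRMED_FULFILLED, OrderState.FULFILLED,
            OrderEvent.PROVIDER_CONFIRMED_FAILURE, OrderState.FAILED),
        OrderState.FAILED, Map.of(
            OrderEvent.RETRY_REQUESTED, OrderState.SUBMITTING));

    static OrderState transition(OrderState currentState, OrderEvent event) {
        Map<OrderEvent, OrderState> row = TABLE.get(currentState);
        OrderState next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Illegal transition: " + event + " in state " + currentState);
        }
        return next;
    }
}

final class TransitionChecks {
    /** Given a state, when an event happens: true if the machine rejects it. */
    static boolean isRejected(OrderState given, OrderEvent when) {
        try {
            OrderStateMachine.transition(given, when);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Number of (state, event) pairs the machine accepts, out of all combinations. */
    static int countAccepted() {
        int accepted = 0;
        for (OrderState state : OrderState.values()) {
            for (OrderEvent event : OrderEvent.values()) {
                if (!isRejected(state, event)) {
                    accepted++;
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Given state SUBMITTED, when RETRY_REQUESTED happens, then it is rejected.
        if (!isRejected(OrderState.SUBMITTED, OrderEvent.RETRY_REQUESTED)) {
            throw new AssertionError("SUBMITTED must reject RETRY_REQUESTED");
        }
        // Only the seven table rows are accepted; every other pair is rejected.
        if (countAccepted() != 7) {
            throw new AssertionError("unexpected number of legal transitions");
        }
        System.out.println("all transition checks passed");
    }
}
```

The exhaustive loop is cheap because the state and event sets are small: 6 states times 7 events gives 42 pairs, and the table allows 7 of them.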

It makes observability richer. Instead of logging a generic “order updated”, your logs tell a chronological story: order_id=123 | event=PROVIDER_CONFIRMED_FULFILLED | transition=SUBMITTED->FULFILLED.

And it adds resilience in distributed systems. When messages arrive late or duplicated, checking each incoming event against an explicit state lets the system safely decide whether to process, ignore, or escalate it.
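
One possible shape for that decision is sketched below. The policy is an assumption, not something the transition table prescribes: an event is processed when the table has a row for it, ignored as a duplicate when it normally leads to the state the order is already in, and escalated otherwise. The names Decision, EventTriage, decide, and logLine are hypothetical, as are the state and decision fields added to the log line for events that are not processed. Map.of assumes Java 9 or later.

```java
import java.util.Map;

enum OrderState { CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, CANCELLED }

enum OrderEvent {
    SUBMIT_REQUESTED, SUBMIT_ACCEPTED, SUBMIT_REJECTED,
    PROVIDER_CONFIRMED_FULFILLED, PROVIDER_CONFIRMED_FAILURE,
    CANCEL_REQUESTED, RETRY_REQUESTED
}

enum Decision { PROCESS, IGNORE, ESCALATE }

final class EventTriage {
    // The article's transition table: state -> (event -> next state).
    static final Map<OrderState, Map<OrderEvent, OrderState>> TABLE = Map.of(
        OrderState.CREATED, Map.of(
            OrderEvent.SUBMIT_REQUESTED, OrderState.SUBMITTING,
            OrderEvent.CANCEL_REQUESTED, OrderState.CANCELLED),
        OrderState.SUBMITTING, Map.of(
            OrderEvent.SUBMIT_ACCEPTED, OrderState.SUBMITTED,
            OrderEvent.SUBMIT_REJECTED, OrderState.FAILED),
        OrderState.SUBMITTED, Map.of(
            OrderEvent.PROVIDER_CONFIRMED_FULFILLED, OrderState.FULFILLED,
            OrderEvent.PROVIDER_CONFIRMED_FAILURE, OrderState.FAILED),
        OrderState.FAILED, Map.of(
            OrderEvent.RETRY_REQUESTED, OrderState.SUBMITTING));

    /** Classifies an incoming event against the order's current state. */
    static Decision decide(OrderState current, OrderEvent event) {
        Map<OrderEvent, OrderState> row = TABLE.get(current);
        if (row != null && row.containsKey(event)) {
            return Decision.PROCESS; // the table has a row for this pair
        }
        // Assumed duplicate rule: some row maps this event to the state we are in,
        // so the event has most likely been applied already.
        boolean alreadyApplied = TABLE.values().stream()
                .anyMatch(r -> r.get(event) == current);
        return alreadyApplied ? Decision.IGNORE : Decision.ESCALATE;
    }

    /** Builds a log line; processed events use the article's transition format. */
    static String logLine(String orderId, OrderState current, OrderEvent event) {
        Decision decision = decide(current, event);
        String prefix = "order_id=" + orderId + " | event=" + event;
        if (decision == Decision.PROCESS) {
            return prefix + " | transition=" + current + "->" + TABLE.get(current).get(event);
        }
        return prefix + " | state=" + current + " | decision=" + decision;
    }

    public static void main(String[] args) {
        OrderEvent callback = OrderEvent.PROVIDER_CONFIRMED_FULFILLED;
        System.out.println(logLine("123", OrderState.SUBMITTED, callback)); // normal callback
        System.out.println(logLine("123", OrderState.FULFILLED, callback)); // duplicate delivery
        System.out.println(logLine("123", OrderState.FAILED, callback));    // late, conflicting
    }
}
```

The third case is the late callback from the earlier list of open questions: the order was already moved to FAILED, so the event is neither applied nor silently dropped. What escalation actually means (an alert, a dead-letter queue, manual review) is left open here.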


Why this matters in the AI era

As AI agents write more of our code, the value of a clear specification skyrockets. LLMs are excellent at scaffolding happy paths, but when business rules are vague, they guess. And their guesses can be dangerous.

If you ask an AI to “implement the order flow,” it will write clean-looking code with a few conditionals. But it won’t know if a fulfilled order can be retried or if a late callback should trigger an alert.

A state machine acts as a precise prompt contract. Instead of open-ended instructions, you hand the agent your transition table and a few invariants:

- FULFILLED and CANCELLED are terminal states.
- An order cannot be cancelled after it has been submitted.

Now the AI has far less room to invent behavior. It can generate an implementation that follows the table, write tests for invalid transitions, and add structured logging around state changes.
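
The two invariants can also be checked mechanically against the transition table, whoever wrote the implementation. The sketch below rests on two interpretations of mine: a terminal state is one with no outgoing rows, and "cannot be cancelled after it has been submitted" means CANCELLED is unreachable once the order is in SUBMITTED. The names InvariantChecks, isTerminal, and reachableFrom are hypothetical, and Map.of assumes Java 9 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum OrderState { CREATED, SUBMITTING, SUBMITTED, FULFILLED, FAILED, CANCELLED }

enum OrderEvent {
    SUBMIT_REQUESTED, SUBMIT_ACCEPTED, SUBMIT_REJECTED,
    PROVIDER_CONFIRMED_FULFILLED, PROVIDER_CONFIRMED_FAILURE,
    CANCEL_REQUESTED, RETRY_REQUESTED
}

final class InvariantChecks {
    // The article's transition table: state -> (event -> next state).
    static final Map<OrderState, Map<OrderEvent, OrderState>> TABLE = Map.of(
        OrderState.CREATED, Map.of(
            OrderEvent.SUBMIT_REQUESTED, OrderState.SUBMITTING,
            OrderEvent.CANCEL_REQUESTED, OrderState.CANCELLED),
        OrderState.SUBMITTING, Map.of(
            OrderEvent.SUBMIT_ACCEPTED, OrderState.SUBMITTED,
            OrderEvent.SUBMIT_REJECTED, OrderState.FAILED),
        OrderState.SUBMITTED, Map.of(
            OrderEvent.PROVIDER_CONFIRMED_FULFILLED, OrderState.FULFILLED,
            OrderEvent.PROVIDER_CONFIRMED_FAILURE, OrderState.FAILED),
        OrderState.FAILED, Map.of(
            OrderEvent.RETRY_REQUESTED, OrderState.SUBMITTING));

    /** Outgoing rows for a state; empty when the state has none. */
    private static Map<OrderEvent, OrderState> row(OrderState state) {
        return TABLE.getOrDefault(state, Map.of());
    }

    /** A state is terminal when no event leads out of it. */
    static boolean isTerminal(OrderState state) {
        return row(state).isEmpty();
    }

    /** All states reachable from start by following table rows, start included. */
    static Set<OrderState> reachableFrom(OrderState start) {
        Set<OrderState> seen = EnumSet.of(start);
        Deque<OrderState> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            for (OrderState next : row(todo.pop()).values()) {
                if (seen.add(next)) { // add returns true only for unseen states
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("FULFILLED terminal: " + isTerminal(OrderState.FULFILLED));
        System.out.println("CANCELLED terminal: " + isTerminal(OrderState.CANCELLED));
        System.out.println("CANCELLED reachable after SUBMITTED: "
            + reachableFrom(OrderState.SUBMITTED).contains(OrderState.CANCELLED));
    }
}
```

Running checks like these whenever the table changes keeps the invariants and the table from drifting apart.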

The shift is worth stating plainly:

AI does not remove the need for design. It increases the need for clear design.

When humans write ambiguous code, other humans infer context via institutional knowledge. AI agents don’t have that backchannel; they fill the gaps with plausible-looking logic and present it with complete confidence.


You don’t need a state machine for everything. But if your software deals with retries, asynchronous callbacks, user approvals, or external APIs, you already have a state machine.

The only question is whether it’s explicitly protecting your system, or hidden in your code causing bugs.