Asynchronous Request-Response Pattern

Published on February 16, 2026

We had a service that kicked off requests to an external provider. The provider would process them asynchronously and fire a callback when done. Simple enough, until we scaled to multiple instances.

The callback came back, but it hit a random instance behind the load balancer. Not the one that initiated the request. Not the one holding the context needed to act on the response. A different one entirely, with no idea what to do with it.

That’s when I realized: getting a response back isn’t the hard part. Getting it back to the right place is.

The Constraint

Consider two services: Service X (the requester) and Service Y (the processor). Service X sends requests to Service Y for long-running operations. Here’s the constraint that makes this interesting:

The response must be processed by the same instance of Service X that initiated the request.

This comes up more than you’d think:

  • In-memory session state: The requesting instance holds user session data needed to process the response
  • WebSocket connections: A specific instance maintains the client’s WebSocket connection
  • Distributed transactions: The initiating instance coordinates a multi-step transaction
  • External callbacks: The initiating instance has to relay the result back through the same channel it received the original request on

This rules out simple load-balanced responses. We need a mechanism to route responses back to the exact originating instance.

Polling

The most straightforward approach. Your instance fires off the request and keeps asking “is it done yet?” like a kid on a road trip.

Trade-offs

Pros

  • Simple to implement
  • Works with any HTTP infrastructure
  • Easy to debug and monitor
  • No additional infrastructure needed

Cons

  • Polling overhead scales with concurrent requests
  • Artificial latency (responses wait up to one full poll interval)
  • Wastes resources checking pending jobs
  • Service Y must store job state

It works, but it doesn’t scale gracefully. At a 2-second poll interval with 200 concurrent requests, you’re making 100 requests per second to check on jobs that might only complete a handful of times per minute.
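
The loop itself is trivial. Here's a minimal sketch, assuming a hypothetical GET /jobs/{id} status endpoint on Service Y that returns "PENDING" until the job finishes; both the endpoint and the response format are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PollingClient {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Blocks until the job completes; adds up to one full poll interval of latency
    static String pollUntilDone(String jobId, long pollIntervalMillis) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://service-y.example.com/jobs/" + jobId)) // hypothetical endpoint
                .GET()
                .build();

        while (true) {
            HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
            if (!"PENDING".equals(response.body())) {
                return response.body(); // done (or failed); either way, stop polling
            }
            Thread.sleep(pollIntervalMillis);
        }
    }
}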

Webhooks with Load Balancer Routing

Webhooks flip the model: instead of asking, you get told. But now you’re back to the original problem: the callback hits a random instance.

The workaround is configuring the load balancer to route on instance identifiers carried in custom headers, query parameters, or sticky-session cookies. It works, but it tightly couples your application logic to infrastructure configuration. Every time you add an instance, your LB rules need updating. Every time an instance dies, stale routes break things.
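
To make the coupling concrete, here's a minimal sketch of the application side, assuming each instance learns its identity from an INSTANCE_ID environment variable and the load balancer routes /callbacks/{instanceId}/* back to that instance; both the path scheme and the routing rule are assumptions, not a standard:

// Embed this instance's identity in the callback URL handed to Service Y.
// The load balancer must route /callbacks/{instanceId}/* to this exact
// instance; that routing rule is the infrastructure coupling described above.
String instanceId = System.getenv("INSTANCE_ID"); // assumed deploy-time identity
String requestId = java.util.UUID.randomUUID().toString();
String callbackUrl = "https://service-x.example.com/callbacks/" + instanceId + "/" + requestId;
// callbackUrl accompanies the request so Service Y knows where to POST the result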

Trade-offs

Pros

  • No polling overhead
  • Lower latency than polling
  • Push-based model

Cons

  • Requires LB configuration changes
  • Instance routing logic leaks into infrastructure
  • Handling instance failures is complex
  • Firewall/network rules may block callbacks

Message Queues

Message queues provide a clean separation of concerns and handle the routing problem at the application layer, not the infrastructure layer.

Kafka

Kafka can solve this, but the solutions feel like workarounds:

Option A: Broadcast Response Topic

  • All X instances subscribe to the response topic
  • Each message includes the target instance ID
  • All instances receive all messages; only the target processes it

Imagine a post office that delivers every letter to every house; each household checks “is this for me?” and throws away the rest. It works, but it’s wasteful.
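
A sketch of that broadcast consumer, assuming a "responses" topic, a "target-instance" header, and an INSTANCE_ID environment variable (all three names are assumptions):

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;

public class BroadcastResponseConsumer {
    public static void main(String[] args) {
        String instanceId = System.getenv("INSTANCE_ID"); // assumed deploy-time identity

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // A unique group per instance, so every instance sees every message (the broadcast)
        props.put("group.id", "service-x-" + instanceId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("responses"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    Header target = record.headers().lastHeader("target-instance");
                    // Every instance pays to read the message; only the target acts on it
                    if (target != null && instanceId.equals(new String(target.value(), StandardCharsets.UTF_8))) {
                        System.out.println("Response for us: " + record.value());
                    }
                }
            }
        }
    }
}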

Option B: Partitioned by Instance

  • Response topic partitioned by instance ID
  • Each X instance owns specific partitions

Better, but now scaling becomes a partition management problem. Add an instance? Rebalance. Remove one? Rebalance. Kafka wasn’t designed for request-reply. The ecosystem has tried to bridge the gap (Spring’s ReplyingKafkaTemplate, KIP-762), but the workarounds reinforce the point: you’re fighting the grain of the tool.
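
A sketch of both halves, assuming responses are keyed by instance ID and each instance knows which partition is "its own" (maintaining that mapping is exactly the operational burden in question):

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class PartitionedResponses {
    // Service Y side: key by instance ID so all responses for one instance
    // hash to the same partition
    static void sendResponse(KafkaProducer<String, String> producer,
                             String targetInstanceId, String payload) {
        producer.send(new ProducerRecord<>("responses", targetInstanceId, payload));
    }

    // Service X side: skip the consumer group and claim "our" partition directly.
    // Add or remove an instance and this assignment has to change.
    static KafkaConsumer<String, String> consumerForPartition(Properties props, int partition) {
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.assign(List.of(new TopicPartition("responses", partition)));
        return consumer;
    }
}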

Kafka Trade-offs

Pros

  • Durable message storage
  • High throughput
  • Built-in replication

Cons

  • Broadcast wastes network/CPU on all instances
  • Partitioning complicates scaling (rebalancing)
  • Adding/removing instances requires partition reassignment
  • Not designed for request-reply patterns

RabbitMQ: The Reply-To Pattern

This is what eventually clicked for me. RabbitMQ’s reply-to pattern is purpose-built for this exact problem (the broker even ships an optimized variant, Direct Reply-To, which uses the amq.rabbitmq.reply-to pseudo-queue and skips queue declaration entirely).

Each requesting instance creates a temporary, exclusive queue and puts that queue’s name in the message’s reply-to property. The processor sends responses directly to that queue. No broadcast. No partition management. No infrastructure coupling.

The glue that holds this together is the correlation ID. When Service X publishes a request, it attaches a unique correlation ID along with its reply-to queue. When Service Y finishes processing, it copies that correlation ID onto the response and publishes it to the reply-to queue. Service X matches the incoming correlation ID to the original request and knows exactly which operation just completed. Without this, you’d receive responses on your exclusive queue with no way to tie them back to the request that triggered them.

Here’s the core of it. Service X declares an exclusive queue, publishes with replyTo and correlationId set, and consumes from its own queue:

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.UUID;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();

// Each instance gets its own exclusive, auto-delete queue with a broker-generated name
String callbackQueue = channel.queueDeclare("", false, true, true, null).getQueue();

// The shared work queue Service Y consumes from
channel.queueDeclare("task_queue", false, false, false, null);

String correlationId = UUID.randomUUID().toString();

// Publish request with reply-to and correlation ID
AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
        .replyTo(callbackQueue)
        .correlationId(correlationId)
        .build();

channel.basicPublish("", "task_queue", props, "process this request".getBytes());

// Consume responses - only this instance receives them
channel.basicConsume(callbackQueue, true, (consumerTag, delivery) -> {
    if (delivery.getProperties().getCorrelationId().equals(correlationId)) {
        System.out.println("Got response: " + new String(delivery.getBody()));
    }
}, consumerTag -> {});

Service Y picks up requests and sends responses back to the reply-to queue:

channel.basicConsume("task_queue", false, (consumerTag, delivery) -> {
    // Echo the correlation ID back so Service X can match response to request
    AMQP.BasicProperties replyProps = new AMQP.BasicProperties.Builder()
            .correlationId(delivery.getProperties().getCorrelationId())
            .build();

    String result = process(delivery.getBody()); // application-specific work

    // Publish the response straight to the requester's exclusive queue
    channel.basicPublish("", delivery.getProperties().getReplyTo(), replyProps, result.getBytes());
    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}, consumerTag -> {});

The queue belongs to the instance, and when the instance is gone, so is the queue. No routing tables, no infrastructure changes.

RabbitMQ Edge Cases

Once you adopt this pattern, there are a few operational details worth considering.

Orphan Queues

When a Service X instance crashes without graceful shutdown, its exclusive queue may briefly persist. RabbitMQ handles this automatically:

  • Exclusive queues are deleted when the connection closes (including unexpected disconnections)
  • Auto-delete queues are removed when the last consumer disconnects
  • The broker’s heartbeat mechanism detects dead connections (typically within 60 seconds)
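
If the default detection window is too slow for your failover needs, the heartbeat interval is tunable on the client. A sketch, with 15 seconds as an arbitrary choice:

ConnectionFactory factory = new ConnectionFactory();
// Heartbeat interval in seconds, negotiated with the broker. The broker treats
// a connection as dead after roughly two missed heartbeats, which bounds how
// long an orphaned exclusive queue can linger.
factory.setRequestedHeartbeat(15);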

Graceful Shutdown

Service X instances must handle in-flight requests during shutdown. You have two options: cancel all pending requests immediately, or wait for them with a timeout. In practice, a short timeout (5-10 seconds) with a fallback to cancellation strikes the right balance.
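
A sketch of the wait-then-cancel option, assuming in-flight requests are tracked as CompletableFutures keyed by correlation ID (the tracking map itself is an assumption; it isn't shown in the snippets above):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InFlightTracker {
    // One entry per pending request, completed when its response arrives
    private final ConcurrentHashMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    void shutdown() {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(pending.values().toArray(new CompletableFuture[0]));
        try {
            // Give in-flight responses a short grace period...
            all.get(10, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // ...then fall back to cancelling whatever is still outstanding
            pending.values().forEach(f -> f.cancel(true));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // A request already failed; nothing left to wait for
        }
    }
}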

Message Expiration

Set a TTL on your messages. If a response arrives after the requesting instance is long gone, it shouldn’t sit in a queue forever. Dead letter exchanges give you visibility into these expired messages without silently dropping them.
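
Both knobs are standard RabbitMQ features. A sketch, with an assumed 30-second TTL and an assumed dead letter exchange named "dlx" (declare and bind the exchange separately):

// Per-message TTL: milliseconds, passed as a string
AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
        .expiration("30000")
        .build();

// Send expired messages to a dead letter exchange instead of silently dropping
// them. Queue arguments must match everywhere the queue is declared, so this
// argument belongs wherever task_queue is first declared.
channel.queueDeclare("task_queue", false, false, false,
        java.util.Map.<String, Object>of("x-dead-letter-exchange", "dlx"));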


Comparing the Approaches

| Criteria | Polling | Webhooks | Kafka | RabbitMQ Reply-To |
| --- | --- | --- | --- | --- |
| Instance routing | Natural | Requires LB config | Complex | Built-in |
| Latency | Poll interval | Low | Low | Low |
| Scalability | Polling overhead | Good | Partition rebalancing | Good |
| Infrastructure | None extra | LB changes | Kafka cluster | RabbitMQ cluster |
| Complexity | Simple | Moderate | High | Moderate |
| Failure handling | Retry-friendly | Complex | Durable | Acknowledgments |

When to Use Each Approach

Choose Polling when:

  • Request volume is low
  • Simplicity is the priority
  • You don’t want additional infrastructure

Choose Webhooks when:

  • You already have sophisticated load balancer configuration
  • Network topology allows callbacks
  • Latency requirements are moderate

Choose Kafka when:

  • You’re already invested in the Kafka ecosystem
  • Message durability and replay are critical
  • You can accept the partition management overhead

Choose RabbitMQ Reply-To when:

  • You need instance-specific routing without infrastructure changes
  • Request-response is your primary pattern
  • You want built-in queue lifecycle management

The problem I ran into, getting a callback to the right instance, isn’t exotic. It’s one of those distributed systems problems that feels simple until you’re debugging why responses are landing on instances that have no idea what to do with them. Polling works until it doesn’t scale. Webhooks work until your infrastructure team pushes back. Kafka works until you’re drowning in partition rebalancing.

RabbitMQ’s reply-to pattern solved it cleanly: temporary queues, correlation IDs, and automatic cleanup. It added a dependency, but removed a whole category of routing complexity.

If you’re dealing with stateful async communication between scaled services, start here.

Further Reading