Asynchronous Request-Response Pattern

Published on February 16, 2026

We had a service that kicked off requests to an external provider. The provider would process them asynchronously and fire a callback when done. Simple enough, until we scaled to multiple instances.

The callback came back, but it hit a random instance behind the load balancer. Not the one that initiated the request. Not the one holding the context needed to act on the response. A different one entirely, with no idea what to do with it.

That’s when I realized: getting a response back isn’t the hard part. Getting it back to the right place is.

The Constraint

Consider two services: Service X (the requester) and Service Y (the processor). Service X sends requests to Service Y for long-running operations. Here’s the constraint that makes this interesting:

The response must be processed by the same instance of Service X that initiated the request.

This comes up more than you’d think:

  • In-memory session state: The requesting instance holds user session data needed to process the response
  • WebSocket connections: A specific instance maintains the client’s WebSocket connection
  • Distributed transactions: The initiating instance coordinates a multi-step transaction
  • External callbacks: The initiating instance has to relay the result back through the same channel it received the original request on

This rules out simple load-balanced responses. We need a mechanism to route responses back to the exact originating instance.

Polling

The most straightforward approach. Your instance fires off the request and keeps asking “is it done yet?” like a kid on a road trip.

Trade-offs

Pros

  • Simple to implement
  • Works with any HTTP infrastructure
  • Easy to debug and monitor
  • No additional infrastructure needed

Cons

  • Polling overhead scales with concurrent requests
  • Artificial latency (responses wait up to one full poll interval)
  • Wastes resources checking pending jobs
  • Service Y must store job state

It works, but it doesn’t scale gracefully. At a 2-second poll interval with 200 concurrent requests, you’re making 100 requests per second to check on jobs that might only complete a handful of times per minute.
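
The loop itself is trivial. Here's a minimal sketch, assuming a hypothetical GET /jobs/{id} status endpoint on Service Y that returns "PENDING" until the job finishes; both the endpoint and the response format are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PollingClient {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Blocks until the job completes; adds up to one full poll interval of latency
    static String pollUntilDone(String jobId, long pollIntervalMillis) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://service-y.example.com/jobs/" + jobId)) // hypothetical endpoint
                .GET()
                .build();

        while (true) {
            HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
            if (!"PENDING".equals(response.body())) {
                return response.body(); // done (or failed); either way, stop polling
            }
            Thread.sleep(pollIntervalMillis);
        }
    }
}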

Webhooks with Load Balancer Routing

Webhooks flip the model: instead of asking, you get told. But now you’re back to the original problem: the callback hits a random instance.

The workaround is configuring the load balancer to route on instance identifiers carried in custom headers, query parameters, or sticky-session cookies. It works, but it tightly couples your application logic to infrastructure configuration. Every time you add an instance, your LB rules need updating. Every time an instance dies, stale routes break things.
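
To make the coupling concrete, here's a minimal sketch of the application side, assuming each instance learns its identity from an INSTANCE_ID environment variable and the load balancer routes /callbacks/{instanceId}/* back to that instance; both the path scheme and the routing rule are assumptions, not a standard:

// Embed this instance's identity in the callback URL handed to Service Y.
// The load balancer must route /callbacks/{instanceId}/* to this exact
// instance; that routing rule is the infrastructure coupling described above.
String instanceId = System.getenv("INSTANCE_ID"); // assumed deploy-time identity
String requestId = java.util.UUID.randomUUID().toString();
String callbackUrl = "https://service-x.example.com/callbacks/" + instanceId + "/" + requestId;
// callbackUrl accompanies the request so Service Y knows where to POST the result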

Trade-offs

Pros

  • No polling overhead
  • Lower latency than polling
  • Push-based model

Cons

  • Requires LB configuration changes
  • Instance routing logic leaks into infrastructure
  • Handling instance failures is complex
  • Firewall/network rules may block callbacks

Message Queues

Message queues provide a clean separation of concerns and handle the routing problem at the application layer, not the infrastructure layer.

Kafka

Kafka can solve this, but the solutions feel like workarounds:

Option A: Broadcast Response Topic

  • All X instances subscribe to the response topic
  • Each message includes the target instance ID
  • All instances receive all messages; only the target processes it

Imagine a post office that delivers every letter to every house; each household checks “is this for me?” and throws away the rest. It works, but it’s wasteful.
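
A sketch of that broadcast consumer, assuming a "responses" topic, a "target-instance" header, and an INSTANCE_ID environment variable (all three names are assumptions):

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;

public class BroadcastResponseConsumer {
    public static void main(String[] args) {
        String instanceId = System.getenv("INSTANCE_ID"); // assumed deploy-time identity

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // A unique group per instance, so every instance sees every message (the broadcast)
        props.put("group.id", "service-x-" + instanceId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("responses"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    Header target = record.headers().lastHeader("target-instance");
                    // Every instance pays to read the message; only the target acts on it
                    if (target != null && instanceId.equals(new String(target.value(), StandardCharsets.UTF_8))) {
                        System.out.println("Response for us: " + record.value());
                    }
                }
            }
        }
    }
}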

Option B: Partitioned by Instance

  • Response topic partitioned by instance ID
  • Each X instance owns specific partitions

Better, but now scaling becomes a partition management problem. Add an instance? Rebalance. Remove one? Rebalance. Kafka wasn’t designed for request-reply. The ecosystem has tried to bridge the gap (Spring’s ReplyingKafkaTemplate, KIP-762), but the workarounds reinforce the point: you’re fighting the grain of the tool.
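
A sketch of both halves, assuming responses are keyed by instance ID and each instance knows which partition is "its own" (maintaining that mapping is exactly the operational burden in question):

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class PartitionedResponses {
    // Service Y side: key by instance ID so all responses for one instance
    // hash to the same partition
    static void sendResponse(KafkaProducer<String, String> producer,
                             String targetInstanceId, String payload) {
        producer.send(new ProducerRecord<>("responses", targetInstanceId, payload));
    }

    // Service X side: skip the consumer group and claim "our" partition directly.
    // Add or remove an instance and this assignment has to change.
    static KafkaConsumer<String, String> consumerForPartition(Properties props, int partition) {
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.assign(List.of(new TopicPartition("responses", partition)));
        return consumer;
    }
}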

Kafka Trade-offs

Pros

  • Durable message storage
  • High throughput
  • Built-in replication

Cons

  • Broadcast wastes network/CPU on all instances
  • Partitioning complicates scaling (rebalancing)
  • Adding/removing instances requires partition reassignment
  • Not designed for request-reply patterns

RabbitMQ: The Reply-To Pattern

This is what eventually clicked for me. RabbitMQ’s reply-to pattern is purpose-built for this exact problem (the broker even ships an optimized variant, Direct Reply-To, which uses the amq.rabbitmq.reply-to pseudo-queue and skips queue declaration entirely).

Each requesting instance creates a temporary, exclusive queue and puts that queue’s name in the message’s reply-to property. The processor sends responses directly to that queue. No broadcast. No partition management. No infrastructure coupling.

The glue that holds this together is the correlation ID. When Service X publishes a request, it attaches a unique correlation ID along with its reply-to queue. When Service Y finishes processing, it copies that correlation ID onto the response and publishes it to the reply-to queue. Service X matches the incoming correlation ID to the original request and knows exactly which operation just completed. Without this, you’d receive responses on your exclusive queue with no way to tie them back to the request that triggered them.

Here’s the core of it. Service X declares an exclusive queue, publishes with replyTo and correlationId set, and consumes from its own queue:

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.UUID;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();

// Each instance gets its own exclusive, auto-delete queue with a broker-generated name
String callbackQueue = channel.queueDeclare("", false, true, true, null).getQueue();

// The shared work queue Service Y consumes from
channel.queueDeclare("task_queue", false, false, false, null);

String correlationId = UUID.randomUUID().toString();

// Publish request with reply-to and correlation ID
AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
        .replyTo(callbackQueue)
        .correlationId(correlationId)
        .build();

channel.basicPublish("", "task_queue", props, "process this request".getBytes());

// Consume responses - only this instance receives them
channel.basicConsume(callbackQueue, true, (consumerTag, delivery) -> {
    if (delivery.getProperties().getCorrelationId().equals(correlationId)) {
        System.out.println("Got response: " + new String(delivery.getBody()));
    }
}, consumerTag -> {});

Service Y picks up requests and sends responses back to the reply-to queue:

channel.basicConsume("task_queue", false, (consumerTag, delivery) -> {
    // Echo the correlation ID back so Service X can match response to request
    AMQP.BasicProperties replyProps = new AMQP.BasicProperties.Builder()
            .correlationId(delivery.getProperties().getCorrelationId())
            .build();

    String result = process(delivery.getBody()); // application-specific work

    // Publish the response straight to the requester's exclusive queue
    channel.basicPublish("", delivery.getProperties().getReplyTo(), replyProps, result.getBytes());
    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}, consumerTag -> {});

The queue belongs to the instance, and when the instance is gone, so is the queue. No routing tables, no infrastructure changes.

RabbitMQ Edge Cases

Once you adopt this pattern, there are a few operational details worth considering.

Orphan Queues

When a Service X instance crashes without graceful shutdown, its exclusive queue may briefly persist. RabbitMQ handles this automatically:

  • Exclusive queues are deleted when the connection closes (including unexpected disconnections)
  • Auto-delete queues are removed when the last consumer disconnects
  • The broker’s heartbeat mechanism detects dead connections (typically within 60 seconds)
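
If the default detection window is too slow for your failover needs, the heartbeat interval is tunable on the client. A sketch, with 15 seconds as an arbitrary choice:

ConnectionFactory factory = new ConnectionFactory();
// Heartbeat interval in seconds, negotiated with the broker. The broker treats
// a connection as dead after roughly two missed heartbeats, which bounds how
// long an orphaned exclusive queue can linger.
factory.setRequestedHeartbeat(15);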

Graceful Shutdown

Service X instances must handle in-flight requests during shutdown. You have two options: cancel all pending requests immediately, or wait for them with a timeout. In practice, a short timeout (5-10 seconds) with a fallback to cancellation strikes the right balance.
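
A sketch of the wait-then-cancel option, assuming in-flight requests are tracked as CompletableFutures keyed by correlation ID (the tracking map itself is an assumption; it isn't shown in the snippets above):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InFlightTracker {
    // One entry per pending request, completed when its response arrives
    private final ConcurrentHashMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    void shutdown() {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(pending.values().toArray(new CompletableFuture[0]));
        try {
            // Give in-flight responses a short grace period...
            all.get(10, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // ...then fall back to cancelling whatever is still outstanding
            pending.values().forEach(f -> f.cancel(true));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // A request already failed; nothing left to wait for
        }
    }
}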

Message Expiration

Set a TTL on your messages. If a response arrives after the requesting instance is long gone, it shouldn’t sit in a queue forever. Dead letter exchanges give you visibility into these expired messages without silently dropping them.
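
Both knobs are standard RabbitMQ features. A sketch, with an assumed 30-second TTL and an assumed dead letter exchange named "dlx" (declare and bind the exchange separately):

// Per-message TTL: milliseconds, passed as a string
AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
        .expiration("30000")
        .build();

// Send expired messages to a dead letter exchange instead of silently dropping
// them. Queue arguments must match everywhere the queue is declared, so this
// argument belongs wherever task_queue is first declared.
channel.queueDeclare("task_queue", false, false, false,
        java.util.Map.<String, Object>of("x-dead-letter-exchange", "dlx"));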


Comparing the Approaches

| Criteria | Polling | Webhooks | Kafka | RabbitMQ Reply-To |
| --- | --- | --- | --- | --- |
| Instance routing | Natural | Requires LB config | Complex | Built-in |
| Latency | Poll interval | Low | Low | Low |
| Scalability | Polling overhead | Good | Partition rebalancing | Good |
| Infrastructure | None extra | LB changes | Kafka cluster | RabbitMQ cluster |
| Complexity | Simple | Moderate | High | Moderate |
| Failure handling | Retry-friendly | Complex | Durable | Acknowledgments |

When to Use Each Approach

Choose Polling when:

  • Request volume is low
  • Simplicity is the priority
  • You don’t want additional infrastructure

Choose Webhooks when:

  • You already have sophisticated load balancer configuration
  • Network topology allows callbacks
  • Latency requirements are moderate

Choose Kafka when:

  • You’re already invested in the Kafka ecosystem
  • Message durability and replay are critical
  • You can accept the partition management overhead

Choose RabbitMQ Reply-To when:

  • You need instance-specific routing without infrastructure changes
  • Request-response is your primary pattern
  • You want built-in queue lifecycle management

The problem I ran into, getting a callback to the right instance, isn’t exotic. It’s one of those distributed systems problems that feels simple until you’re debugging why responses are landing on instances that have no idea what to do with them. Polling works until it doesn’t scale. Webhooks work until your infrastructure team pushes back. Kafka works until you’re drowning in partition rebalancing.

RabbitMQ’s reply-to pattern solved it cleanly: temporary queues, correlation IDs, and automatic cleanup. It added a dependency, but removed a whole category of routing complexity.

If you’re dealing with stateful async communication between scaled services, start here.

Further Reading