Message Delivery Guarantees in Distributed Systems

Same Message. Different Outcomes

Today•3 min read

Imagine three people successfully transferred 10k to your bank account.

First payment, never reaches you
Second payment, credited twice
Third payment, credited exactly once

For a customer, these are the three different experiences.

But for the software engineer designing distributed systems, this represents the three core message delivery guarantees.

At most once
At least once
Exactly once

What are message delivery guarantees?

In distributed systems, multiple components communicate with each other via message brokers (such as Kafka and RabbitMQ). Message delivery guarantees refer to how surely the message will be delivered from the producer (sender) to the consumer (receiver).

A message is considered delivered if the broker receives a successful acknowledgment from the consumer.

Let's break down what the different guarantees are and when to use them.

At most Once Delivery

A message will be sent to the consumer only once from the message broker. Once the message is sent from the broker, it will be marked as delivered, basically, fire and forget. If it fails at the consumer due to any app error or network blips, then the message will be lost forever.

There is no failure handling or retry mechanism, and this is how it results in an experience where the first payment never reaches your bank account, even after being deducted from the sender's account.

This can be used in systems where some sort of data loss is acceptable. Like logger, UI/UX metrics ingestion, or location ingestion, etc.

At least Once Delivery

A message will be sent to the consumer once or multiple times from the message broker until it gets a successful acknowledgment. This can be achieved by failure handling with the help of a retry mechanism.

Consider the earlier example where 10k is getting credited to your account.

The consumer process will look like:

Message received
updateDB (credit amount in receiver account)
notifyCustomer
Ack Broker

Let's say the request processing failed just before acknowledging the broker. In this case, the message will be considered as undelivered at the broker end. Since this is an at-least-once delivery semantics, the broker will resend the message using a retry mechanism.

Retry mechanism: A system that retries the delivery of failed messages as per the configured retry policy (number of retries & retry after time) and eventually adds the message to the dead letter queue if all the retries fail to deliver the message.

It is possible for a broker the message is not delivered, while for the consumer, the message is processed successfully. In this scenario, the broker will retry the delivery of the same message. Now, on retry, the message got delivered successfully, but this caused data duplication since the same message was already processed at the application side without acknowledging the broker.

In this way, at least one message delivery is guaranteed, but data duplication can occur. And that's why the second payment reaches your bank account twice, even after being deducted from the sender's account only once.

This can be used in a system where some percentage of data duplication or duplicate processing won't affect the business outcome. Like webhooks to third parties, social media notifications, or email marketing campaigns where the data duplication is harmless.

Exactly Once Delivery

In this, the message will be received by the consumer and processed successfully only once. As per Confluent's senior engineers, this is technically difficult to achieve. It requires complex coordination and high cost.

But why is this guarantee difficult to achieve?

In distributed systems, broker and consumer are independent services, and to ensure exact single delivery and processing, both the system needs to be aware of each other's states. Post-processing consumer should send a success acknowledgment to the broker, but network issues can occur before both systems become aware of each other's state, and this creates uncertainty whether the message is processed, acknowledged, both, or neither. To resolve this ambiguity, there is a need for a coordinator mechanism such as transactions or distributed commit protocols, which introduce an engineering overhead.

To avoid this overhead, engineers prefer at least one delivery along with consumer deduplication or idempotency. This ensures the message will be delivered from the broker and will be processed exactly once at the consumer.

This can be used in systems where duplication of data is not accepted at all. Like payments, order placement, or ticket booking.

Conclusion

In this article, we discussed different types of delivery guarantees and how to achieve exactly once delivery in an efficient way.

Message Delivery Guarantees in Distributed Systems

What are message delivery guarantees?

Let's break down what the different guarantees are and when to use them.

At most Once Delivery

At least Once Delivery

Exactly Once Delivery

Conclusion

Written by Ankit Jain

Subscribe to my newsletter

What are message delivery guarantees?

Let's break down what the different guarantees are and when to use them.

At most Once Delivery

At least Once Delivery

Exactly Once Delivery

Conclusion

Written by Ankit Jain

Subscribe to my newsletter

More from Ankit