Fallacies of Distributed Computing: 1. The network is reliable

The fallacies of distributed computing are a set of assertions describing false assumptions made about distributed systems.

L. Peter Deutsch drafted the first 7 fallacies in 1994, with the 8^th added by James Gosling in 1997.

The 8 fallacies are:

1. The network is reliable

The problem

Many systems are interconnected in nature, that is to say in normal execution they may make calls to other systems over one or more networks to satisfy functionality requirements.

In any tiered architecture the underlying process and mechanism of communication will be (hopefully) abstracted away - for example taking a payment on behalf of a customer may be as straightforward as calling a method like so:

var svc = new PaymentService();
svc.TakePayment(paymentRequest);

This level of abstraction is both useful and desirable, however it does tend to pre-dispose our view towards success. What happens when an underlying network call to a third-party payments processor fails with e.g an HTTP timeout exception?

Depending on when and how such an error occurs, the request may be retryable without adverse side effect(s). However, for any retry to be deterministic in the affirmative the system must be indempotent. Without indempotence, retrying the above request N times may result in the customer being charged N times.

A network may be unreliable for any number of reasons:

A software issue
A hardware issue
A security issue
An environmental issue
A bad actor e.g. a Denial of Service attack
A combination of any or all of the above

A system architected for resiliance and correctness must consider the above factors in its design.

A solution

To mitigate network unreliability, a Retry and Acknowledgement design can be leveraged using a queuing system.

Queuing systems typically implement a pattern known as Store and Forward - they store the message locally and then forward it to the recipient(s). If forwarding fails, the queuing system will retry automatically. If failure persists past a specified number of attempts, the message will be moved to a Dead Letter Queue. The message is considered undelivered until the queue receives an acknowledgement of receipt from the recipient.

Implementing queuing requires moving from a synchronous request/response centric model to an ansychronous one. This is - typically - not a trivial excercise in a system of any complexity and may require re-evaluation of other aspects of your system, not least user experience.

Derek Lawless

Fallacies of Distributed Computing: 1. The network is reliable

20th April 2020Fallacies of Distributed Computing

1. The network is reliable

The problem

A solution