Fallacies of Distributed Computing: 6. There is one administrator

The fallacies of distributed computing are a set of assertions describing false assumptions made about distributed systems.

L. Peter Deutsch drafted the first 7 fallacies in 1994, with the 8th added by James Gosling in 1997.

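The 8 fallacies are:

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

This post covers the sixth: there is one administrator.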

The problem

If your system is self-contained and deployed into environments you fully control and manage, there may indeed be only one administrator.

However, it is more likely that any non-trivial system will depend on one or more in-house or third-party services for its operation. It is also likely to be deployed into environments that you do not completely control, or that differ from each other in any number of ways, e.g. operating system and application framework versions, security patching policies, User Account Control (UAC) settings, shared resources, and firewall rules.

Multiple in-house teams may be involved in deploying and supporting the system, each operating outside the processes that govern the system's development. Third-party services (their availability, compatibility, and development cadence) will typically be outside your control.

Solutions

Infrastructure as Code

Where possible, automate the provisioning and configuration of the environments into which deployments are made. Infrastructure as Code (or IaC) is a core element of DevOps: infrastructure is modelled in code (or in a DSL or template form) rather than managed manually, and is governed by the same quality processes as application source code, e.g. versioned in source control, unit tested, etc.

Ideally, regard IaC as a core competency — that is, embed DevOps capabilities in your development teams.
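
As a rough illustration of what "infrastructure modelled in code" can look like, the sketch below describes the desired firewall rules as data and computes the changes needed to bring a drifted environment back in line. It is not tied to any real IaC tool; the names FirewallRule, plan, and DESIRED are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    """Hypothetical model of a single inbound firewall rule."""
    port: int
    protocol: str = "tcp"


def plan(desired: set, actual: set) -> dict:
    """Compute the changes needed to move an environment from actual to desired."""
    def key(rule: FirewallRule) -> tuple:
        return (rule.port, rule.protocol)

    return {
        "add": sorted(desired - actual, key=key),
        "remove": sorted(actual - desired, key=key),
    }


# The desired state lives in source control alongside the application code.
DESIRED = {FirewallRule(443), FirewallRule(22)}

# State reported by one (hypothetical) environment that has drifted.
actual = {FirewallRule(22), FirewallRule(8080)}

changes = plan(DESIRED, actual)
print(changes)
```

Because the desired state is plain data and plan is a pure function, both can be reviewed, versioned, and unit tested like any other code, whichever administrator ends up applying the change.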

Logging and monitoring

Diagnosing issues can be non-trivial at the best of times and is particularly complex for distributed systems. To gain better visibility into the behaviour of a distributed system, ensure that centralised logging, metrics, and tracing (the Three Pillars of Observability) are treated as key aspects of the system's design.
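
One way to make centralised logs easier to work with is to emit structured records that carry a shared correlation ID, so that entries from different services can be joined later. The sketch below uses only Python's standard logging module; the JsonFormatter and make_logger names, the "orders" and "billing" service names, and the field layout are assumptions made for illustration, and shipping the records to a central store is left out.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            # Set via the `extra` argument; None if the caller omitted it.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def make_logger(service: str) -> logging.Logger:
    """Create a logger for one (hypothetical) service that writes JSON lines."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# One ID generated at the edge and passed along with the request.
correlation_id = str(uuid.uuid4())

orders = make_logger("orders")
billing = make_logger("billing")

orders.info("order received", extra={"correlation_id": correlation_id})
billing.info("payment authorised", extra={"correlation_id": correlation_id})
```

Searching the central store for a single correlation ID then returns the full path of a request, even when each service is run by a different team.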

Decoupling

Ensuring appropriate decoupling between system components allows for greater resiliency in the event of either planned (system upgrades) or unplanned (service failure) downtime.

Introducing queuing (with appropriate retry policies, exponential backoff, and dead-letter queues, or DLQs) is one method of achieving decoupling.
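
The sketch below shows the general shape of a consumer that retries with exponential backoff and moves messages it cannot process to a dead-letter queue. An in-memory deque stands in for a real message broker, and backoff_delay, drain, and flaky_handler are hypothetical names with arbitrary default values; a production setup would normally rely on the broker's own retry and DLQ features and would add jitter to the delays.

```python
import time
from collections import deque
from typing import Callable


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Delay after failed attempt number `attempt` (1-based), doubling each time up to `cap`."""
    return min(cap, base * 2 ** (attempt - 1))


def drain(queue: deque, handler: Callable[[str], None], dead_letters: list,
          max_attempts: int = 3,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Process every message, retrying failures and dead-lettering poison messages."""
    while queue:
        message = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # Success: move on to the next message.
            except Exception:
                if attempt == max_attempts:
                    # Retries exhausted: park the message for later inspection.
                    dead_letters.append(message)
                else:
                    sleep(backoff_delay(attempt))


processed = []


def flaky_handler(message: str) -> None:
    """Hypothetical handler that always fails for one specific message."""
    if message == "poison":
        raise ValueError("cannot process message")
    processed.append(message)


queue = deque(["a", "poison", "b"])
dlq = []
delays = []  # Record the delays instead of actually sleeping.

drain(queue, flaky_handler, dlq, sleep=delays.append)
print(processed, dlq, delays)
```

The failing message does not block the messages behind it, and the producer never needs to know whether the consumer was available when the message was sent.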

Derek Lawless