Fallacies of Distributed Computing: 6. There is one administrator
The fallacies of distributed computing are a set of assertions describing false assumptions made about distributed systems.
The 8 fallacies are:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
6. There is one administrator
If your system is self-contained and deployed into environments you fully control and manage, there may indeed be only one administrator.
However, any non-trivial system is more likely to depend on one or more in-house or third-party services, and to be deployed into environments that you do not completely control or that differ from one another in any number of ways, e.g. operating system and application framework versions, security patching policies, UAC, shared resources, firewall rules, etc.
Multiple in-house teams may be involved in deploying and supporting the system, each operating outside the processes that govern the system’s development. Third-party services (their availability, compatibility, and development cadence) will typically be outside your control.
Infrastructure as Code
Where possible, automate the provisioning and configuration of the environments into which deployments are made. Infrastructure as Code (IaC) is a core element of DevOps: infrastructure is modelled in code (or in a templating DSL) rather than managed manually, and is governed by the same quality processes as application source code, e.g. versioned in source control, unit tested, etc.
Ideally, regard IaC as a core competency — that is, embed DevOps capabilities in your development teams.
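As a rough illustration of the idea, and not a substitute for a real IaC tool, the sketch below models an environment's desired state in Python and reports where a running environment has drifted from it. The `EnvironmentSpec` type, its fields, the `find_drift` helper, and the hard-coded `reported` values are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Desired state of one deployment environment (fields are hypothetical)."""
    name: str
    os_version: str
    runtime_version: str
    open_ports: tuple = ()


def find_drift(desired: EnvironmentSpec, actual: dict) -> dict:
    """Return {setting: (desired, actual)} for every setting that differs."""
    drift = {}
    for key in ("os_version", "runtime_version", "open_ports"):
        want = getattr(desired, key)
        have = actual.get(key)
        if key == "open_ports" and have is not None:
            # Normalise so that ordering differences are not reported as drift.
            want, have = tuple(sorted(want)), tuple(sorted(have))
        if want != have:
            drift[key] = (want, have)
    return drift


# The spec lives in source control and is reviewed like any other code.
STAGING = EnvironmentSpec(
    name="staging",
    os_version="22.04",
    runtime_version="3.12",
    open_ports=(443,),
)

# In practice this would be reported by the environment itself;
# it is hard-coded here purely for illustration.
reported = {
    "os_version": "22.04",
    "runtime_version": "3.11",
    "open_ports": [443, 8080],
}

drift = find_drift(STAGING, reported)
for setting, (want, have) in drift.items():
    print(f"{setting}: expected {want!r}, found {have!r}")
```

Because the specification is ordinary code, a check like `find_drift` can run in a test suite or a deployment pipeline, which is one way of surfacing differences introduced by an administrator you did not know about.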
Logging and monitoring
Diagnosing issues can be non-trivial at the best of times, and is particularly complex in distributed systems. To gain better visibility into the behaviour of a distributed system, ensure that centralised logging, metrics, and tracing (the Three Pillars of Observability) are treated as key aspects of the system’s design.
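One common building block for centralised logging is to have every service emit structured log lines that carry a shared correlation ID, so that a log aggregator can stitch together a single request's path across services. The following is a minimal sketch using only Python's standard `logging` module; the `JsonFormatter` and `make_logger` names, the service names, and the field layout are assumptions made for this example rather than any standard.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(service: str, correlation_id: str) -> logging.LoggerAdapter:
    """Return a logger that stamps every record with service and correlation ID."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    extra = {"service": service, "correlation_id": correlation_id}
    return logging.LoggerAdapter(logger, extra)


# One ID is generated at the edge of the system and passed to every service
# that handles the request (here, two hypothetical services).
correlation_id = str(uuid.uuid4())
orders = make_logger("orders", correlation_id)
billing = make_logger("billing", correlation_id)

orders.info("order accepted")
billing.info("payment requested")
```

In a real deployment the correlation ID would travel between services in a request header or message attribute, and the JSON lines would be shipped to whatever log store the teams involved have agreed on.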
Ensuring appropriate decoupling between system components allows for greater resiliency in the event of either planned (system upgrades) or unplanned (service failure) downtime.
Introducing queuing (with appropriate retry policies, exponential backoff, and dead-letter queues, or DLQs) is one method of achieving decoupling.
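To make the retry, backoff, and DLQ pattern concrete, here is a small in-memory sketch built on Python's standard `queue` module. It stands in for a real message broker, which would normally handle redelivery delays itself; the `Message` type, the `backoff_delay` and `drain` functions, the delay values, and the `flaky_handler` consumer are all hypothetical.

```python
import queue
import time
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    attempts: int = 0


def backoff_delay(attempt: int, base: float = 0.01, cap: float = 1.0) -> float:
    """Exponential backoff: base * 2^(attempt - 1), capped at `cap` seconds."""
    return min(cap, base * (2 ** (attempt - 1)))


def drain(work, dead_letters, handler, max_attempts=3, sleep=time.sleep):
    """Process every message; retry failures with backoff, then dead-letter them."""
    while not work.empty():
        msg = work.get()
        msg.attempts += 1
        try:
            handler(msg.body)
        except Exception:
            if msg.attempts >= max_attempts:
                # Give up and park the message for later inspection.
                dead_letters.put(msg)
            else:
                # Wait longer after each failure, then put it back on the queue.
                sleep(backoff_delay(msg.attempts))
                work.put(msg)


processed = []


def flaky_handler(body: str) -> None:
    """Hypothetical consumer: one message always fails, the rest succeed."""
    if body == "bad":
        raise RuntimeError("downstream service unavailable")
    processed.append(body)


work, dead_letters = queue.Queue(), queue.Queue()
for body in ("a", "bad", "b"):
    work.put(Message(body))

drain(work, dead_letters, flaky_handler)
print("processed:", processed)
print("dead-lettered:", dead_letters.qsize())
```

The producer only needs the queue to be available, not the consumer, which is what lets either side be upgraded or fail independently. Production systems usually add random jitter to the delay so that many consumers do not retry in lockstep.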