Centralised v Distributed Architectures



DISTRIBUTED V CENTRALISED ARCHITECTURE

One of the big architectural talking points over the last decade or so has been the shift away from the centralised (monolithic) architecture towards the distributed architecture. I describe the reasoning for this shift here (How We Got Here?).

In this chapter I summarise the main points and provide something of a comparison between these architectural styles. If you'd like to know more, I suggest reading about the benefits and drawbacks in the relevant sections (e.g. Monolith, Microservices).

GUIDES, NOT RULES

I do lay certain expectations on the reader. You still need to do your own thinking. After all, it's your workload, and your business. These are guides, not rules. The comparison below shows typical cases, which, of course, means there will be exceptions [1].
History / Reasoning / Suitability
Centralised: Proven over ~5 decades, across every conceivable sector. A simpler, comprehensible model, useful for rapid prototyping.
Distributed: Proven over ~1-to-2 decades. A more complex architectural model, built on the necessity to deliver more rapid change, adaptability, and business scale.
Commonly applied to
Centralised:
- Startups - e.g. proving out a business / product.
- Performance-critical applications, or ones that require the Principle of Locality.
- Workloads requiring Immediate Consistency.
- Applications that require self-hosting at a client location.
Distributed:
- Many business types now use distributed architectures.
- Businesses desiring rapid change, Agility, and scale.
- SaaS solutions (although these aren't exclusive to a distributed architecture).
Drawbacks
Centralised: Failed to inspire the modern digital web revolution and the Ravenous Consumption model (change was too slow), requiring an industry “rethink”.
Distributed: Complexity is transferred to the architecture.
Distinguishing Characteristics
Centralised: Uses a centralised model (for everything). Code is held in one place, software is deployed as a single unit, data is managed in one place, and deployment tools are singular.
Distributed: Uses a distributed model (for everything). Code is spread across multiple repositories, software is deployed independently and often encapsulates a business domain, databases are independent (and often aligned to the domain concept), and there's a 1-to-1 relationship between deployment tool and software component.
Complexity
Centralised: Complexity lies mainly within the (single) codebase and in the effort to deliver change (the result of Batching), creating Change Friction.
Distributed: Complexity lies within the architecture (e.g. instance management).
(Typical) number of instances
Centralised: Small (possibly one, or a small handful), albeit there's no technical limitation.
Distributed: Many smaller instances, potentially in the hundreds or thousands.
Instance lifetime (and implications)
Centralised: Instances are long-lived, and therefore cause Loss Aversion - a form of Control. Unhealthy instances are nursed back to health, often manually, causing Circumvention.
Distributed: Instances are short-lived and ephemeral, and therefore exhibit less Loss Aversion. Unhealthy instances are simply terminated and replaced in a repeatable, predictable manner.
Instance Attachment
Centralised: High attachment (and therefore investment) due to:
- Many Assumptions embedded in the application, and its deployment.
- Lots of configuration in one place.
- Significantly fewer instances (i.e. we can ill-afford to lose one).
- Instances tend to be more difficult to replace, thus we are more invested in them.
Distributed: Low attachment as:
- Each service contains far fewer embedded Assumptions.
- They are cohesive.
- They have fewer dependents, and lower Complexity, making it (relatively) easy to spin up new instances.

Instances are treated as Cattle, not Pets.
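To make the "Cattle, not Pets" point a little more concrete, here's a minimal sketch of the replace-don't-repair model described in the two rows above. The orchestrator and its operations (list_instances, is_healthy, terminate, launch_replacement) are hypothetical stand-ins for whatever platform you use (e.g. an autoscaling group or a container scheduler).

```python
# Minimal sketch of the "Cattle, not Pets" model: unhealthy instances are
# replaced, not repaired. The orchestrator API used here is hypothetical.

def reconcile(orchestrator, desired_count: int) -> None:
    instances = orchestrator.list_instances()

    # Terminate anything unhealthy rather than attempting manual recovery.
    for instance in instances:
        if not orchestrator.is_healthy(instance):
            orchestrator.terminate(instance)

    # Top the pool back up to the desired count with fresh, identical instances.
    healthy = [i for i in instances if orchestrator.is_healthy(i)]
    for _ in range(desired_count - len(healthy)):
        orchestrator.launch_replacement()
```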
Codebase
Centralised: Single, centralised. Initially, this makes for a simple development experience, but one that gets increasingly difficult to manage as functionality is added. Eventually a Complexity Event Horizon is reached, indicating:
- Staff on-boarding is increasingly difficult.
- Staff don't want to work with it, which may lead to retention issues.
- Change is slow.
Distributed: Multiple (often many), distributed. Each component gets its own code repository. This is fine when working on a single component (or a small number of components), but harder when you need everything.
Data stores (typically)
Centralised: One, centralised.
Distributed: Multiple / many, decentralised, and potentially using (many) distinct technologies.
Interaction Style
Centralised: Local communications (on the same host or virtual machine).
Distributed: Remote communications (across distinct hosts or virtual machines).
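To illustrate the difference in interaction style, here's a small, purely illustrative sketch: the centralised version is an in-process function call, whilst the distributed version makes the same request over the network, bringing latency, timeouts, and partial failure into the programming model. The service URL and payload shape are assumptions, not a real API.

```python
# Sketch contrasting the two interaction styles.
import json
import urllib.request

def price_locally(basket: dict) -> float:
    # Centralised: a plain function call on the same host - no network failure modes.
    return sum(item["unit_price"] * item["qty"] for item in basket["items"])

def price_remotely(basket: dict) -> float:
    # Distributed: the same request crosses the network, so latency, timeouts,
    # and partial failure now become part of the programming model.
    request = urllib.request.Request(
        "http://pricing-service.internal/price",  # hypothetical service endpoint
        data=json.dumps(basket).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.load(response)["total"]
```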
Scalability Model
Centralised: Vertical, but with the potential for a Horizontal option (outside of the centralised instance). However, you may be scaling for things you never use, and you can't easily tailor resources to specific areas.
Distributed: Vertical and Horizontal. You may vertically scale an instance, or horizontally scale (add more instances). Theoretically, an unlimited number of instances could be scaled horizontally (of course, not so in practice).
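As a rough illustration of horizontal scaling, the sketch below derives an instance count from expected load, with some headroom and a practical cap; the throughput figures are illustrative assumptions, not benchmarks.

```python
# Back-of-the-envelope sketch of horizontal scaling: the number of instances
# is derived from expected load, rather than making one instance ever bigger.
import math

def required_replicas(expected_rps: float, per_instance_rps: float,
                      headroom: float = 0.3, max_replicas: int = 100) -> int:
    # Add some headroom, then cap at a practical (budget / platform) limit -
    # "unlimited" horizontal scale is theoretical, not practical.
    raw = math.ceil(expected_rps * (1 + headroom) / per_instance_rps)
    return min(max(raw, 1), max_replicas)

print(required_replicas(expected_rps=1200, per_instance_rps=150))  # -> 11
```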
(Typical) Consistency Model
Centralised: Immediate Consistency. A transaction is managed by one instance, allowing a range of sub-transactions to be performed in isolation, yet made immediately available to all constituent parts of the system.
Distributed: Eventual Consistency. The distributed model means transactions are also distributed. The entire business transaction is not immediately consistent across all parts of the system.
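The sketch below contrasts the two consistency models using a hypothetical order / billing split; the in-memory stores and event bus are stand-ins. In the centralised version both records are committed together, whereas in the distributed version the billing view only catches up once the event is consumed.

```python
# Sketch contrasting Immediate and Eventual Consistency. The order/billing
# split, the in-memory "databases" and the event bus are illustrative.
from collections import deque

orders_db, billing_db = {}, {}
event_bus = deque()

def place_order_centralised(order_id: str, amount: float) -> None:
    # One instance, one transaction: both records are visible immediately.
    orders_db[order_id] = {"amount": amount}
    billing_db[order_id] = {"amount": amount, "status": "pending"}

def place_order_distributed(order_id: str, amount: float) -> None:
    # The order service commits its own data, then publishes an event.
    orders_db[order_id] = {"amount": amount}
    event_bus.append({"type": "OrderPlaced", "order_id": order_id, "amount": amount})

def billing_consumer() -> None:
    # Runs later (and possibly elsewhere); until then, the system is inconsistent.
    while event_bus:
        event = event_bus.popleft()
        billing_db[event["order_id"]] = {"amount": event["amount"], "status": "pending"}
```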
Rollback / replay
Centralised: Relatively easy with atomic transactions (see Immediate Consistency) - you simply roll back the transaction.

Replaying a failed transaction should also be easier, since there are no residual pollutants (it was fully rolled back). Once you find the issue, you re-run it.
Distributed: Harder, due to Eventual Consistency. Rollback involves coordinating each distributed transaction to roll back, or accepting pollutants in parts of the system, which may, or may not, be resolved.

Replay is also harder (unless requests are Idempotent). For instance, how do you indicate which parts of a business transaction to complete?
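One common way to make replay safe is idempotency: each request carries a key, and a repeated key is acknowledged without re-applying its side effects. A minimal sketch, with hypothetical names, follows.

```python
# Sketch of idempotent handling: a repeated idempotency key returns the
# original outcome instead of re-applying the side effect. Names are illustrative.

processed = {}  # idempotency key -> result

def handle_payment(idempotency_key: str, account: str, amount: float) -> str:
    if idempotency_key in processed:
        return processed[idempotency_key]   # replay: return the original outcome

    result = f"debited {amount} from {account}"  # stand-in for the real side effect
    processed[idempotency_key] = result
    return result

# Replaying the same business transaction is now harmless:
handle_payment("txn-42", "acc-1", 9.99)
handle_payment("txn-42", "acc-1", 9.99)  # no double debit
```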
Observability
Centralised: Relatively easy due to the centralised model. Everything related to a transaction is captured in one place, making it easy to trace or diagnose issues. However, there's no real resource delineation to understand system health (e.g. how much RAM is the carts solution using?) - it's allocated at the top level.
Distributed: Harder. Each instance tends to be ephemeral and, by its nature, distributed. Gathering metrics, logging, and debugging data all involve either gathering the data directly from the distributed instances (not advisable by hand), or publishing it to a central location. This is the downside of a more complex architecture.
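The usual mitigation is to publish everything to a central location and propagate a correlation ID, so a single business transaction can be traced across many ephemeral instances. A minimal sketch follows, with illustrative service names and plain stdout standing in for a central log sink.

```python
# Sketch of distributed observability: structured logs carrying a correlation ID,
# so one business transaction can be traced across many ephemeral instances.
import json
import logging
import uuid

logging.basicConfig(format="%(message)s", level=logging.INFO)
log = logging.getLogger("shop")

def emit(service: str, correlation_id: str, message: str) -> None:
    # In reality this would be shipped to a central aggregator by an agent;
    # here it's just structured output on stdout.
    log.info(json.dumps({"service": service, "correlation_id": correlation_id,
                         "message": message}))

def checkout(basket_id: str) -> None:
    correlation_id = str(uuid.uuid4())          # created at the edge...
    emit("checkout", correlation_id, f"started for {basket_id}")
    emit("payments", correlation_id, "payment authorised")   # ...and passed downstream
    emit("shipping", correlation_id, "despatch requested")
```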
Runtime Resilience
Centralised: Centralised systems tend to be more stable unless under duress.

They also exhibit greater Loss Aversion, causing manual intervention and Circumvention, thus leading to further resilience challenges.

Rollback is straightforward, due to the atomic nature of transactions.
Distributed: Less stable, mainly due to there being more moving parts and the reliance on (sometimes unstable) network connections.

Good distributed systems assume hostility and failure, and cater for it, consequently making them quite robust in the face of failure.
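A small example of "assume hostility and failure": wrapping remote calls in timeouts and retries with backoff, so a flaky dependency degrades gracefully rather than taking the caller down. This is a sketch, not a recommendation of specific retry values.

```python
# Sketch of "assume failure": retries with exponential backoff and jitter.
# The remote operation itself is a stand-in passed in by the caller.
import random
import time

def call_with_retries(operation, attempts: int = 3, base_delay: float = 0.2):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise                             # surface the failure eventually
            # Exponential backoff with jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```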
Release Resilience
Centralised: Lower. The Blast Radius is often larger due to less isolation and many more embedded Assumptions. This also implies risk, and higher regression costs.
Distributed: Higher. Changes are more isolated (lower Blast Radius), due to looser coupling and fewer accumulated Assumptions.
Common team organisation
Centralised: Typically creates centralised (sometimes siloed) teams of specialists (e.g. engineering, system admin, security) - for which Batching is partly to blame. This also raises questions around ownership and accountability.

Individuals in centralised teams tend towards I-shaped people (they're deeply experienced in one area, but not necessarily in others).
Distributed: Typically promotes a more diverse, cross-functional team (Cross-Functional Team), capable of managing a solution end-to-end (with clear ownership and accountability). This model better suits T-shaped people, who must be more adaptable and sometimes willing to take on work they have little expertise in.
Manageability
Centralised: Both easier and harder.

Good:
- There's only a single executable to contend with; i.e. low cognitive overhead.
- It's simpler for a client to self-host it themselves; again, relatively low cognitive overhead.
- There are (generally) fewer instances, making them easier to track and deploy.
- It's easier to find the configuration, since it's in one place.

Bad:
- They can be harder to configure for minor changes, due to the amount of configuration that is irrelevant to your scenario.
- They exhibit a higher aversion to loss (Loss Aversion), and tend to be nursed back to health (sometimes manually), rather than being terminated and replaced.
Distributed: Both easier and harder - the converse of the centralised points.
Evolvability
Centralised: Lower, due to its slow release cycles, partiality towards Batching, higher risk associated with change (due to Lengthy Release Cycles), and higher Blast Radius.

Upgrade Procrastination is more likely, due to a broad dependence on a specific technology (and version of it); i.e. a larger Blast Radius.
Distributed: Good. Components are more independent, typically using Lowest Common Denominator integration techniques (e.g. text-based). There are fewer externalised assumptions about the implementation, making it possible to (rapidly) adapt and change a single component without significantly affecting others.
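As a small illustration of Lowest Common Denominator integration, the sketch below exchanges a plain, text-based (JSON) payload rather than sharing classes or a database schema; the field names are assumptions. Because the consumer reads only what it needs, the producer can evolve its internals independently.

```python
# Sketch of text-based (Lowest Common Denominator) integration using JSON.
import json

def publish_customer(customer_id: str, name: str) -> str:
    # Only the agreed, minimal contract is externalised.
    return json.dumps({"id": customer_id, "name": name})

def consume_customer(payload: str) -> dict:
    data = json.loads(payload)
    # The consumer reads only what it needs and ignores anything extra,
    # tolerating additions on the producer side.
    return {"id": data["id"], "name": data.get("name", "unknown")}
```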
Security
Centralised: Mixed. The downside of a centralised solution is that it accesses a monolithic database, which tends to exhibit a poor Separation of Concerns (it's hard to apply the Principle of Least Privilege - see the sketch after this row), thus leading to a larger Blast Radius should its data be exfiltrated. The patching of vulnerabilities can also be tough (Upgrade Procrastination), due to many embedded Assumptions and the regression effort (Regression Testing).

On the flip side, containing everything in one place enables us to deploy all security controls around a single perimeter, meaning we can be reasonably confident we've not missed anything.
Distributed: Mixed. It typically promotes the opposite of the centralised model.

There's a good Separation of Concerns, but it also requires us to distribute our security controls, meaning we could miss some, or find them inadequate.

Depending upon your outlook, software patching can be attractive. For instance, a Microservices architecture enables us to patch software in a piecemeal fashion, permitting parts of a system to be patched, released, and regression tested individually.

However, the practice of Technology Choice per Microservice may also expose us to a much wider range of security vulnerabilities, as different vendors react at different speeds, potentially leaving some parts of a system exposed whilst others are secure. The counter-argument is that you're more likely to be keeping pace with upgrades (due to the lower Blast Radius), and thus less likely to be affected by a system-wide vulnerability.
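To illustrate the Principle of Least Privilege point made above for the centralised case, here's a sketch (the grants and service names are illustrative assumptions): a single shared database effectively demands broad rights, whereas per-service credentials can be scoped to one schema and the minimum operations.

```python
# Sketch of scoping access: a shared monolithic grant versus per-service grants.
# Schemas, users, and privileges here are illustrative, not a real policy.

monolith_grant = {
    "user": "shop_app",
    "schemas": ["orders", "billing", "customers", "inventory"],  # everything
    "privileges": ["SELECT", "INSERT", "UPDATE", "DELETE"],
}

microservice_grants = [
    {"user": "orders_svc",  "schemas": ["orders"],  "privileges": ["SELECT", "INSERT", "UPDATE"]},
    {"user": "billing_svc", "schemas": ["billing"], "privileges": ["SELECT", "INSERT"]},
]
# A leaked orders_svc credential exposes far less than a leaked shop_app one.
```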
Delivery / deployment
Centralised: Tends to be less dynamic, with a tendency towards Batching, leading to a range of problems, including: Lengthy Release Cycles, lots of Waiting (The Seven Wastes), features waiting in Manufacturing Purgatory or being leapfrogged by others (Expediting), Circumvention, heavy regression effort, and Waterfall deliveries. It also generates Release Resilience risk (see earlier).
Distributed: Rapid delivery (by comparison), mainly due to releasing (multiple) smaller, lower-risk changes (reduced Batching).
Cloud friendly
Centralised: It depends. Monoliths contain more embedded Assumptions. For instance, you're more heavily tied to a select number of technologies. This is fine when the cloud satisfies all of those assumptions, less so otherwise (due to a higher potential Blast Radius).

These assumptions can limit options to those offering the most control, but the least benefit (e.g. Infrastructure-as-a-Service (IaaS)) [2].
Distributed: The lower number of embedded Assumptions heightens optionality, thereby increasing possibilities; e.g. IaaS, PaaS, CaaS, Serverless.

FURTHER CONSIDERATIONS