HOW WE GOT HERE

The technology businesses of today are - in the main - very different to those of only a decade or so ago. Our ways of working, interactions with our customers, methods for constructing and delivering software, tolerance to (rapid) change, communication channels, and our almost incessant drive towards modernity - all quite different.

It's exciting, and largely why I chose to pursue such a career. However, it also presents us with a problem. The pace at which technology moves has three common effects:

  1. It leaves some behind. It's easy to be pigeonholed, or complacent, in the face of rapid change, and thus fall behind.
  2. It leaves some with an underappreciation of real problems that can't be solved with “greenfield strategy”.
  3. We forget to learn the lessons of the past. I'll focus on this concern here.

Without a decent grounding in where we have come from, we can neither fully appreciate the challenges that we've had to overcome, nor the lessons they have taught us. As Churchill said: “The longer you look back, the farther you can look forward.” Looking backwards enables us to glean important information about the technologies, platforms, practices, and movements that have shaped our industry. But critically, it also helps us to perceive what might be next for us.

The past, though, doesn't feel like something my colleagues and I often discuss or reflect upon. I suspect it may be the same for the reader. We're too busy with the here-and-now (or the future) to consider how (and why) we ended up here (the purpose of this chapter). For example, how many of us pause to reflect upon the stimuli for the Agile methodology (it's well documented, of course), or a microservices architecture? We're too busy using them. How about TDD, BDD, Continuous practices, or the Cloud? And why do we now seem to be more accepting of failure (of systems or ideas) than ever before?

Humour me then, for a short while, and let's return to an earlier time, shall we? You'll allow me to make some sweeping generalisations, I hope? [1]

Picture the scene. It's the early 2000s. The internet is a nascent but burgeoning presence. The mainstream availability of smartphones, social media platforms, video conferencing, messaging tools, and the cloud is still to come.

You're a developer who's recently joined a small-to-medium-sized (SME) company creating new internet-based services. The work day is a standard 9-5, top-and-tailed by your daily commute to the office building where you and your colleagues work. Internally, the business is organised into business functions - product, development, testing, and operations - each encapsulating one of the key lifecycle responsibilities. Your desk sits in one of many individual partitions, the walls just high enough to ensure you can't see your neighbours when seated. Except for meetings, you work quietly - in isolation - in your alcove.

Whilst communication does happen, it's typically between aligned individuals in the same business function. Developers talk with other developers, testers talk with other testers. There's little cross-talk, or pollination of ideas, across these boundaries.

As you settle into your new company, you begin to sense a constant friction bubbling below the surface, most visible around each quarterly software release. This release, though, seems particularly fraught, with lots of raised voices and people working long hours.

On one visit to the canteen you meet George - a seasoned software veteran.

“Hi George; everything ok?”
“Well,” he says, “we've gone and got ourselves into another predicament.” He rolls his eyes and grins mischievously, having seen this many times. “The developers didn't have enough time to build all of the functionality, so they've had to pass the software on to the testers late and unfinished. The testing department is protesting that they haven't been given sufficient time to complete their testing - a problem further exacerbated by the need to send the software back for fixing, at least twice by my count. And of course when they do get a new build from us, they aren't sure which parts have changed, so they're forced to regression test everything. They've got a fair point, don't you think?” You nod in agreement; it doesn't sound like an enviable position. “Stacked on top of that, our testing team has to constantly prove their value to the business... Now, you and I know they're a valuable asset, but many of our execs don't appreciate what they give us. The execs are suspicious of a testing team that doesn't find bugs, so the team is incentivised to find bugs. This sometimes leads them to expend too much time on what we'd call 'edge cases' and to exaggerate the importance of some of the problems they find.”

George leans casually back in his seat, takes a sip of coffee, and proceeds. “Now, the operations team objects because (a) they've been given insufficient handover information to support the software in production, (b) what it is has never really been explained to them, (c) they haven't been involved in its construction in any sense, and (d) more concerning, they've just heard from the testers about the poor build quality. Why should they support something they think is riddled with faults?” Again, you nod in agreement.

“And then there's the usual friction of two competing forces to contend with too…” George opines.
“Friction?” you prompt.
“Sure. Between development (those who build) and operations (those who run). They both desire what's best for the customer, but that view is quite subjective and often differs per role. For instance, Development needs to promote new features (i.e. change) in order to drive the business forward. On the other hand, the Operations team promotes the importance of service stability. After all, they're paid to keep services running, and don't like failures because: (a) the customer isn't getting the desired level of service, (b) a failure makes them look bad (regardless of the cause), and (c) they don't like being called out at 4am to fix a broken system. Funny that, eh? You can see why they're so cagey over our build quality now?” George takes a final sip and dumps his empty coffee cup in the recycling bin.
“Taken to its logical conclusion, that thinking would limit (or even prevent) all change. And thus, the friction.”

CHANGE & FEAR

Change, as they say, is hard. Any change can have unanticipated consequences. Therefore change carries risk, and some fear. One of the best ways to make something robust is to never change it. Of course this isn't particularly practical, and leads to all sorts of other unsavoury consequences, including our inability to evolve.

“And finally, whilst all this is playing out,” concludes George, “we have the development team working flat out to build the new features for our impending deadline. We've got three major features that require merging, with a small team focused solely on that.” George sighs heavily. “Meanwhile the execs are determined to release this software on time, come hell or high water. Everyone involved knows they'll need to put in a lot of extra hours to get the release out, which is why you should keep your head down. Catch you later!”

So, let's compare the experience of yesteryear to that of today. Firstly, when we compare the quantity of change, it should be noted that I'm not suggesting there were no changes, simply that by modern standards they were comparatively fewer, and typically far larger, with a Release Cadence measured in quarters, not weeks, and certainly not days. This was - in large part - because there were fewer demanding external stimuli (e.g. customer expectations).

Remember, this was in the burgeoning days of the internet, before information became widely available, before devices became so highly integrated into almost every aspect of our daily lives, and before e-commerce, social media, mass globalisation, or the Cloud. With hindsight it's easy to see why Monoliths, Lengthy Release Cycles, (intentionally) siloed teams (Silos), and the Waterfall Methodology were the norm. In such an environment, one might argue, it was easier to emphasise stability over change, and thus create a strong bias towards Availability over Resilience.

None of this, though, sat particularly well with consumer expectations. The prevailing wind was one of steadily increasing change (Ravenous Consumption), and businesses were (and will continue to be) forced to adapt.

FEEDING THE FIRE OF CHANGE

We are, in fact, the victims of our own success - by showing that we could adapt and move faster, we have further fed the fire of rapid consumption.

Increasingly, customers rejected late - or long - product release cycles in favour of faster change. Meeting such demand required (a series of) alternative approaches and ideas. Agile, DevOps, BDD, TDD, Test-First, Continuous Practices, Microservices, and the Cloud all grew from our need to deliver more rapid change, driven by an increase in customer demand, or a need to adapt to change more readily.

STANDING ON THE SHOULDERS OF GIANTS

None of this happened in one fell swoop. For example, when John Allspaw and Paul Hammond described their now famous “10+ Deploys per Day: Dev and Ops Cooperation at Flickr” [2], it was only one - albeit significant - milestone in an era already very different from what had come before. Ideas generate more ideas. Waterfall became Agile became DevOps became DevSecOps becomes ??? Scrum isn't fast enough for you? How about Kanban? IaaS isn't fast enough for you? Why not try a PaaS, a CaaS, or even Serverless?

This also raises the question of failure. In order to satisfy a significant increase in the rate of change, both businesses and customers have had to reassess their tolerance towards failure. I made a point earlier about the competing forces of change vs stability. Without compensating controls (e.g. continuous practices), rapid change increases the risk of something going wrong; stability (i.e. a low level of change), on the other hand, reduces that risk, but it isn't particularly useful in most modern consumer settings.
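
As a purely illustrative piece of arithmetic (assuming - unrealistically - a fixed, independent 2% risk per change), the sketch below shows why this matters: the chance of at least one failure climbs steeply as the number of changes grows, which is exactly the risk those compensating controls exist to mitigate.

    # Purely illustrative arithmetic: a fixed, independent 2% chance that any
    # single change causes a failure. Real per-change risk varies, of course.
    def chance_of_any_failure(changes: int, per_change_risk: float = 0.02) -> float:
        return 1 - (1 - per_change_risk) ** changes

    for changes in (4, 12, 52, 365):  # quarterly, monthly, weekly, daily cadences
        print(changes, f"{chance_of_any_failure(changes):.1%}")
    # Compensating controls aim to drive the per-change risk down, so that a
    # higher cadence doesn't simply translate into more failures.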

RISK OF FAILURE

In those days, most of us couldn't move at a rapid pace (even if we had wanted to), as the risk of failure was too high and there were few established approaches to support it.

If rapid change increases the risk of failure, then it also creates problems for high Availability. One tonic is to increase the prominence of Resilience (a system's ability to cope well with failure and reassert itself) through the introduction of compensating controls. This makes sense when you consider that Resilience and Availability are siblings, and both are children of Reliability; i.e. they aren't mutually exclusive (I'm thinking of the MTTR recovery aspect in particular), and practices that enhance resilience can also improve availability.
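
To make the MTTR point a little more concrete, here's a small sketch (in Python, with purely illustrative figures) of the standard steady-state approximation: Availability = MTBF / (MTBF + MTTR). Holding the failure rate constant while halving the recovery time - a resilience concern - visibly lifts availability.

    # Steady-state availability approximation: MTBF / (MTBF + MTTR).
    # The figures below are illustrative, not measurements from any real system.
    def availability(mtbf_hours: float, mttr_hours: float) -> float:
        """Fraction of time a service is up, given its mean time between
        failures (MTBF) and mean time to recovery (MTTR)."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Same failure rate; recovery is twice as fast in the second case.
    print(f"{availability(mtbf_hours=720, mttr_hours=4):.2%}")  # roughly 99.45%
    print(f"{availability(mtbf_hours=720, mttr_hours=2):.2%}")  # roughly 99.72%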

Consider, for instance, Blast Radius, Blue-Green Deployments, or Canary Releases - they exist to counter failure, but they may also have a positive influence on availability. Microservices and Event-Driven Architectures are loosely coupled and fiercely independent, meaning that even if parts of a system fail, others continue to function. Additionally, an emphasis on a Declarative Model enables us to employ platforms to quickly identify failing components, and either revive them or quickly replace them.
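
The declarative idea is easier to see in a sketch than in prose. The Python below is a deliberately minimal, hypothetical reconciliation loop - none of these names come from a real platform's API - showing the essence of the model: we declare the state we want, the platform observes the state we have, and it works out the actions needed to close the gap (for example, replacing a failed instance).

    # A minimal, hypothetical reconciliation-loop sketch. Real platforms
    # (Kubernetes being the obvious example) implement the same idea with far
    # more nuance; the names here are invented purely for illustration.
    from dataclasses import dataclass

    @dataclass
    class DesiredState:
        service: str
        replicas: int  # how many healthy copies we declare we want

    def reconcile(desired: DesiredState, healthy_instances: list[str]) -> list[str]:
        """Compare the declared state with the observed state and return the
        actions needed to converge the two."""
        shortfall = desired.replicas - len(healthy_instances)
        return [f"start new {desired.service} instance" for _ in range(max(0, shortfall))]

    # Two of the three declared replicas are healthy, so one action is produced.
    print(reconcile(DesiredState("checkout", 3), ["checkout-a", "checkout-b"]))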

SUMMARY

Many aspects of how we deliver, deploy, and manage software are quite different to the perceived norm of one or two decades ago. In the main, this has been shaped by the changing attitudes and expectations of our consumers.

Ravenous consumption (Ravenous Consumption) created a need for faster change (and the ability to adapt to it), which, unless compensated for, carries an inherent risk. It's a problem our industry has largely overcome through many iterations of innovation, introducing compensating factors that mitigate change risk (they compensate, but they don't neutralise) across a wide spectrum, including: (a) how we work (e.g. Agile), (b) how we deliver software (e.g. continuous practices), (c) our perception of failure (a greater emphasis not on trying to prevent failures, but on expecting and embracing them), and (d) how we run and manage our software (e.g. using distributed architectures on the cloud to build resilience into our systems).

FURTHER CONSIDERATIONS