RELIABILITY
TTM | ROI | Sellability | Agility | Reputation
The confidence customers / users have in your product.
Reliability is a measure of stakeholder confidence (Stakeholder Confidence) in a system, product, delivery practice, or business. High reliability - in whatever form it takes - implies high quality, which has a positive influence on Sellability and Reputation. Poor reliability implies the opposite.
A customer can rely upon you if:
- You can provide them with the service when they need it.
- That service recovers quickly when it fails.
- You deliver new features or fixes promptly, and when you say you will.
- The service you offer is able to meet their business needs, which may include significant fluctuations in use.
- The quality of the service you offer them is commensurate with the price and the agreement.
- You treat them fairly in exchanges.
- You deliver on your promise and don't significantly alter it (unless there's a good reason) at the next contract renewal.
Reliability is a transcendent quality, spanning many areas of the software lifecycle, including:
- Runtime reliability. How reliable is the software when in use?
- Output reliability. How reliable are the products and features that you produce?
- Delivery reliability. How reliable is your delivery of value to customers?
- Organisation reliability. How do customers perceive your business?
We'll touch on them next.
RUNTIME (SYSTEMS) RELIABILITY
Runtime reliability is possibly the most familiar form of reliability within the software space. Customers interact with your software product via its runtime platform. It's where your software executes, data is processed, and business transactions occur, making it a critical part of a software service.
Reliability (in this context) is considered the amalgamation of two other qualities: Availability and Resilience. Availability is about keeping the service running as much as possible. Resilience is about recovering quickly (ideally seamlessly and autonomically) when part of a service becomes unavailable.
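A common way to relate the two qualities is the steady-state availability formula, A = MTBF / (MTBF + MTTR). Here MTBF is the mean time between failures and MTTR is the mean time to recover. The minimal Python sketch below uses illustrative rather than measured figures. It shows how improving Resilience (a lower MTTR) raises availability even when failures happen just as often.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a service is up.

    mtbf_hours: mean time between failures (how often things break).
    mttr_hours: mean time to recover (how quickly service is restored).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures only: identical failure frequency, different recovery speeds.
slow = availability(mtbf_hours=500, mttr_hours=4)
fast = availability(mtbf_hours=500, mttr_hours=0.25)
print(f"slow recovery: {slow:.3%}")
print(f"fast recovery: {fast:.3%}")
```

The formula assumes long-run averages. Real incidents vary widely, so treat it as a way to reason about trade-offs rather than as a forecast.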
The choice of technology, application architecture, runtime platform, and the use of reliability patterns can all influence runtime reliability.
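The circuit breaker is one example of a reliability pattern. It stops a service from repeatedly calling a dependency that is already failing. Calls fail fast instead, which gives the dependency time to recover. The Python sketch below is a minimal, single-threaded illustration. The names (`CircuitBreaker`, `CircuitOpenError`, `call`, `fetch_partner_quote`) and the thresholds are hypothetical choices for this example, not a specific library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker (illustrative, not a library API).

    Closed: calls pass through and consecutive failures are counted.
    Open: after `failure_threshold` consecutive failures, calls fail fast
    for `reset_timeout` seconds instead of hitting the struggling dependency.
    Half-open: once the timeout elapses, a trial call is allowed; success
    closes the circuit, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def fetch_partner_quote() -> str:  # hypothetical dependency that is currently down
    raise ConnectionError("partner API unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
for attempt in range(4):
    try:
        breaker.call(fetch_partner_quote)
    except CircuitOpenError:
        print(attempt, "rejected immediately (circuit open)")
    except ConnectionError:
        print(attempt, "dependency call failed")
```

The breaker alone doesn't restore the lost functionality. In practice it is usually paired with a fallback (e.g. cached data or a degraded response), so that customers still get something useful while the dependency recovers.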
OUTPUT RELIABILITY (OR QUALITY)
Another aspect of reliability relates to how well your products and features behave when in use. This is less about their runtime attributes and more about their functional accuracy and stability; in other words, their quality.
Customers pay for functionality (ok, they also pay for non-functional aspects, but most treat these as Table Stakes). Therefore, if a feature fails to do what it's supposed to (e.g. it doesn't meet the requirement), offers a poor Usability experience, is vulnerable to attack, or creates a stability issue (e.g. it contains more bugs than is deemed acceptable), then - from the customer's perspective - you've failed to do your job, and they deem your products unreliable.
POOR USABILITY
Products with poor Usability aren't reliable (they can't be relied upon to deliver the expected experience), and thus may only be used as a last resort. Need dictates, and in these cases users will find ways to circumvent such products with solutions better suited to their needs.
There are many ways to improve quality, and thus reliability (I won't go into great detail here; see the Reliability - Solution Mappings chapter). One is to introduce better development and test processes (e.g. the many Shift-Left practices). Another is to look at how you organise yourself. For instance, a Cross-Functional Team typically embeds quality (in various guises) into its work early on, thanks to a strong (and diverse) culture of collaboration. A third is to remove unnecessary features and code, thereby reducing the complexity of the overall solution.
RELIABILITY & COMPLEXITY
Reliability contends with complexity. The more complex a system, the harder it is to understand, and the more things that can fail. A system made up of millions of lines of code is more complex than one containing 100k lines. Likewise, a system of systems (Frankenstein's Monster Systems) is more complex than a set of focused, cohesive units interacting.
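One way to see why "more things that can fail" matters is to consider a request that must pass through several components in sequence. If each component fails independently, the composite availability is the product of the individual availabilities. The independence assumption is a simplification that real systems rarely guarantee. The sketch below shows how even highly available components erode overall availability as their number grows.

```python
from math import prod
from typing import Sequence


def series_availability(components: Sequence[float]) -> float:
    """Composite availability when every component must be up for a request to succeed.

    Assumes components fail independently, which is a simplification.
    """
    for a in components:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {a}")
    return prod(components)


# Illustrative: every component individually achieves 99.9% ("three nines").
for n in (1, 5, 20):
    print(f"{n:>2} components: {series_availability([0.999] * n):.3%}")
```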
DELIVERY RELIABILITY
Imagine that you've purchased a new car. You've spent a long time investigating which make and model to buy, and saving up for it, before you finally take the plunge. The dealer has promised delivery today, so you take time off work in anticipation, but… it never turns up. You'd probably be rather annoyed. I certainly would.
The same applies to software. Promising to deliver something and then failing to do so is a very quick (and effective) way to upset a customer.
Several things can enhance delivery reliability, including:
- Releasing often (employ regular releases).
- Reducing work-in-progress (WIP).
- Promoting Uniformity (to avoid divergences and nasty surprises).
- Repeatability (e.g. of practices and mechanisms).
- Small batches (reducing effort and time, and thereby lowering risk).
- Organisational structure (e.g. avoiding silos).
- Observability (so you can measure delivery success).
- Automation.
Fast is good, but predictability is better. If we can predict when a feature will be available, and that it will always meet a certain standard, then we can be confident in the outcome.
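Little's Law offers one way to connect reducing WIP with predictability. In a stable system, average lead time equals average work-in-progress divided by average throughput. The minimal sketch below uses hypothetical team figures. It assumes work arrives at roughly the rate it is completed, and it describes averages only, not any individual feature.

```python
def average_lead_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput.

    Applies to long-run averages in a stable system (work arrives at roughly
    the rate it is completed); it does not predict any single item's lead time.
    The result is in the time unit of the throughput (e.g. items per week -> weeks).
    """
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team that finishes 5 items per week.
print(average_lead_time(avg_wip=30, avg_throughput=5))  # 6.0 (weeks)
print(average_lead_time(avg_wip=10, avg_throughput=5))  # 2.0 (weeks)
```

With throughput unchanged, cutting WIP from 30 to 10 items cuts the average lead time from six weeks to two. That makes delivery dates easier to forecast.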
ORGANISATION RELIABILITY
Reliability can also be viewed at the top-most level - i.e. how your business is perceived. (Prospective) customers may be interested in:
- Your track record (have you done this before?).
- Whether it's your speciality (i.e. you should be good at it).
- Noteworthy news and social media items (both positive and negative).
- Word of mouth from other customers.
- Your culture (e.g. is it collaborative? How do you collaborate with customers?).
- Transparency (I don't mean excessive transparency, but some level of it breeds trust).
- The strength and type of your partnerships.
Fundamentally, reliable organisations deliver on what they promise.
POOR PARTNERSHIP CHOICES
Choosing the wrong partner is another way to damage your perceived reliability (yes, yours). If your system depends upon an unreliable partner system, or the partner significantly increases its transactional costs at contract renewal, it will probably impact your customers. They're dealing with you, not the partner, so it falls upon you to resolve it, or take the heat. Even when your customers know where the problem lies (with the partner), it still reflects badly on your judgement.
Use caution during the selection process - it's your brand's Reputation that's at stake.
CHARACTERISTICS & INFLUENCING FACTORS
I've provided some examples of reliability-influencing factors below. Note, however, that this section looks at general improvements rather than specific mechanisms, so it is deliberately high-level. See Reliability - Solution Mappings for reliability solutions; alternatively, follow the Resilience or Availability chapters if you want those specifics.
RELIABILITY & AVAILABILITY
The relationship between Reliability and Availability is close - I view it as one of parent and child.
A software module can only be used if it's available; therefore a system that is unavailable for extended periods is unreliable and unlikely to be used if other options exist.
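To put "extended periods" in perspective, you can convert an availability target into a downtime budget. For example, 99.9% over a 365-day year allows roughly 8.76 hours of downtime. The small sketch below assumes a 365-day year and treats all downtime alike, with no separate allowance for planned maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_budget_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime a period can absorb while still meeting an availability target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} available -> {downtime_budget_hours(target):.2f} hours of downtime per year")
```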
RELIABILITY & RESILIENCE
The relationship between Reliability and Resilience is also close - I also view it as one of parent and child.
Software that fails regularly, and isn't quickly restored, suffers poor availability. Unsurprisingly, customers (and other stakeholders) will view it sceptically and deem it unreliable.
RELIABILITY & REUSE
Reliability and Reuse both rely upon stakeholder confidence (Stakeholder Confidence) and influence one another. Software that is known to contain many bugs (or a single high-impact one) is deemed unreliable, and is therefore unlikely to be reused by others. Conversely, software that isn't widely reused doesn't (typically) get the same level of verification across a breadth of use cases. It is thus more likely to contain undiscovered bugs or vulnerabilities (albeit it's also a less enticing target for attackers), again marking it as less reliable.
RELIABILITY & RELEASABILITY
Releasability also touches on Reliability. For instance, failing to deliver on a promise (due to poor delivery mechanisms, or too much work) may cause your business to be perceived as unreliable and not worthy of custom.
RELIABILITY & SCALABILITY
I've described in other chapters how poor Scalability can lead to (sometimes lengthy) outages; I've witnessed this on several occasions. Such outages erode confidence, and therefore Reliability.
PILLARS AFFECTED
SELLABILITY
Generating sales off the back of a notoriously unreliable product is always going to be challenging (see Reputation). Any technical due diligence undertaken by the prospective customer should uncover those Reliability issues, resulting in either a difficult sales process or its collapse.
As mentioned, Releasability also touches on Reliability; therefore, consider a slow (or flawed) release mechanism another obstacle to sales.
REPUTATION
Reliability correlates with confidence (Stakeholder Confidence). Poor reliability creates poor confidence, which often leads to poor Reputation.
There's some evidence of this in the news [1]. Consider some of the major reliability issues affecting the reputation of car manufacturers (e.g. recalls), banks (e.g. outages leaving customers unable to access their funds), and airlines (e.g. outages leaving customers stranded). These brands spent years building up their reputation, only for it to be shattered in a matter of hours or days.
Poor Reliability can have a significant impact upon a brand, including:
- Substantial financial repercussions, sometimes into the hundreds of millions (or even billions) of dollars.
- Slashed share prices.
- The publishing of half-year losses.
- The untimely (?) departure of key business executives.
SUMMARY
Reliability is multifarious and influenced by many factors. Runtime, output, delivery, and organisational reliability can all influence how customers perceive your business and its products.
At a fundamental level, reliability relates to Stakeholder Confidence and the customer's (typically external) view. Can one party rely upon another to fulfil its obligations? Those that can't be relied upon lose customer confidence. The resulting reputational damage both drives existing customers away and reduces the potential for new custom.
Availability and Resilience are the two most obvious technical qualities related to reliability, but there are other influencers, including: Releasability (e.g. delivering on time), Scalability (e.g. successfully dealing with fluctuations in transactional load), Reuse (e.g. more eyes, more stable), and Observability (e.g. seeing potential problems before they occur).
DISTANT RELATIONS
Whilst Observability, Scalability, and Reuse are important in this context, they are not first-class citizens of Reliability (which is about the customer). Rather, I see them as influencing factors on Availability, Resilience, or both.
FURTHER CONSIDERATIONS