DevOps - From the beginning
The term DevOps has been with us for a decade, and the common understanding of what it means has continued to drift. What started as a way of describing a culture where work flows through a team all the way to production, where value is eventually delivered, has been reduced to build automation and infrastructure as code. Those are good things, but they are not what DevOps was meant to teach us in the first place. As people we cling to the tangible, and engineers fall into this trap all the time: we keep looking for concrete tools that we can use and show off.
Christopher Little stated the following, which I find to be spot on:
DevOps isn't about automation, just as astronomy isn't about telescopes
Remember that DevOps was derived from purely non-technical movements and concepts, such as Lean, the Theory of Constraints and the Toyota manufacturing model. Fundamentally it is all about removing waste and improving flow from business requirements to value delivered to customers. This, done through blame-free management and continuous learning, is the core of DevOps - build/deployment automation and continuous delivery tooling are not fundamental. As with most conceptual ideas and movements, we look for concrete answers and tools to use. In psychology the term cognitive closure describes the human desire to eliminate ambiguity - this seems to be especially strong in engineers. As with SOA, microservices and most other buzzwords, DevOps as a term has slowly been overloaded, and the ideas have been watered down in the years since The Phoenix Project made them known to the public. Microsoft eventually put the final nail in the coffin by renaming their "team services suite" to Azure DevOps.
The Three Ways
Nobody should talk about DevOps without having read The Phoenix Project, and we should keep reminding ourselves of the story of "Parts Unlimited". Erik teaches "the three ways" of manufacturing, which have since become the underpinning concepts of DevOps. I will briefly remind you of these key concepts.
System thinking and flow
The First Way introduces system thinking, emphasizing the importance of flow in software delivery. All development teams use some kind of delivery pipeline. Whether it involves many intermediaries or only one person doesn't matter: there will be multiple steps involved in delivering a feature or change to production. The main point behind the First Way is that work should flow from left to right, starting from business needs and finally being delivered to production. Any work stuck somewhere in between (work in progress/WIP) does not provide any actual value and should be kept to a minimum. Maximizing flow is the sole goal, and doing so will improve delivery performance. Since flow is directly connected to WIP, we can improve it by not starting new work before finishing something else and by keeping work items as small as possible. Keep in mind that "finished" most of the time means actually delivered to production.
Enforcing this way of working will at some points lead to idle workers, and the temptation to start new work in parallel will arise - doing so means increased WIP, which will eventually decrease delivery performance. This is often hard for managers and team leads to accept, but it is quite logical. Delivering software isn't just writing the code, testing it and handing it off forever. Work started will lead to more unplanned work, maintenance, follow-up and so on. Starting work because of "spare time", e.g. while waiting for an integration to finish, without necessarily finishing it, is a counterproductive habit, and leaving people idle will often be the better choice. We tend to think that more parallelization is more efficient, which is often true when working with computers, but rarely true for humans.
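The effect of starting work in parallel can be illustrated with a small, deliberately simplified sketch. It assumes a single worker, three equally sized work items, and no cost for switching between them (in practice switching has a cost, which makes parallel work look even worse). The function names and the numbers are made up for illustration only.

```python
def sequential_finish_times(durations):
    """Finish each item completely before starting the next one."""
    finished, clock = [], 0.0
    for duration in durations:
        clock += duration
        finished.append(clock)
    return finished


def parallel_finish_times(durations):
    """Split attention evenly across all unfinished items.

    Assumes zero switching cost. Finish times are returned in order of
    completion (shortest item first), not in the input order.
    """
    finished, clock, previous = [], 0.0, 0.0
    active = len(durations)
    for duration in sorted(durations):
        # Each active item only gets 1/active of the worker's time.
        clock += (duration - previous) * active
        finished.append(clock)
        previous = duration
        active -= 1
    return finished


def average(values):
    return sum(values) / len(values)


work_items = [2, 2, 2]  # days of effort per item (hypothetical)

print(average(sequential_finish_times(work_items)))  # 4.0
print(average(parallel_finish_times(work_items)))    # 6.0
```

The total effort is the same six days in both cases, but when working sequentially the first item is in production after two days, while in the parallel case nothing is delivered before day six.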
We can also look at this from a more basic engineering perspective. At university we learn about queuing theory and Little's Law, which describes the relationship between the average number of customers in a system, their arrival rate and the average time they spend in the system.
L = λ * W
L: number of customers in the system
λ: rate of arrival
W: wait time in the system
This can be directly translated to the terms used previously:
WIP = Throughput * Lead time
Lead time = WIP / Throughput
Academically speaking we can conclude that, for a given throughput, if you want to speed up a process you don't start more work before finishing something else, i.e. you limit WIP.
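To make the relationship concrete, here is a minimal sketch that plugs a few numbers into the formula. The throughput of 5 items per week and the WIP values are hypothetical, and Little's Law only describes long-run averages of a stable system, so treat the result as a rule of thumb rather than a prediction for any single work item.

```python
def lead_time(wip, throughput):
    """Average lead time according to Little's Law: lead time = WIP / throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


THROUGHPUT = 5  # items finished per week (hypothetical team)

for wip in (20, 10, 5):
    weeks = lead_time(wip, THROUGHPUT)
    print(f"WIP={wip:>2} -> average lead time {weeks:.1f} weeks")
```

With the same throughput, cutting WIP from 20 to 5 items reduces the average lead time from 4 weeks to 1 week.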
Balancing the amount of WIP is necessary to avoid work piling up in queues, but also to avoid underutilization, which typically occurs when WIP is significantly lower than the number of workers. The "correct" amount of parallel work differs from team to team, depending on how much pair programming is practiced as well as the amount of unplanned work (often caused by technical debt - a discussion for another day).
Amplify feedback loops
Having broken work down into manageable chunks that are delivered to production at a reasonable pace, do not be fooled into believing the work is "done". Feedback matters at every stage of the development lifecycle: after defining a need or requirement, or after implementing a feature, it is desirable to get feedback as soon as possible. Fixing an issue while it is fresh in mind increases the probability of the work actually being delivered with quality. This makes sense, but it can be hard to implement because it requires all members of the team to pull in the same direction. Whether requirements analyst, developer or tester, everyone has to be critical of every piece of work. This covers more than poor implementation and code: a need or requirement must also be significant and must not conflict with the overall goal of the team or project.
The Second Way is probably the easiest place to start if you want to introduce DevOps to an organization, and it is also where tooling and automation play the biggest role. Pull requests and pair programming are the simplest things you can do, and they will definitely improve the quality delivered. Good tests together with continuous integration are supposed to give developers quick feedback about faults in their code, before it ever reaches any environment. Good monitoring in production amplifies the feedback loop from customers by discovering bugs as early as possible - ideally before the customers notice.
Culture for experimentation and learning
The Third Way is all about putting tension into the system, by taking risks and learning from them. In earlier times software development processes were designed to avoid all risk, often artificially, through long release cycles that were signed off by people outside the team. What we have learned from all of this is that risk cannot be avoided, and thus we choose to embrace it instead. The Third Way amplifies the two previous principles. By limiting WIP, striving to increase flow and decreasing the lead time to production, we can take on risk, since the amount of work is always of a controllable size. Bugs and faults are not as scary if we always find them early and are able to fix them quickly. Experimenting and introducing risk forces the team to make changes that handle faults better and better. If you are deploying to production every other day, you do not want that process to be scary and cumbersome - most likely you will want to improve it as much as you can.
In the end DevOps is about how we work and how we deliver value. The focus is on work as something visible, which we control in order to increase flow through our development process. Small individual chunks of work are easier to assess, and the risk of failure is easier to handle. Manageable work items also allow us to do more experimentation, which helps us learn faster and thus create better systems for our customers. By amplifying feedback loops we want to prevent errors from happening again, or at least catch them as early as possible. Many of the principles discussed can be carried out with the help of tooling - kanban boards, CI/CD, logging and monitoring are all important tools used by most organizations embracing DevOps. But to truly reap the benefits I believe that teams have to understand why we do all of this. I still see teams working on several different projects at the same time and wondering why they cannot complete any of them. Managers still try to control risk by increasing batch sizes and by introducing new approval steps. As has been said many times before: DevOps is about culture and not tooling. Everybody has to pull in the same direction for it to be beneficial.