There is a huge amount of use and abuse of the term DevOps. This probably means that it is very near the top of the hype cycle, so expect a lot of voices in the next few months or year ranting about how much DevOps sucks. This doesn’t have to be so, but let’s rewind and reacquaint ourselves with the whole concept.

Definitions

Let’s go to the fount of unreliable knowledge (Wikipedia) and see what its definition is.

DevOps (a clipped compound of “software DEVelopment” and “information technology OPerationS”) is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes.

That is two things joined in a single run-on sentence. These are:

a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals

and

automating the process of software delivery and infrastructure changes

I’m not totally thrilled with the first one; the second one is fine. The “collaboration and communication” is critically important when the roles of developers and IT are not fulfilled by the same people. DevOps is also meaningful when roles are combined, so I’m looking for a definition that doesn’t require separation. I would phrase the first one like so:

A set of practices that emphasize operations as a system feature throughout the life of the project.

That makes me feel a little more warm and fuzzy. It separates the team structure from the practices. Let’s use some terms so we can discuss those long sentences a little more easily. The two parts we pulled from the Wikipedia definition become:

  1. Operations as a Feature - OaaF
  2. Infrastructure as Code - IaC

Operations as a Feature

What do I mean by treating operations as a feature? I mean that, like every feature, it must follow the application’s life cycle. Instead of creating a development environment and having the feature developers “do whatever” to get code up into that environment, you start from day one with a well-defined approach to automation and use that approach to deploy to dev, to deploy to QA, to deploy to staging and ultimately to release the application^[This sounds heavily slanted towards server-based applications, but the principles apply to everything. For instance, you can automate deploying VMs that then get used to test desktop applications automatically and also provision VMs for manual testers.]. You probably won’t get it all in one fell swoop, but you constantly improve it just like any other feature.
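To make "use that approach to deploy to dev, to QA, to staging and to release" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Environment` class, the `plan_deployment` function, the host names, and the three step descriptions are hypothetical, and the function only returns a plan rather than touching real servers. The point is that one routine serves every environment and only the target data changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a deployment target."""
    name: str
    hosts: tuple[str, ...]


# Illustrative environments; the host names are placeholders, not real systems.
ENVIRONMENTS = (
    Environment("dev", ("dev-1.example.test",)),
    Environment("qa", ("qa-1.example.test",)),
    Environment("staging", ("stage-1.example.test", "stage-2.example.test")),
    Environment("release", ("prod-1.example.test", "prod-2.example.test")),
)


def plan_deployment(build_id: str, env: Environment) -> list[str]:
    """Return the ordered steps for deploying one build to one environment.

    The steps are the same everywhere; only the target hosts differ.
    """
    steps = [f"fetch artifact {build_id}"]
    for host in env.hosts:
        steps.append(f"install {build_id} on {host}")
        steps.append(f"smoke-test {host}")
    return steps


if __name__ == "__main__":
    # The same function produces the plan for every stage of the life cycle.
    for env in ENVIRONMENTS:
        print(env.name, plan_deployment("build-42", env))
```

Because dev and release go through the same code path, every dev deployment doubles as a rehearsal for release night.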

“Automate it, dummy!” isn’t what I’m talking about here (that is covered by IaC). What I’m talking about is treating the whole process of automating your deployments, migrations, and rollouts as first-class citizens. You’re going to establish the requirements, document them just like any other requirement, and then plan them and schedule them just like any other requirement. This will begin when software development begins (often, even before actual dev begins), and won’t end until the app is riding off into the sunset.

I pushed back on the question of team structure in my definitions, but to treat Operations as a Feature, you do need to have dedicated people working on this throughout the life of the project. This may not be all they work on, but it’s not a “throw it over the fence” approach. Feature Developers and Operations Developers must be collaborating, even in the upfront design. Every iteration includes design, development and testing of operations features. They are planned, estimated, and committed to just like any other feature. Most importantly, you’re embracing the Continuous Integration principle and never, ever, treating it as “We’ll figure it out later”. If you’re not using your operations automation to deploy from day one, then you’re not treating it as a feature.

Infrastructure as Code

Infrastructure as Code is an established term, so I won’t try to redefine it. Managing the automation of DevOps embraces the IaC discipline, but IaC is not just DevOps. It covers all of an organization’s infrastructure, not just the infrastructure for a given application. For instance, a new line of business application won’t perform upgrades to an organization’s firewalls or migrate the organization from self-hosted Exchange to Office 365. Both activities can still be managed using IaC principles, with one major constraint: if a given piece of infrastructure can’t be managed by a script or API^[Application Programming Interface], then it’s going to be awfully hard to manage operations for that item as code artifacts. Pulling network cables, for example, is not something you can do with code^[Until we have a robot for that!].
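One common way to picture IaC is "declare what you want, then let code work out the changes." The sketch below is a toy version of that idea under stated assumptions: `plan_changes` is a hypothetical function, the resource names (`web`, `db`, `cache`) and their settings are made up, and no real tool's API is implied. It only computes a plan; actually applying it would need the script or API access discussed above.

```python
def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare declared infrastructure with what exists and list the actions needed.

    Both arguments map a resource name to its settings. The result is a list of
    (action, resource name) pairs; nothing is changed by this function.
    """
    actions = []
    # Resources that are declared but missing or different.
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Resources that exist but are no longer declared.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    # The declared state would live in source control next to the application code.
    desired = {
        "web": {"size": "small", "count": 2},
        "db": {"size": "large", "count": 1},
    }
    actual = {
        "web": {"size": "small", "count": 1},
        "cache": {"size": "small", "count": 1},
    }
    print(plan_changes(desired, actual))
    # prints [('create', 'db'), ('update', 'web'), ('delete', 'cache')]
```

Since the desired state is just data in a repository, it can be reviewed, versioned, and tested like any other code artifact.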

It sounds costly when you think about it by itself. Automation takes time to get right, and the automation itself needs testing. Think of this: Where do you want your pain? Do you want your pain during the dev>test>stage phases or do you want your pain during the release phase? This is the difference between having two people working together on release night and having the entire team on a conference call. It also usually means the difference between a small (or nonexistent!) outage window and being offline for extended periods. I have consistently found that by having these practices well defined, you also have established a very solid foundation for disaster recovery.

Criteria for DevOps

By the time you are successfully using DevOps in your projects, you will be able to answer yes to all of these:

  1. Our operations requirements are documented as well as any other system requirement.
  2. There are clear disaster recovery and availability objectives.
  3. I can deploy the app via automation.
  4. I can migrate data via automation.
  5. I can roll back deployments and migrations via automation.
  6. I can monitor the health of each service and server (via probe or other) from a central facility.
  7. Application logs and auditing are centralized (directly or via aggregation).
  8. I test disaster recovery and availability procedures frequently against production-scale systems (see note 1 at the end).
  9. The above follow Continuous Integration (CI) practices and processes, including:
    • All code committed to a source control repository (e.g. git, Subversion, TFS SCC).
    • Frequent builds (preferably on commit) based on code committed to that repository.
    • Developers can execute “the build” in their own environment.
    • The build artifacts can be used to build a shared environment based on any given successful build.
  10. The above are executed and tested throughout the application life cycle (dev>qa>stage>release). That is, you deploy the application to all environments via the automation scripts. Manual deployment does not satisfy the criteria to be able to move to the next environment in the life cycle.
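The last criterion is easy to state and easy to cheat on, so here is a minimal sketch of how a team might enforce it. The `ReleaseLog` class, its method names, and the build IDs are hypothetical, invented for this illustration; the only thing taken from the list above is the dev>qa>stage>release order and the rule that a manual deployment does not count towards moving on.

```python
# Life cycle order taken from the criteria above.
LIFE_CYCLE = ("dev", "qa", "stage", "release")


class ReleaseLog:
    """Hypothetical record of how each build reached each environment."""

    def __init__(self) -> None:
        # Maps a build ID to the environments it reached via automation.
        self._automated: dict[str, set[str]] = {}

    def record(self, build_id: str, environment: str, automated: bool) -> None:
        """Note a deployment. Manual deployments are deliberately not credited."""
        if environment not in LIFE_CYCLE:
            raise ValueError(f"unknown environment: {environment}")
        if automated:
            self._automated.setdefault(build_id, set()).add(environment)

    def can_deploy_to(self, build_id: str, environment: str) -> bool:
        """A build may enter an environment only after automated deployments
        to every earlier environment in the life cycle."""
        index = LIFE_CYCLE.index(environment)
        done = self._automated.get(build_id, set())
        return all(env in done for env in LIFE_CYCLE[:index])


if __name__ == "__main__":
    log = ReleaseLog()
    log.record("build-42", "dev", automated=True)
    log.record("build-42", "qa", automated=False)  # someone deployed by hand
    print(log.can_deploy_to("build-42", "qa"))     # prints True
    print(log.can_deploy_to("build-42", "stage"))  # prints False
```

In a real pipeline this kind of check would more likely be a gate in the CI server than a hand-rolled class, but the rule it encodes is the same.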

What is not DevOps

I think of IaC separately from DevOps, but some people will run them together. Most of our lives are full of gray so I can live with that. However, I’ve seen a lot of use of the term “DevOps” from people who mean “business as usual.” This happens a lot in job postings, but it happens elsewhere too.

  • DevOps does not mean a developer who also does technical support.
  • DevOps is not a portmanteau of “Developers/Operators”. That is, just because you’ve got a developer who deploys the application, that doesn’t mean you’re doing “DevOps” unless you meet the criteria above.
  • DevOps (or IaC) is not “business as usual” in IT operations. Lots of IT departments script various operations, but until it’s meeting the criteria above around testing and CI, it’s not DevOps or IaC.

Further Reading

  • http://www.somic.org/2010/03/02/the-rise-of-devops/
  • https://blogs.the451group.com/opensource/2010/03/03/devops-mixing-dev-ops-agile-cloud-open-source-and-business/

  1. Bonus points if you actually do this against production systems with confidence. See Netflix’s Chaos Monkey and related technologies.