Monday 23 January 2017

DevOps And IT Operations

IT’s Holy Grail

DevOps is 2016’s tech holy grail – unified development and operations, both working to deliver what the business needs, quickly, reliably, and adaptably. Done well, DevOps transforms the way organisations work; it helps break down barriers between tech teams, and between technology and the rest of the business. Good DevOps is the antidote to increasing segmentation and specialisation within companies. With the promised benefits, is it any wonder that senior managers are pushing for it in organisations spanning all sizes and industries?
The reality of DevOps implementation is different. DevOps changes the way organisations run, and the way the people in them think. It redefines what’s possible and desirable. But it comes at a price. Change is hard. And some people have further to go, more changes to go through, and more to lose, than others.
The changes DevOps requires of staff throughout the organisation are hard for some people to wrap their heads around. Within development, it can be difficult to start thinking about infrastructure, security, and stability. In the business, it can be difficult to get used to the idea of working side-by-side with technical staff. For operations, the stakes are higher.

DevOps and IT Operations

There is a team within every organisation that has learned to define its value based on up-time, stability, and security; a team that has been repeatedly told its value lies in how well the systems run; a team that is invisible until things go wrong, at which point people start shouting: the IT operations team. DevOps is a godsend for the operations team. It's a new way for the organisation to think, where the concerns that have been handed to operations, and which they have been fighting to protect for years, are shared with the rest of the company, and where out-of-hours support is performed by the people who wrote the system. This is a great relief. It is also terrifying.
For so long, operations teams have defined their worth by these metrics, and now they're being taken away. What's left? What will operations teams do now? When developers do on-call, fight fires, and build in stability, scalability, and security, what's left for operations? When infrastructure is defined in code, who needs operations? Is it time to update the CV and move on, or is there a role for operations once developers have taken everything over?

People Resist Change for Valid Reasons

From where a traditional system administrator sits, it's sometimes impossible to see what the world will look like when concerns like security, uptime, and support are shared responsibilities. This is a very different experience from that of developers or testers moving to DevOps. Developers are being made responsible for more of the system, which enhances job security; no developer looks at DevOps and sees job loss. The end result may be collaboration, but the paths that individuals in different departments take to get there are very different. System administrators, however, work to implement DevOps under the perception that they may be unemployed once they finish.
Having been through a number of DevOps implementations, we can say that these concerns come up regularly. DevOps is a whole new way of doing things, and requires learning how to think differently, how to use different tools, and how to work closely with people that have previously had nothing to do with operations. There’s a lot to learn, and fear makes learning much harder. If the goal is effective learning, fear must be eliminated (or at least significantly reduced).

Fear, Uncertainty, and Doubt

Much of the resistance to DevOps comes from fear and uncertainty around the outcomes, and doubt about the personal benefits. The first step in reducing this fear actually starts before implementing DevOps, or talking about introducing it. The first step is communication.
Before starting the transformation, it's important to understand what people would spend their time doing if they didn't spend hours firefighting, preparing for releases, telling people no, or tracking down bugs in production systems. Different people in the team will have different answers, but it's important that those answers are clarified and dug into until staff can see how moving past their current responsibilities lets them contribute directly to business goals and frees up time for learning new things, giving them a compelling reason to assist in any change effort.
Once new DevOps practices start being introduced, it's important to link individuals' motivations to the impact of the new ways of working. In a well-considered implementation, business drivers, individual motivations, and new ways of behaving will align, allowing people to work, for their own reasons, toward the same goals. This is not easy.
“If you aren’t making mistakes, you aren’t taking enough risks.” Debbie Millman
Operations staff, who have spent their careers learning how IT systems work, how they fail, how to identify problems, and how to build scalable, stable, secure systems, are invaluable to people attempting to do it for the first time. They can shorten the length of time it takes for developers to understand the infrastructure by orders of magnitude. Whether infrastructure is in code or manually installed, people who understand those systems, the impacts of bugs, and the impact of updates, are needed. And for a long time, after DevOps starts, operations will be the people with that knowledge. While they’re busy collaborating with developers and business staff to share that knowledge, they’ll also be learning from them. They’ll be picking up knowledge they’ve never had access to, before – why the business wants particular things done, how the code is going to accomplish it, and the value of the infrastructure that will enable it. Operations staff have a lot to contribute to DevOps, in the short- and long-term. They just need to see it.

Fine Print

Even the most effective transformation is going to come up against problems or obstacles. Things are going to fail. Mistakes are going to be made. More than anything, it must be clearly understood that the migration from the old way of working to the new is an evolutionary process, where there will be setbacks, and lessons learned, before the final goal is reached. And even the concept of ‘final goal’ is fraught, as it implies that DevOps is a journey with an end, with a set of artifacts that indicate DevOps success. The truth is more nebulous.
The acceptability of failure can be a difficult concept for people to accept, even when it makes perfect sense, especially if they work in operations. Which is worse: five hours of downtime once a year, or ten minutes of downtime once a month? In many companies, ten minutes of downtime would be cause to reduce the number of deployments, to reduce the risk of further outages, even if doing so increased the likelihood of a longer outage. The truth is that more frequent outages that last less time are a sign of better resilience, and better systems, than rarer, more severe outages. During the learning process, mistakes will happen that increase the frequency of outages, and this is OK. Over the long run, the reduction in total downtime will more than make up for the increase in outage frequency.
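To make the comparison concrete, here's a quick back-of-the-envelope calculation using the illustrative figures above (not measurements from any real system):

```python
MINUTES_PER_HOUR = 60

# One severe 5-hour outage per year vs. one 10-minute outage per month.
# Figures are the illustrative ones from the paragraph above, not real data.
rare_severe = 1 * 5 * MINUTES_PER_HOUR   # 300 minutes of downtime per year
frequent_small = 12 * 10                 # 120 minutes of downtime per year

print(f"Rare, severe outages:    {rare_severe} minutes/year")
print(f"Frequent, small outages: {frequent_small} minutes/year")
```

The frequent, small outages add up to less than half the total downtime, even though they happen twelve times as often.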

It’s the Journey That Matters

DevOps is the path. It is the way of thinking, of approaching problems, of seeing problems in a unified way, that creates better results for internal staff and customers. As such, there is no final, right implementation. There are no DevOps artifacts. There are no DevOps tools. Each artifact, each tool, can be used to enable or inhibit DevOps. Each company will apply DevOps ideals in ways that suit their culture, and their customers. They’ll then reflect on what they’ve learned about themselves and their customers, and they’ll iterate. Again. And again. And again. DevOps is the journey.
I first wrote this article for the OpenCredo blog

Monday 16 January 2017

The Risk of DevOps Tools: Automated Failure

The Promise of DevOps Tools

In the rush to embrace DevOps, many organisations seek out tools to help them achieve DevOps nirvana: the magical tools that will unify Development and Operations, stop the infighting, and ensure collaboration. This search for tools to solve problems exists in many domains, but seems particularly prevalent in IT (that may be real, or simply a reflection of my exposure to IT). The temptation to embrace new tools as a panacea is high, because the problems in IT seem so pervasive and persistent. I have been in many rooms, and at many conferences, where vendors are selling perfect software. No matter the problem, there's a piece of software you can buy that solves it. I've heard hundreds of similar stories over more than a decade in IT. After all that time I can safely say: there's no silver bullet. Solving problems is hard. Understanding their causes is harder.

Automated Failure

Many consulting conversations start with a customer asking for a solution to a particular problem. Good consulting means understanding that while the customer wants this problem solved, they may not have the tools to identify the underlying cause. Sometimes a problem is pretty straightforward (e.g. "We're going live in 2 weeks and our system doesn't perform as well as we need it to."). Other times, however, there are unseen and unspoken causes that lead to the request (e.g. "Automate our deployment pipeline" is sometimes an attempt to address Development, Operations, and the Business not communicating effectively). For the problem to go away over the long term, the underlying problem needs to be addressed – brought into the open, explored, and resolved. This can be difficult, as some customers don't want to talk about anything other than the superficial engagement. Other times, it is difficult because responsibility for the problem lies, at least in part, outside the customer's sphere of influence. It would be remiss, however, not to raise any deeper issues discovered during exploration, if they have an impact on the problem being solved.
Customers ask for automation, continuous delivery, and DevOps practices. But without addressing the underlying problems, we make poor use of tools, or apply them to the wrong problems, which ultimately codifies and automates failure. Most of the difficulty isn't in poor tools, but in systemic issues. Addressing these issues is a key component of organisational success.

Systemic issues

Deming told us that 95% of individual performance is determined by the system (e.g. the Red Bead Experiment). Most of what we do, most of what the people we work with do, and most of the results, are determined by how the system has been built and maintained. And management creates the system. Both management and front-line staff are trapped by the system, but only management has the ability to change it significantly.
We perpetuate the systems in which we find ourselves. Prevailing management theory (i.e. Taylorism and its descendants) focuses on optimising individuals – individual actors, individual teams and silos. The assumption is that by getting individuals to do their best, the entire system does its best. This is demonstrably false (see The Goal for clear demonstrations of when and how this fails).

Changemakers

DevOps, and Agile before it, attempts to change this narrative. Instead of talking about using processes to get the most from individuals, it talks about flow. Instead of units of work, it talks about customers and collaboration. It puts focus not on optimising the things each person does best, but on how information (in the form of code, much of the time) flows through the system. A good DevOps implementation will change how teams work. It will cause realignment that helps Operations and Development work together, with the result being shared responsibility and ownership of performance. This works brilliantly inside a team, or small group of teams. And it often finds a champion in management, who can help perpetuate it throughout the rest of the technical teams.

What next?

Now that information is flowing well, it’s an excellent time to introduce tools that complement the process. The best part is that with Development, Operations, and the rest of the business communicating effectively, it’s easier to see where automation and speed will help.
Each organisation is different, but tools that can be used to visualise work (physical card walls, Trello, Jira) are very good for reinforcing collaboration. Predictable deployments are more likely when the deployment pipeline and testing have been automated. And communication usually remains good when teams that work on the same functionality all sit together in one place.
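As a rough illustration of that second point, here is a minimal Python sketch of a "test, build, deploy" pipeline step. The script and command names (build.sh, deploy.sh) are hypothetical placeholders, not a recommendation for any particular toolchain:

```python
#!/usr/bin/env python3
"""Minimal sketch of an automated test-then-deploy pipeline step.

All script and command names below are hypothetical placeholders;
substitute whatever build, test, and deployment tooling your teams use.
"""
import subprocess
import sys


def run(step_name, command):
    """Run one pipeline step and abort the pipeline if it fails."""
    print(f"--- {step_name} ---")
    result = subprocess.run(command)
    if result.returncode != 0:
        sys.exit(f"{step_name} failed; aborting the pipeline.")


run("unit tests", ["pytest", "tests/"])
run("build artefact", ["./build.sh"])                 # hypothetical build script
run("deploy to staging", ["./deploy.sh", "staging"])  # hypothetical deploy script
```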
How you fit these ideas and processes into your organisation depends on how your organisation works. Before bringing them in, make sure your organisation functions the way you want it to.
I first wrote this article for the OpenCredo blog

Monday 9 January 2017

Shadow IT

The Rise of Shadow IT


Almost since the release of AWS (the first truly successful Infrastructure as a Service platform), shadow IT* has been a problem for many companies. As soon as it became easy to create new servers, needing nothing more than a credit card, development departments that had been constrained by IT operations teams started using it. They took the ability to create new infrastructure and used it to deliver what was asked of them. In some companies, the inevitable rapidly became accepted as the way to do things, and both development and IT operations worked together to figure out how to collaborate on building systems that satisfied development's desire for change and operations' desire for stability. Outsourcing infrastructure, and all it implied, gave rise to DevOps – the unification of business needs, developer delivery, and operational capacity – but it also gave rise to something else, in companies where the operations teams weren't quite as quick to move: Shadow IT.
Developers, long constrained in what they could request, or in the time it took to deliver on those requests, took to IaaS like it was lifesaving medicine. No more constraints! No more restrictions! Nobody telling them no! Development managers, who had been under pressure from the business to deliver, no longer had to be trapped by people telling them they couldn't. A balance that had been in place for many years was disrupted nearly overnight. Developers knew it immediately. Some IT departments still don't know it. Any time the business tells developers to deliver something, and IT Operations tells them they can't, shadow IT comes into play. Not because of technology, but because of people – there's a dissonance caused by being caught between two opposing forces. When a way out presents itself, it will usually be taken. That's human nature. Understanding that is essential to understanding that the change that has come to IT isn't going away; it is, in fact, going to continue.

How do you know if you have Shadow IT?

As the CIO (or IT Director, IT Operations Manager, etc), if the business and IT are in conflict, with IT Operations saying no, then you either already have Shadow IT or will have it soon. Someone, somewhere in your development team, is considering how to use cloud technology to make deployments faster, make infrastructure more stable, and reduce their own headaches.

Why is Shadow IT a problem?

From the perspective of the business, shadow IT is both a blessing and a curse. It's a blessing because, as mentioned above, it removes the constraints that have stopped or inhibited developers from delivering business requirements. Removing these fetters helps increase delivery speed and capacity. The downside is twofold.
First, shadow IT is hard to track. If IT costs belong to IT Operations (as is frequently the case), shadow IT is spend against their budget that they can't account for. This may result in budget overruns that are difficult to explain. It may also result in IT Operations purchasing equipment that never gets used, as the team is bypassed.
Second, IT Operations are the part of the organisation most familiar with the risks of running equipment in a live environment. They are frequently responsible for security and scalability. While some of those risks go away with cloud infrastructure (e.g. auto-scaling), not all of them do, and new problems are introduced (e.g. how do you handle logs for services that might disappear at any time?). These are questions that IT Operations are used to asking, and answering. Developers are no less capable, but they are used to thinking about different questions, in different domains, and may not have the domain knowledge required to ask these questions themselves. For these reasons, it's important to address shadow IT early, and figure out how to integrate external IT systems with internal IT and business requirements.
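The logging question, for example, is usually answered by shipping logs off the instance as they are written, rather than keeping them on a disk that may vanish. A minimal Python sketch, assuming a central syslog collector (the hostname below is hypothetical):

```python
import logging
import logging.handlers

# Ship logs off the instance as they are written, so nothing is lost when
# the instance disappears. "logs.internal.example.com" is a hypothetical
# central collector, not a real endpoint.
logger = logging.getLogger("ephemeral-service")
logger.setLevel(logging.INFO)
logger.addHandler(
    logging.handlers.SysLogHandler(address=("logs.internal.example.com", 514))
)

logger.info("instance starting up")
logger.info("instance shutting down")  # still available centrally after the instance is gone
```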

Eliminating Shadow IT

Use of cloud technologies isn’t going away. That genie is out of the bottle. So if there’s no way to eliminate the IT, what’s the solution? Eliminate the shadow.
Since it is only going to get easier to create customised infrastructure as companies gain experience with providing what customers want, the only feasible solution is to adapt. Bring shadow IT out of the shadows. Not by forbidding it, but by embracing it. Take time to understand what it is that shadow IT provides that internal IT doesn't. Then make a decision: is it worth spending time and energy adapting internal IT systems to provide those same services, or would that time be better spent adapting internal IT people to the new way of working? Neither is easy, but the latter, while difficult, is far more likely to succeed – people will use what's easy and gives them what they want, after all; once they've settled on external IT providers, it would be very hard to get them to go back to using internal systems.
The first step in adapting IT people to using external systems is to help them understand that their role in the organisation is changing. They are no longer gatekeepers, responsible for saying no. Instead, their role is to help enable the use of the best tool for the job, bringing their specialised knowledge of systems to bear on cloud-based systems, and figuring out how to adapt inherently unstable cloud infrastructure to software and customer needs for stability (hint: automated recovery and scaling). One of the biggest challenges with thinking this way is that a lot of sys admins are used to spending their time on the command line, while creating repeatable, redundant, scalable infrastructure requires moving toward infrastructure as code (terraform, ansible, mesos, etc).
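As a rough sketch of what "automated recovery and scaling" looks like when expressed in code, here is a minimal Python example using boto3 to define an AWS auto-scaling group. The tools named above (terraform, ansible) would express the same idea declaratively; all names, AMI IDs, and sizes below are hypothetical.

```python
import boto3

# Minimal sketch: an auto-scaling group that replaces failed instances
# (recovery) and grows or shrinks between 2 and 6 instances (scaling).
# All names, AMI IDs, and regions are hypothetical.
autoscaling = boto3.client("autoscaling", region_name="eu-west-1")

autoscaling.create_launch_configuration(
    LaunchConfigurationName="web-frontend-lc",
    ImageId="ami-0123456789abcdef0",   # hypothetical machine image
    InstanceType="t2.small",
)

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-frontend-asg",
    LaunchConfigurationName="web-frontend-lc",
    MinSize=2,   # failed instances are automatically replaced
    MaxSize=6,   # extra instances are added under load
    AvailabilityZones=["eu-west-1a", "eu-west-1b"],
)
```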
As with most new ways of working, the difficulty isn’t with the new technology itself, but in people adjusting to a new way of working that in many cases challenges their deep assumptions about what they do, and how they provide value. This change takes time, and may take new people. Don’t delay starting, however – the longer you wait to get started, the more ground there will be to make up, and the harder those same people will find it.

*Shadow IT is IT spend that isn’t sanctioned by IT Operations, and so isn’t well tracked, or admitted to.
I first wrote this article for the OpenCredo blog

Tuesday 3 January 2017

DevOps Is Transformative

The Pre-DevOps Environment


DevOps is transformative. This (hopefully) won't be true forever, but it is for now. While the modern management practice of separating development and operations (and, to a lesser extent, everyone else) prevails, the tearing down of the walls that separate them will remain transformative. In company after company, management and front-line staff are coming to realise that keeping inherently interdependent functions separate is a model for blame, shifted responsibility, and acrimony. It's easy to divvy up a company based on function. To many people, it seems the most logical way to do it. Ops does operations, Dev does development, Marketing markets, etc. It seems much harder to do it any other way. So why do it any other way?
Separation by function isn’t the best way to break up a company. People with similar areas of expertise work together and learn together, but they don’t understand the company, and nobody understands them. Functional separation leads to pedestals for some, and dismissal for others. In well run development and operations teams, people will tell you they’re pretty happy. They’ll also frequently tell you they wish they knew more about what was going on. This isn’t caused by poor communication, but by an inherent shortcoming in functional separation – nobody gets to feel like they own anything. Everybody is just a cog in a machine. For some people, that’s enough, but most people need more. Most people need to feel some control, some pursuit of mastery, and some purpose or sense of belonging.


DevOps Is About Communication

DevOps isn’t about technology. Technology is a part of it, but it’s not the root. DevOps has been, since its inception, a process by which Development and Operations (and the rest of the organisation) can learn to speak the same language and work together. Historically, Development has been responsible for delivering change, and Operations has been responsible for maintaining stability. To many people, these are fundamentally opposed remits – change creates the risk of instability, and stability prevents change. One of the things DevOps does is share responsibility for both change and stability across both groups. Developers become more responsible for stability, and operations become more responsible for facilitating change.
The most popular implementation of DevOps is Operations staff delivering tools that make delivery happen faster – the move toward continuous delivery. It would be a shame, however, if people thought that continuous delivery is all there is to DevOps. When some of Development’s concerns about delivery and change leak into Operations, concerns about stability leak back. This is inevitable, as rapid change requires stable releases. So with continuous delivery often comes a push toward testing first, automated testing, and ensuring that both positive and negative tests are well represented in the code (this change can equally start with QA, and spread from there). Without this improvement in automated testing, continuous delivery is just another name for shipping bad code.
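For anyone unfamiliar with the distinction, a minimal sketch of "positive and negative tests" using pytest; validate_discount is a hypothetical example function, not from any real codebase:

```python
import pytest


def validate_discount(percent):
    """Hypothetical example: accept discount percentages between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("discount must be between 0 and 100")
    return percent


def test_valid_discount_is_accepted():
    # Positive test: expected behaviour works with good input.
    assert validate_discount(25) == 25


def test_invalid_discount_is_rejected():
    # Negative test: bad input is refused rather than silently accepted.
    with pytest.raises(ValueError):
        validate_discount(150)
```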
As information starts to leak in both directions, communication improves, common ground increases, and a virtuous circle is created. Improvements in communication lead to more collaboration, which can lead to changing how teams work together. In some cases, it can lead to complete automation of infrastructure deployment (i.e. infrastructure as code), configuration management (e.g. configuration as code), or, at the logical extreme, developers being paged when systems go down at night (i.e. developers being responsible for stability). Improved collaboration encourages more regular communication, which in turn suggests changing team structures. If Development and Operations staff work better together than they do when they’re separated, it makes sense to put them in teams together. But how are those teams structured? It’s at this point that the rest of the company needs to get involved.

Delivering What Customers Want

Continuous delivery can only take an organisation so far. It’s part of a transformative process, part of a way to improve things, part of DevOps, but not all of it. Reaching its full potential requires assistance. It requires organisational change, which is where things get difficult. Collaboration between Development and Operations can be done by two people who want to work more closely together, two managers who want their teams to work better together, or one more senior manager who sees things and wants to improve them. Progress beyond the purely technical arena requires senior management involvement, because the next step is to look at how customer needs move through the company, and how to improve it, which usually falls outside the sphere of control for technical staff.
Customer needs usually come into the company via sales, account managers, or customer service. These are people who are intimately familiar with customer needs, and with what customers are willing to pay for (as well as what they hate). From there, someone prioritises those needs into things to do now, later, and not at all. And then someone works on delivering, testing, deploying, and supporting them. These people all help needs flow through the company, and therefore help money flow into the company. They don’t have to be separated, each delivering part of the pie without knowing what the benefit of their work is. Instead, they can work together, collaborating and communicating on the best way to meet demand, satisfy requirements, and deliver, with a minimum of fuss. By organising teams in this way, so that customers with a need are at one end and satisfied customers are at the other, people are transformed from groups looking out for their own small part of an unknowable empire into a group working together to deliver something worthwhile.
Never underestimate the transformative power of giving people a purpose they can wrap their heads and hands around, of building teams they can share that purpose with, and of tearing down walls that stop people from talking to each other. Don’t be misled: it isn’t easy. Compensation will have to change, management will have to change, and expectations will have to change. And when it’s done, nobody will want to go back to the way it was before.

I first wrote this article for the OpenCredo blog