DevOps - Introduction to Streamlined Deployments
As stated in the previous article, DevOps is the answer to the question, "What do I have to change in my process to be able to, and want to, deploy to production 10 times a day with zero downtime?" The number 10 was not just some random number picked from a hat. We had a random number generator do that for us. In all seriousness, it was chosen because in a typical eight-hour day it means going to production every 45 or so minutes.
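The "every 45 or so minutes" figure is simple division. A tiny sketch of that arithmetic, assuming an eight-hour working day with deployments spaced evenly and never overlapping:

```python
def minutes_between_deployments(deployments_per_day: int, working_hours: float = 8.0) -> float:
    """Average gap, in minutes, between evenly spaced deployments in a working day."""
    if deployments_per_day <= 0:
        raise ValueError("deployments_per_day must be positive")
    return working_hours * 60 / deployments_per_day


if __name__ == "__main__":
    # 10 deployments in an eight-hour (480-minute) day
    print(minutes_between_deployments(10))  # 48.0
```

That works out to 48 minutes, which is where "45 or so" comes from. Under the no-overlap assumption it is also a ceiling: a deployment cycle that takes longer than the gap cannot finish before the next one is due.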
Framing that definition in minutes really puts it in a new perspective. It means the deployment process has to scale really well, and any manual steps need to be removed or become extremely streamlined. Most people don't mind doing multiple manual steps if the deployment to production only happens once every couple of weeks. Hell, it is still easy to do a manual step or two if deployments only happen once or twice a day. But every 45 minutes? That would get old really fast.
At the time of this writing, Farm Credit Services of America is not ready to deploy code to production every 45 minutes. Even so, the whole process made a quantum leap over the past year and got us a lot closer than before. The goal of this article is to explain where we were, where we are, and what still needs to change.
This article is not a step-by-step guide on how to get started with DevOps, nor will it tell you which changes to make. I don't know your company or its deployment process; I can only speak from my own experience. I am going to venture a guess that the deployment process at your company is different from the one at Farm Credit Services of America. Still, you have to start somewhere. My recommendation is to get a cross-functional team in a room, discuss the current state of your deployment process, identify the manual steps, and discuss how you can streamline or automate those steps. The cross-functional team should include one or two web admins, DBAs, developers, and a leadership sponsor.
Initial State of Deployments
When I joined Farm Credit Services of America back in February 2013, this was the process to get a change to production:
1. A code change is made locally, and a database script is generated and manually deployed to the development/test environment.
2. That code change is checked into the development branch. The code is automatically built and deployed to the development environment.
3. Manual verification of the development environment.
4. The code change is merged into the main or master branch. The code is automatically rebuilt and deployed to the test environment.
5. Manual verification of the test environment. Artifacts are automatically copied to a shared folder.
6. Repeat steps 1-5 as many times as needed until the code is in a releasable state.
7. Manually submit a form requesting a deployment to pre-production. The form includes an area that points to the artifacts in the shared folder. The code is manually deployed to pre-production by web admins.
8. At the same time, generate a delta SQL script to push any changes from test to pre-production, and submit the script to the DBAs.
9. Manual verification of the pre-production environment.
10. Manually submit a form requesting a deployment to production. The form includes an area that points to the artifacts in the shared folder, as well as the date and time when the deployment should occur.
11. Regenerate the delta script between pre-production and production and place it into a shared folder. Add a link to that shared location to the form created in the previous step.
12. On the day of the deployment, attend the 11:00 AM Change Review Committee (CRC) meeting to discuss the deployment.
13. The code changes are manually deployed to production by web admins, and the database changes are manually deployed by DBAs.
14. Manual verification of the production environment.
During an emergency, steps 1, 2, and 3 (changes were made directly in the master branch) and step 12 were skipped. The fastest code could get to production during this time was about two hours, and that was when it was really pushed. For example, right after a pre-production request was submitted, I would walk over to the web admin area and get them to manually deploy the change as soon as possible. This also meant a lot of extra stress, which made people less likely to be diligent. Occasionally the fix caused another bug, or the database delta script missed something.
That two-hour timeframe only applied during an emergency. When the process was followed completely, it took about a day.
When a deployment takes at least two hours even when it is really pushed, deploying to production every 45 minutes is out of the question. It would be a constant state of chaos, with stress levels at a maximum all the time.
Current State of Deployments
Over the past four years, a lot of progress has been made in making this process more efficient. Now the process to deploy code to production is:
1. Code and database changes are checked into a feature branch, and a pull request is created. The pull request attempts to merge the change into the master branch, builds the merged code, and runs all unit tests. Meanwhile, another developer reviews and approves the changes.
2. The pull request is approved and automatically merged into master. The code is automatically built and the artifacts are pushed to Octopus Deploy.
3. The build automatically tells Octopus Deploy to deploy to the development environment.
4. The development environment is automatically verified via integration tests.
5. The build automatically tells Octopus Deploy to deploy to the test environment.
6. Manual verification of the test environment.
7. Repeat steps 1-6 as many times as needed until the code is in a releasable state.
8. Push a button to tell Octopus Deploy to deploy to the pre-production environment. This generates a delta database script between source control and pre-production, as well as one between source control and production.
9. Manual verification of the pre-production database delta script. Everything is deployed to the pre-production environment.
10. Manual verification of the pre-production environment.
11. Manually submit a form requesting a deployment to production. Include in the form the Octopus Deploy projects and version numbers to deploy, and indicate the date and time for the deployment to be scheduled via Octopus Deploy.
12. On the day of the deployment, attend the 11:00 AM CRC meeting to get approval.
13. Code and database changes are automatically deployed to production by Octopus Deploy.
14. Manual verification of the production environment.
During an emergency, all steps are followed except attending the CRC meeting, and all automated tests are still run. The two hours has been whittled down to 30 minutes in the event of an emergency. That may not seem like a lot, but the confidence level in each deployment has gotten much higher, to the point where the operations staff doesn't need to be online or present during deployments of smaller releases. Automating database deployments and adopting Octopus Deploy were the biggest reasons for that drop in time.
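One thing a deployment tool brings to the current process is enforcing the order in which a release moves through environments (Octopus Deploy models this with lifecycles). The following is a plain-Python model of that rule, not Octopus Deploy's API; the `Release` class and the environment list are hypothetical. A release may only go to an environment once it has been deployed to the environment before it, so production never receives anything pre-production has not seen.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical promotion order matching the environments described above.
ENVIRONMENTS = ["Development", "Test", "Pre-production", "Production"]


@dataclass
class Release:
    """A versioned release promoted, unchanged, through each environment in order."""
    version: str
    deployed_to: List[str] = field(default_factory=list)

    def can_deploy_to(self, environment: str) -> bool:
        index = ENVIRONMENTS.index(environment)  # ValueError for unknown environments
        # The first environment is always allowed; every later one requires
        # this release to have already reached the previous environment.
        return index == 0 or ENVIRONMENTS[index - 1] in self.deployed_to

    def deploy(self, environment: str) -> None:
        if not self.can_deploy_to(environment):
            previous = ENVIRONMENTS[ENVIRONMENTS.index(environment) - 1]
            raise RuntimeError(
                f"Release {self.version} must reach {previous} before {environment}"
            )
        if environment not in self.deployed_to:
            self.deployed_to.append(environment)


if __name__ == "__main__":
    release = Release("1.4.0")
    release.deploy("Development")
    release.deploy("Test")
    try:
        release.deploy("Production")  # skips Pre-production, so it is rejected
    except RuntimeError as error:
        print(error)  # Release 1.4.0 must reach Pre-production before Production
```

Because the rule lives in the tool rather than in someone's memory, a rushed emergency deployment still cannot skip an environment.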
This is the point where I wish I could tell you we sat down and wrote out the process as it stood and where we wanted it to go. But that wasn't really the case. The number one thing people wanted was to streamline the CRC process and get rid of the manual meetings. That drove the direction of almost every decision that was made, from tool choice to deployment philosophies. Along the way, we adopted a number of practices to help with that, such as build once/deploy anywhere and automated database deployments.
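Build once/deploy anywhere means the package produced by the master build is the exact artifact that reaches every environment; only configuration differs between environments. A minimal sketch of the idea, assuming a SHA-256 checksum recorded at build time and hypothetical per-environment settings (the server names and the `ConnectionString` key are made up):

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

# Hypothetical settings: the only thing allowed to vary between environments.
ENVIRONMENT_SETTINGS: Dict[str, Dict[str, str]] = {
    "Development": {"ConnectionString": "Server=dev-sql;Database=App"},
    "Test": {"ConnectionString": "Server=test-sql;Database=App"},
    "Pre-production": {"ConnectionString": "Server=preprod-sql;Database=App"},
    "Production": {"ConnectionString": "Server=prod-sql;Database=App"},
}


@dataclass(frozen=True)
class Package:
    version: str
    content: bytes
    sha256: str  # checksum recorded when the package was built


def build_package(version: str, content: bytes) -> Package:
    """Build once: package the compiled output and record its checksum."""
    return Package(version, content, hashlib.sha256(content).hexdigest())


def deploy(package: Package, environment: str) -> Dict[str, str]:
    """Deploy anywhere: refuse a modified package, then apply that environment's settings."""
    actual = hashlib.sha256(package.content).hexdigest()
    if actual != package.sha256:
        raise ValueError(f"Package {package.version} changed after it was built")
    # Identical binaries everywhere; only the configuration is substituted.
    return {"version": package.version, "sha256": actual, **ENVIRONMENT_SETTINGS[environment]}


if __name__ == "__main__":
    package = build_package("1.4.0", b"compiled application output")
    test_result = deploy(package, "Test")
    production_result = deploy(package, "Production")
    print(test_result["sha256"] == production_result["sha256"])  # True
```

The payoff is that what was verified in test and pre-production is, byte for byte, what goes to production, which is a large part of why confidence in each deployment can rise.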
Future State of Deployments
Thirty minutes is great, but there is still a lot of room for improvement. That 30-minute timeframe only applies in an emergency; it is not something that can be done every 45 minutes. A number of manual steps still need to occur, and these either need to be streamlined or automated. Right now, the CRC meeting is the biggest impediment to the whole deployment process.
At the time of this writing, this would be my ideal future state.
1. Code and database changes are checked into a feature branch, and a pull request is created. The pull request attempts to merge the change into the master branch, builds the merged code, and runs all unit tests. Meanwhile, another developer reviews and approves the changes.
2. The pull request is approved and automatically merged into master. The code is automatically built and the artifacts are pushed to Octopus Deploy.
3. The build automatically tells Octopus Deploy to deploy to the development environment.
4. The development environment is automatically verified via integration tests.
5. The build automatically tells Octopus Deploy to deploy to the test environment.
6. Manual verification of the test environment. Tests for that verification are created.
7. Repeat steps 1-6 as many times as needed until the code is in a releasable state.
8. Push a button to tell Octopus Deploy to deploy to the pre-production environment.
9. The pre-production environment is automatically verified.
10. Push a button to start the deployment to production. Approvals are gathered via some sort of electronic process.
11. Code is automatically deployed to production by Octopus Deploy.
12. The production environment is automatically verified.
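Gathering approvals electronically, rather than in a meeting, could be as simple as a gate that keeps the production deployment from starting until every required role has signed off. The sketch below is hypothetical and not modeled on any specific tool; the role names and the approver are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ApprovalGate:
    """Blocks a production deployment until every required role has approved it."""
    release: str
    required_roles: Set[str]
    approvals: Dict[str, str] = field(default_factory=dict)  # role -> approver

    def approve(self, role: str, approver: str) -> None:
        if role not in self.required_roles:
            raise ValueError(f"{role} is not a required approver for {self.release}")
        self.approvals[role] = approver

    def missing_roles(self) -> Set[str]:
        return self.required_roles - set(self.approvals)

    def is_open(self) -> bool:
        # The deployment may start only when no required approval is missing.
        return not self.missing_roles()


if __name__ == "__main__":
    gate = ApprovalGate("1.4.0", {"Change Review", "Operations", "Business Owner"})
    gate.approve("Operations", "on-call web admin")
    print(sorted(gate.missing_roles()))  # ['Business Owner', 'Change Review']
    print(gate.is_open())  # False
```

The manual step does not go away, but each approver can sign off whenever it suits them instead of everyone waiting for an 11:00 AM meeting.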
Even with that ideal future state, it is important to note that deployment time can only be whittled down so much. No matter what, a build/deployment cycle will take some amount of time, and what counts as acceptable is completely up to you and your company. Personally, I would like to see it whittled down to 15-20 minutes, including some form of CRC approval. Some of the ideas being discussed to help this process further are:
- Replace CRC with a tool that allows people to approve without attending a meeting. It still requires a manual step to enter the approval, but it eliminates the meeting.
- Developers work with business owners and QA to create a suite of automated tests to run immediately after a deployment to pre-production and production. The creation, verification, and subsequent passing of these tests will be the "manual verification" of the test environment.
- Developers create a suite of sanity check tests to run in all environments immediately after deployments.
- An automated way to verify database changes earlier in the process. This would eliminate the need to manually approve delta scripts.
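A sanity check suite that runs in every environment right after a deployment does not need to be elaborate to be useful. A minimal sketch, assuming each check is a named function that returns True or False; `http_ok` is one hypothetical kind of check that expects an HTTP 200 from a URL:

```python
import urllib.request
from typing import Callable, Dict, List


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL responds with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def run_sanity_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the checks that failed."""
    failures = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False  # a check that crashes counts as a failure
        if not passed:
            failures.append(name)
    return failures
```

In a post-deployment step, a script would pass in the application's real checks (for example, one that calls `http_ok` with the site's home page URL) and exit with a non-zero code if the returned list is not empty, which is the usual way a script step signals failure to a deployment tool.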
Next Steps
This article is only the start of this subsection of DevOps. There seem to be several common topics every company encounters on this journey:
- Database Deployments
- Code / Release Management
- Test Automation
- Buy In
I have already written a series of articles on Automated Database Deployments, so there is no need to rehash that here. At Farm Credit Services of America, we are just beginning down the path of DevOps, and we will most likely stumble along the way. My plan in the coming weeks and months is to cover the remaining topics in much more detail, including what succeeded as well as what failed. Unlike automated database deployments, which were well on their way to being adopted by Farm Credit Services of America when I started writing about them, DevOps is going to be covered on this site much closer to real time.