Unfortunately, the physical burden of bare-metal maintenance was only replaced with a virtual one.
It was 2014 when I started my second job, at a medium-sized Israeli startup called Dealply, and getting a server was far too easy – as many as I wanted or needed. All it took was a few button clicks. I just had to log in to the company’s AWS account, go to the Elastic Compute Cloud (EC2) service, toggle a few options, click some “next” buttons, and within 10 minutes I had a brand new server up and running at my disposal.
I was personally in charge of 10-15 application servers running an application I coded myself, and another 20 or so Solr (a textual search engine) servers as well. The team I was a part of was in charge of another 100 or so. We did what was expected of us: take full ownership, end-to-end, of what we do. That included being our own DevOps and maintaining what we created. Don’t get me wrong, it was a lot of fun and I learned so much in a very short time. It was good for my career to have hands-on experience in keeping servers up 24/7.
In hindsight, and from a managerial position, owning and maintaining these servers was a complete waste of the personnel’s time. I think we spent about 20 to 40 percent of our time rolling out updates, replacing servers, and fixing OS issues, or automating these processes, which was also time-consuming. It was a lot of effort spent not on the money makers. It’s even funnier because Daniel, the CTO who recruited me, was the one I stole the “money makers” state of mind from.
Maintenance could be something simple and idiotic, such as the log forwarder stopping for some reason. Logs were then not deleted in time, leading to a completely full hard drive, which crashed the entire server along with all the applications running on it.
Maintenance could also be something much more complex. You would never guess that the latest Java minor version that came preinstalled with your newly launched server instance would have such an obscure effect on your application. That was a week’s work gone for nothing. Not to mention that each application instance’s memory settings were unique to each server, as the available resources (RAM, CPU, HD) were not the same across all servers. Or that one time I forgot to copy the OS kernel’s maximum number of open files (nofile) from the old server to the new one. I’m human, after all. The presumption that applications run the same on different servers is inherently wrong. Applications will act differently on different servers. “Different servers” includes my precious local workstation as well, which is nothing like any of the production servers. It is hell.
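That nofile mismatch is a good example of how mundane these checks are, and how easy they are to forget. A minimal sketch of the sanity check I should have run on the new server (the exact numbers and config paths vary by distribution):

```shell
# Per-process open-file limits on this machine. A freshly launched
# server usually carries the distribution default (often 1024)
# rather than the tuned value copied from the old machine.
soft=$(ulimit -Sn)   # soft limit: what processes actually get
hard=$(ulimit -Hn)   # hard limit: the ceiling the soft limit can be raised to
echo "soft nofile limit: ${soft}"
echo "hard nofile limit: ${hard}"
# Persistent values typically live in /etc/security/limits.conf or
# a drop-in under /etc/security/limits.d/ -- worth diffing between
# the old and the new server before going live.
```

Trivial to check, and trivial to miss, which is exactly the point.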
Although the startup was profitable, recruiting more engineers was hard and no one in management was eager to spend more money on salaries. Reluctantly, we worked harder and harder. I think a regular work day was around 10-11 hours in the office, another hour from home, plus waking up every other night because a server went down. It was horrific. The next job was even worse.
Not only are these experiences now deeply embedded in my philosophical approach to system design [see the Philosophical Layer series], they have also evolved into a promise to my engineers. I want them to be happy when they come to work every morning. I want them to be happy with their families when they go home. My engineers need to sleep well at night knowing that their servers won’t go down. For engineers to be happy, and for them to focus on the money makers, maintenance must be minimized. They should not spend their time chasing ghosts in the machines. Which is why I simply LOVE containers, which opened the door to an entirely new era in cloud computing.
Version mismatches are hell
The entire company has been using ElasticSearch (ES for short) version 2.4, happily ever after. A new project is initiated, with an entirely new application written from scratch. Lucky you! You then notice a requirement that would be very hard and time-consuming to implement on ES v2.4. Alas, with the latest ES v6.5 it’s a complete no-brainer.
One option is to go to your Ops guy and ask if he can upgrade the production ES to v6.5. He will rightly deny that request, as all the company’s applications depend on v2.4. You then ask him to deploy an entirely new ES v6.5. You’re out of luck, as he has no idea how to install and maintain that version. It is not your fault, but you’re stuck with the really time-consuming solution.
Let’s say he did give you a go, or you launched it on your own. You install ES v6.5 on your laptop and start coding the new application. Three weeks later you’re being praised and glorified for the good work, and you get a new task – on one of the legacy applications that requires v2.4. Crap. It turns out the two cannot run side by side on the same computer; they require different Java versions. You roll everything back to v2.4, finish that task, and two weeks later it turns out there’s a bug in your newly developed application. Now what? Endlessly keep rolling backward and forward?
So major versions are off the table. The same can happen with minor versions as well. It turns out that this whole time you’ve been working with MySQL 5.7, while the production server runs MySQL 5.6. Only after you’re done coding do you find out your work was based on a feature introduced in that minor version. Now you have to code everything all over again. Company time and your time were wasted. Even worse is chasing a bug that does not exist. If an issue in production, which runs MySQL v5.6, has already been fixed in your locally installed v5.7, how long will you chase your tail until you mark the bug as rejected/closed?
A similar but worse case is when your entire runtime is different. You’ve been developing an application with Java 7, but it turns out that the entire company’s server fleet is using Java 6. Nobody will, or can, upgrade the entire fleet for you. Your application is not the only one in existence. So you either cost the company a whole new server (even a virtual one costs money and adds maintenance!) or you have to downgrade your Java version and recode half your work.
Containers to the rescue
All of the above are common, real scenarios that happen to many developers – or at least they did back then. How would using Containers have prevented all of the above?
From a developer’s perspective, Containers have two very important traits:
- OS independence – the “compiled” container comes with all the OS dependencies required
- Identity – the “compiled” container runs exactly the same no matter where it runs
(There’s a third important trait that will be uncovered later)
Instead of installing ElasticSearch or MySQL directly on your local workstation, you download a ready-made Container that includes a specific version, with all the required OS dependencies installed within it. Thus you can just turn Containers on and off, and they run no matter what is installed or missing on your OS.
You are coding an application that needs ES v6.5?
- Turn off v2.4’s Container, which comes with Java 6 installed within (the OS independence trait)
- Fetch v6.5’s Container from the Container Repository
- Turn on v6.5’s Container, which comes with Java 7 installed within
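With Docker as the container runtime, these steps might look like the following sketch. The container names are illustrative, and the image tag assumes Elastic’s own registry, which hosts the 6.x images:

```shell
# Turn off the ES v2.4 container; its bundled Java runtime goes down with it
docker stop es24

# Fetch the v6.5 image from the Container Repository -- it ships
# with its own, newer Java runtime inside (the OS independence trait)
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.4

# Turn it on; nothing installed on the host OS had to change
docker run -d --name es65 -p 9200:9200 \
  docker.elastic.co/elasticsearch/elasticsearch:6.5.4
```

Switching back is just `docker stop es65` followed by `docker start es24`.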
It works vice versa, of course. Isn’t that simpler to set up and manage than installing, uninstalling, and rolling between versions? You need to mirror your production environment’s MySQL? Just fetch its Container and use it.
The identity trait ensures that the same Container runs exactly the same on your workstation and on your colleagues’ as well. When they need to work on your applications, they won’t have to go through the entire ES/MySQL installation process; they’ll just fetch the same Containers you are using. That’s a lot of time saved and mistakes prevented. Your application’s external dependencies are fixed, better handled, and better shared.
Your application is someone else’s external dependency. Instead of delivering source code or a compiled artifact, you’d be delivering an entire Container, built with “instructions” from a Dockerfile, with all the required OS dependencies. You’ve just made your colleagues’ lives easier. The Ops guy’s life would be that much easier too, as your Container runs exactly the same on every server, and he will not need to make any changes in production for you.
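A minimal Dockerfile for a Java application might look like this sketch (the base image tag and file paths are illustrative):

```dockerfile
# Pin the exact Java runtime the application needs -- it ships
# inside the image, regardless of what the host has installed
FROM openjdk:7-jre

# Copy the compiled application into the image
COPY target/my-app.jar /opt/my-app/my-app.jar

# The container starts the application the same way,
# on every machine that runs it
CMD ["java", "-jar", "/opt/my-app/my-app.jar"]
```

The point is that the Java version and start-up command are baked into the deliverable, not into a server’s configuration.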
There is, however, a caveat. Developing inside a Container needs to be supported by your IDE, and debugging code with your IDE requires a remote debugger setup, which is hard and sometimes not even possible. I would start with coding directly on your local workstation, run the Container and run tests inside the Container, and leave developing within a Container to extreme needs only.
Both your code within a Container and its external application dependencies can be delivered together as a Compose. With a bare-minimum description of what your application does, the Ops guy can deploy your Compose to production. Consider how easy it would be to update: you’d be replacing just a single Container and not an entire server. The fewer components changed, the better.
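A Compose file bundling an application with its MySQL dependency might look like this sketch (service names, ports, and credentials are illustrative):

```yaml
version: "2"
services:
  app:
    build: .            # built from the application's own Dockerfile
    ports:
      - "8080:8080"
    depends_on:
      - db
  db:
    image: mysql:5.6    # pinned to the exact production version
    environment:
      MYSQL_ROOT_PASSWORD: example
```

Updating the application means rebuilding and replacing the `app` Container only; the `db` Container, and the server underneath, stay untouched.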
This contributes much to a system’s resilience and to Velocity:
- Prevents mistakes during development
- Zero setup time for external dependencies
- A simpler and safer development-to-production delivery method, correctly decoupling Dev and Ops
That is only the beginning of a Container’s journey. The question of how it changed cloud computing is yet to be answered. On that and more, in the next article.