On the 13th of November 2014 AWS launched Elastic Container Service (ECS), their managed Container Orchestrator with a managed master. You no longer needed three dedicated servers to run and maintain the “master” application on. You could launch as many Orchestrators as you needed, for free. Alas, you still had to maintain and pay for the underlying fleet of EC2 virtual servers. A managed master orchestrator contributes to a system’s resilience, as it is one more application maintained by someone with far more experience than you.
It was a rough start for AWS. I remember the service being buggy and sluggish. It could take a good few minutes to launch more servers and tens of seconds to launch more containers. It lacked a lot of features that Swarm and Kubernetes already had. At Wiser (2016) we were working on a project that required concurrently running and constantly recycling thousands of containers. Our architect tested ECS and it just couldn’t hold up.
As the years went by, ECS became more and more resilient, with more and more features added to it, such as a scheduler, a Container Auto Scaler and eventually also a Cluster Auto Scaler that scaled the underlying EC2 servers. But it never caught up with Kubernetes, which is now the de-facto standard, at least at the time of writing. By the end of 2017 it was rumored that AWS was going to launch its own Kubernetes service. I was curious why AWS would invest in a rival technology instead of pushing its own solution forward. It sounded very counterintuitive.
One day I met with Yaniv Donenfeld, who was in charge of the entire Container technology product roadmap at AWS, a very serious guy. He said I was looking at this entirely wrong. “Did you know that 80% of Kubernetes Clusters are running on AWS’s infrastructure?” His line of thought was eye-opening. He went on to explain that if most engineers are adopting a specific technology and you refuse to support the de-facto standard, you’ll end up losing them as future customers. And if adopting a rival technology also makes your current customers’ lives easier, do so.
It is no wonder that AWS invested time in it. In the end, AWS makes money from the worker nodes (EC2 servers) no matter which orchestrator the customer chooses. Indeed, in June 2018 AWS launched Elastic Kubernetes Service (EKS), which, just like ECS, had a managed master, although not a free one. Worker nodes still ran on EC2 instances.
I was eagerly looking forward to this release, as I preferred that I, my engineers, my company and our infrastructure work with the industry standard. As eager as I was, we were still months before our own product launch, so it would have been easy for us to switch from ECS to EKS. Before moving forward, though, the first and foremost thing to do was to compare the two.
ECS came out superior for two reasons. The first was cost: ECS was completely free no matter how many clusters or environments we had, whereas with K8S it would have been about $1,600 a month. The second was that ECS was fully integrated with all the other AWS managed services, notably the Container Auto Scaler, load balancers and IAM (security). I figured I would have to invest about three months of work to get EKS to integrate at the same level, and I would also be the one in charge of maintaining those integrations (I assume most of these gaps have been closed by now, so please double-check). ECS’s only downside was that it cannot run on a local workstation, though the containers themselves can.
My conclusion was that in order to meet the “industry standard” I would have to pay more to get less: a less resilient system with no real benefit. Not worth it. We stayed with good old, proven ECS.
Serverless Containers
Three years later (November 2017) AWS launched the world’s first serverless container platform, a Platform as a Service named Fargate. For a developer it hardly matters; it is just another orchestrator that “takes my application and runs it somewhere, I don’t know or care how”. But for the Ops guy it was a game changer, as underlying server maintenance was no longer required. At all.
The Ops would no longer care where the applications are running. All the “open issues” of optimal bin packing and keeping the cluster up to date no longer exist. No need to buy and maintain servers, no need to launch or shut down servers at all. It’s no longer the entire company’s problem. If a developer needs 4 CPUs and 4 GB of RAM times 10 instances for high availability, he’ll get them within seconds, no questions asked (see the sketch below). By the end of 2019 Fargate also supported Kubernetes Pods, thus making Kubernetes completely serverless as well.
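To give a feel for what “no questions asked” means in practice, here is a minimal sketch using boto3 of how a developer could ask Fargate for that capacity directly: register a task definition with the CPU and memory you need, then run ten copies of it, with no servers provisioned in advance. The cluster name, image and subnet are hypothetical, and in our case all of this was actually driven through Terraform rather than API calls; this is purely illustrative.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Declare what the application needs; there is no fleet of EC2 servers to size first.
ecs.register_task_definition(
    family="my-app",                       # hypothetical task family
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",                  # Fargate tasks require awsvpc networking
    cpu="4096",                            # 4 vCPUs
    memory="8192",                         # Fargate pairs 4 vCPUs with at least 8 GB
    containerDefinitions=[{
        "name": "my-app",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",  # hypothetical image
        "essential": True,
    }],
)

# Ask for 10 copies of the task; Fargate finds the capacity within seconds.
ecs.run_task(
    cluster="dev",                         # hypothetical cluster name
    launchType="FARGATE",
    taskDefinition="my-app",
    count=10,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],   # hypothetical subnet
            "assignPublicIp": "DISABLED",
        }
    },
)
```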
Don’t worry, nothing comes for free. Running containers on serverless Fargate carries a premium of about 280% over running them on ECS with your own EC2 instances. Is it worth it? One morning at Silo, one of our development environments stopped working. For some reason, all of our containers stopped responding, and even recycling and redeploying them did absolutely nothing. After two hours of trying to figure out what had happened, I gave up. It was a development environment, so it wasn’t worth more of my time.
I brought down the ECS cluster and relaunched it. As we were using Terraform it was a one-line command, which took about 45 minutes to execute. Once the cluster was back up, everything went back to normal. So that’s a few hours of downtime, and our team had no idea how to resolve an underlying server issue. Could this happen in a production environment? Why not, actually? If it did, we’d be toast. So we decided that closer to the product launch we would switch to Fargate, to increase our system’s resilience.
Note: you can run an Orchestrator in your own data center. But if Serverless Compute is defined by zero maintenance of zero servers, then by definition it cannot be applied to your data center, the one that is full of bare-metal servers.