A database failure or bottleneck is harder to resolve (through failover, tuning or scaling) than the same problem in an application, whether stateful or stateless. A database is also a more “sensitive” component in the system. For both these reasons, the answer to “should you run it in a container?” is “it depends”.
You can run a database in a Container in production. But should you do it?
There is a lot of outdated misinformation and misconception when it comes to Container performance, both networking and I/O. Many intro-level blog posts about Docker and Containers just echo “it has I/O latency” without anything to back it up.
I’ve found two posts referring to research done by IBM in 2014, whose results “show that containers result in equal or better performance than VMs in almost all cases”. The amazing thing is that one post referenced this research but still said “it’s bad for performance” without any explanation. Another, from Stack Overflow, quoted the results directly. Alas, the IBM research also says that Containers require tuning for I/O-intensive applications. A later study, done in 2017, suggested the same.
This is 2020. Things may have progressed since then, which is why I posted that question in a much-loved forum called Operations Israel, where engineers far more experienced than me hang out. I’ll summarize the insights I’ve gained from it.
There is no technical reason not to put a database in a container (forgive me for the double negative), but there are business constraints to consider. A good example mentioned in the forum: if your product is an on-premise one, no one guarantees that all your customers will allow you to run containers.
Another exception mentioned is a scenario that requires constant, peak, high-end performance under heavy load, such as Algo Trading, where every millisecond counts. That would be an excellent reason not to use Containers at all. Nor virtual servers, by the way; you’d work directly with dedicated servers / bare metal anyhow. If that’s not the kind of performance you require, then running a database in Containers would barely take any performance hit, and what little there is you’d overcome by scaling the infrastructure and the database itself [see Scaling Strategies: Infrastructure or Applicative Scaling]. Or as someone in the forum beautifully put it, “Containers would be one of the last bottlenecks you’d need to resolve”. At the other end of the spectrum, Vitess.io is an example of MySQL intended to run on Kubernetes by default.
Another rule of thumb surfaced from that forum. If you are already running an Orchestrator and Containers in production and have hands-on experience with them, it will be easier for you to launch, manage and scale a database in Containers than to do it on a virtual server. For me at Silo, as most of my career was spent with Containers and all of Silo’s applications run in them, it would have been easier to launch Neo4J in a Container than to start digging into virtual server setup and maintenance.
On the other hand, an organization that has just started with Kubernetes and has been running a MySQL server for years on bare metal shouldn’t containerize it. As always, know yourself, know your company and know your use case.
Lastly, there is always the unknown unknown. I have a few good years of experience with Java, but I never ran a Java application in a Container. What I did not know is that the JVM has tuning issues when run in a container. How long would it have taken me to realise that if I had tried to run Neo4J in a Container? Luckily, I didn’t have to find out, as Aura, Neo4J as a Service, came out. What else don’t I know? I don’t know!
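One commonly cited example of such tuning issues is a runtime that sizes itself from the host’s memory rather than from the container’s limit, and so asks for more than the container is allowed to use. The sketch below is in Python rather than Java and does nothing more than compare the two numbers from inside a container. The cgroup file paths are assumptions that depend on the cgroup version and the distribution, and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed locations of the memory limit; they vary by cgroup version and distro.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when no limit is set."""
    raw = raw.strip()
    if not raw.isdigit():  # cgroup v2 writes the literal "max" when unlimited
        return None
    value = int(raw)
    # cgroup v1 reports a very large number when unlimited.
    return None if value >= 1 << 60 else value


def container_memory_limit() -> Optional[int]:
    """Read the first cgroup limit file that exists, if any."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue
    return None


def host_memory() -> Optional[int]:
    """Physical memory as seen by the kernel, or None if unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    print("container limit (bytes):", container_memory_limit())
    print("host memory (bytes):    ", host_memory())
```

If the two numbers differ widely, anything that derives heap or cache sizes from host memory deserves a second look before it goes into a Container.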
That’s why, even though I’m an advocate for Containers, I would still think twice before running a database in one. I would double-check my use case and the database itself, and benchmark, before making such a hard-to-change decision. Or just go with a fully managed database.
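As a starting point for that kind of benchmark, here is a minimal sketch that measures the latency of small synchronous writes, the pattern a database commit log tends to produce. Run it once on the host and once inside a Container, pointing at the same kind of storage, and compare. The function names, the 8 KiB block size and the number of writes are illustrative assumptions; it is no substitute for benchmarking the actual database under a realistic load.

```python
import os
import statistics
import tempfile
import time
from typing import Dict, List


def fsync_latencies_ms(directory: str, writes: int = 200,
                       block_size: int = 8192) -> List[float]:
    """Write `writes` blocks to a temp file in `directory`, fsync after each,
    and return the per-write latency in milliseconds."""
    payload = os.urandom(block_size)
    latencies: List[float] = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to storage, as a commit would
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Median, approximate 99th percentile and maximum of the samples."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # "." is a placeholder; point it at the volume the database would use.
    print(summarize(fsync_latencies_ms(".")))
```

The tail (p99 and max) matters more than the median here, since occasional slow commits are what users of a database tend to notice first.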
Back in 2016, although already 4 years old, Containers were considered an unstable technology, which is why you’d find many articles suggesting not to use Containers in production. It is indeed another layer in the application stack. If it crashes, it takes the application with it. They were right to worry.
That was back then. Today is 2020. Containers are being run by the millions worldwide, in small and large companies. That said, during Silo’s PoC we ran into an unresolved issue with Docker Containers that took us a few good days to recover from, just before shipping the devices out: when the AC plug is pulled out unexpectedly, the Docker engine fails to recover. Who knew? But come on, that’s an obscure scenario.
Again, forgive me for the double negative, but there is no technical reason for an application not to be in a Container. I would not even think twice about it. It is a solid, proven and beneficial technology. If you’re starting from scratch, go with Containers.
If you’ve never worked with Containers, consider the following reasons not to adopt them:
- If it isn’t broken, don’t fix it
- There’s no business value in it for you / your company
- It’s not worth the learning effort
- Refactoring is hard [see Not all that Glitters is Gold: Limitations & Refactoring]
If you invest months in containerizing all of your applications just for the sake of it, and not as part of a roadmap towards higher availability or lower maintenance, then you’ve done nothing but waste time and resources.