02 Feet in the Cloud, 03 Serverless Development

Safer but Slower: Latency

Cold Starts

When a Function instance is idle and not handling any incoming requests, it is frozen, and later thawed when the next request arrives. When not enough Function instances are available, the infrastructure launches a new one. Alas, launching a new instance can take from a few hundred milliseconds to a few seconds.

This time varies and consists of:

  • Downloading the micro container image onto the underlying managed EC2 server, which is why Lambda functions are slim and limited in size (adds between 0.5 and 6 seconds)
  • Starting the micro container instance (around 100-150ms)
  • Starting up the instance within a VPC is slower (adding 200ms to almost 1s). AWS brought down this number from 15 seconds!
  • Application bootstrap:
    • Environment bootstrap (C# is several hundred milliseconds slower than the fastest runtimes, Node.js and Python)
    • Your custom initialization (could be nothing, could be seconds)


Note: AWS is constantly working to bring these numbers down, and their variance is very high. Treat them with care and test your own code!
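
One way to test your own code is to have the function report whether an invocation hit a cold start. The following is a minimal sketch, assuming a Python runtime; the handler name and the returned fields are illustrative and not part of any AWS API. It relies on module-level code running once per instance, during the application bootstrap, while the handler runs on every invocation.

```python
import time

# Module-level code runs once per instance, during the application bootstrap.
_init_started = time.perf_counter()
# ... your custom initialization would go here (clients, config, connections) ...
_INIT_SECONDS = time.perf_counter() - _init_started

# True until this instance has served its first invocation.
_is_cold = True


def handler(event, context):
    """Hypothetical handler that reports whether this invocation was cold."""
    global _is_cold
    was_cold = _is_cold
    _is_cold = False
    return {"cold_start": was_cold, "init_seconds": _INIT_SECONDS}
```

Logging these fields on every invocation shows how often cold starts happen under your real traffic. Note that this only measures your custom initialization; the image download and the container start happen before your code runs.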

Although the average lifetime of an instance is a matter of minutes, it should not be counted on. It is impossible to predict or control when a cold start will happen. A cold start always happens when scaling up, whether from 0 to 1 or from 100 to 500. It is bound to happen during traffic surges and can be caused by other events as well. Expect that a substantial lag could affect your customers for a while.

On the other hand, although Containers have a much longer cold start, its effects are less noticeable. It takes between tens of seconds and a minute to launch another instance, plus the application bootstrap. As Containers are scaled in advance, according to expected future surges, a cold start most probably will not cause any lag at all for the end customer. With Containers, there is no cold start from zero instances, as there is always at least one container available.

Under extreme predictable surges, Containers should launch fast enough. Functions will better handle unpredictable and extreme ones.

To better explain this tradeoff in terms of end customer latency, I’ll present a few real scenarios I encountered. Always know your use case before selecting which compute to use.

When nobody cares about latency

I’ve previously given the example of Silo’s Device Metadata Service (2019). It was invoked once every few months and had to digest several thousand messages created from a parsed CSV file. Each message ended up as a row in a MySQL database (Aurora Serverless). These entries were accessed for the first time weeks afterwards. Latency was not an issue at all.

When latency makes your product look bad

At Silo (2019) we had a business/user flow that started right after a container was successfully vacuumed, leading to a customer labeling his food. A customer would lift the container off the base and that would trigger the Alexa Skill.

This triggering involved two different Lambda invocations. The first one was an out-of-the-box (OOTB) function owned by Alexa Voice Services (AVS), which would ask the customer “what’s in the container?”. After the customer replied “strawberries”, a second Lambda function, owned by Silo, was invoked. It would reply with “strawberries were added to your inventory”. When multiple Lambda functions are chained together and more than one requires a cold start, the lag quickly adds up. Our product manager would stand for a few good seconds with a heavy container in his hands, waiting for something to happen. We ended up disabling AVS’s Lambda and adding its logic into our own Lambda. That resolved the issue and our PM was happy once again. At least for a while.

Further down the road, during development we had a total of 5 Lambda functions invoked within a VPC:

  • One Service would use natural language processing (NLP) and a graph database (Neo4j) to uniquely identify the customer’s intention (think about a customer who once said Spaghetti and later queries for Pasta). Neo4j needs to run within a VPC [see Dead or Alive: Persistent Connections of this series for further details].
  • Another would estimate how long the food is going to stay good.
  • Due to the system’s asynchronous nature and its Service Oriented Architecture, a “bridge” to aid synchronous requests was needed, and it needed a Redis within a VPC [see Dead or Alive: Persistent Connections of this series for further details].
  • In order to fulfil the bridge’s lock/release mechanism, two more Lambda invocations were required.

After we finished coding the bridge, we tested it with mock services that did nothing. When all the functions were cold, the full interaction took 14 seconds. Obviously, going further with Lambdas was not the way to go. We planned to switch to Containers once the two other services were ready; they were not part of the MVP.
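
As a rough sanity check on that figure, the ranges from the cold start breakdown earlier in this post can be summed per function and multiplied by the number of chained functions. This is a back-of-the-envelope sketch, assuming every function is cold, runs within a VPC, is invoked sequentially, and has no bootstrap cost of its own; the function name is illustrative.

```python
# Ranges in seconds, taken from the cold start breakdown earlier in this post.
COLD_START_RANGES = {
    "image_download": (0.5, 6.0),
    "container_start": (0.1, 0.15),
    "vpc_startup": (0.2, 1.0),
}


def chained_cold_start_bounds(function_count, ranges=COLD_START_RANGES):
    """Return (best, worst) total cold start lag for a sequential chain."""
    best_per_function = sum(low for low, _ in ranges.values())
    worst_per_function = sum(high for _, high in ranges.values())
    return function_count * best_per_function, function_count * worst_per_function


best, worst = chained_cold_start_bounds(5)
print(f"5 chained cold functions: between {best:.1f}s and {worst:.1f}s")
```

For five functions this gives a range of roughly 4 to 36 seconds, which the measured 14 seconds falls within.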

At the beginning of the series I stated that we coded our applications to be able to run both as Functions and as Containers. The scenario above was an exception, as we knew for sure it would never run on Lambdas. Sometimes you need to let go of your own rules when the scenario calls for it.

In between – when nobody notices latencies

The user flow did not end when the user labeled his food via the Alexa Skill. Once done, a third service not mentioned above, the Food Management Service, would update its own DynamoDB, which would eventually cause an update to a view on the user’s mobile application. According to the customer’s mental model of our product, a flow that started at the device could end up on his mobile a few seconds afterwards, because they are two separate physical entities. The Food Management Service ran on Lambdas and no one, neither the developers, the product manager nor the investors, noticed lags caused by cold starts.


Provisioned Concurrency

The first solution to consider is AWS’s Provisioned Concurrency, which is the quickest and most reliable solution, as the cloud provider offers it out of the box, without any coding or maintenance on your side. It tackles the cold start issue by keeping a minimum number of micro container instances warmed up in advance (requests are still executed one at a time, and instances are still unpredictably recycled), thus reducing the probability of a cold start and better guaranteeing a low latency user experience when needed.

It doesn’t come cheap. For a single 512MB RAM function that is kept warm 24/7, you’d be paying 5.4$ monthly per reserved instance (pricing is linear in warm time and RAM). You would probably need more than one.
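
Because the pricing is linear, the 5.4$ figure can be scaled to other configurations. The sketch below assumes that baseline and covers only the fee for keeping instances warm, not the per-request or processing duration charges; the function and parameter names are illustrative.

```python
BASELINE_MONTHLY_USD = 5.4  # from the text: one 512MB instance kept warm 24/7
BASELINE_RAM_MB = 512


def provisioned_monthly_cost(ram_mb, warm_hours_per_day=24, instances=1):
    """Scale the baseline linearly with RAM, warm time and instance count."""
    ram_factor = ram_mb / BASELINE_RAM_MB
    time_factor = warm_hours_per_day / 24
    return BASELINE_MONTHLY_USD * ram_factor * time_factor * instances


print(round(provisioned_monthly_cost(1024, instances=3), 2))
print(round(provisioned_monthly_cost(512, warm_hours_per_day=8), 2))
```

Under these assumptions, three 1024MB instances kept warm around the clock come to about 32.4$ a month, while a single 512MB instance kept warm for 8 hours a day comes to about 1.8$.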

You’d be paying less for the marginal processing duration (the actual compute time used), but the economic equilibrium point is at 9 days’ worth of processing time, which would suggest that your Lambda is processing for a little less than a third of its up time.

If that is true, you maybe shouldn’t have gone with Lambda in the first place. For comparison, running three Containers on Fargate for high availability costs 8.52$ (3.12$ / 57% more) and can reliably handle many more concurrent requests than just one. Provisioned Concurrency may not be cost effective at all. 

There are ways to reduce these costs, by dynamically scaling or scheduling the number of reserved instances with the Application Auto Scaling service (e.g. keeping them warm for just 8 hours a day, during work time).
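
Scheduling could look roughly like the sketch below, which uses boto3’s Application Auto Scaling client (register_scalable_target and put_scheduled_action). The function name, alias, action names and work hours are hypothetical, the cron expressions are evaluated in UTC by default, and the capacity values should be verified against the Application Auto Scaling documentation before use.

```python
def build_schedule(function_name, alias, warm_instances):
    """Build the request parameters; the names passed in are hypothetical."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    target = {**common, "MinCapacity": 0, "MaxCapacity": warm_instances}
    actions = [
        {
            **common,
            "ScheduledActionName": "warm-up-for-work-hours",
            "Schedule": "cron(0 9 ? * MON-FRI *)",  # 09:00 UTC on weekdays
            "ScalableTargetAction": {
                "MinCapacity": warm_instances,
                "MaxCapacity": warm_instances,
            },
        },
        {
            **common,
            "ScheduledActionName": "cool-down-after-work-hours",
            "Schedule": "cron(0 17 ? * MON-FRI *)",  # 17:00 UTC on weekdays
            "ScalableTargetAction": {"MinCapacity": 0, "MaxCapacity": 0},
        },
    ]
    return target, actions


def apply_schedule(target, actions):
    """Send the requests to AWS; requires boto3 and valid credentials."""
    import boto3  # third-party AWS SDK, imported lazily on purpose

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    for action in actions:
        client.put_scheduled_action(**action)


target, actions = build_schedule("my-function", "live", warm_instances=3)
# apply_schedule(target, actions)  # uncomment to apply against a real account
```

Keeping the parameter building separate from the AWS calls makes the schedule easy to review and test before anything is applied.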

I would consider this solution only as a “quick and dirty” one, if your code is already running on Lambda. Weigh it against the cost of switching from Lambda to a Container on Fargate, which could require coding. Remember that an engineer paid 200k$ annually costs about 550$ a day, which is equivalent to keeping 8 functions warm for an entire year.
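
The arithmetic behind that comparison can be reproduced as follows. This is a sketch that assumes the salary is spread over 365 calendar days, which is what yields the roughly 550$ figure, and reuses the 5.4$ monthly cost per warm function; the function name is illustrative.

```python
def warm_function_years_per_engineer_day(annual_salary_usd, warm_monthly_usd):
    """Return (daily engineer cost, function-years of warmth it would buy)."""
    daily_cost = annual_salary_usd / 365  # calendar days, as assumed above
    warm_yearly_cost = warm_monthly_usd * 12
    return daily_cost, daily_cost / warm_yearly_cost


daily_cost, function_years = warm_function_years_per_engineer_day(200_000, 5.4)
print(f"daily cost: {daily_cost:.0f}$, warm function-years: {function_years:.1f}")
```

The result is a daily cost of about 548$ and a little over 8 function-years. Dividing by working days instead of calendar days would raise the daily cost.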

Do the math, know your use case.

Warm uppers

Another solution, which is considered a good one, is to have a scheduler that invokes your Lambda function once every 5 minutes. Several implementations are already available [serverless plugin, Lambda Warmer].
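
On the function side, the handler usually needs to recognize the scheduled ping and return early. Below is a minimal sketch, assuming the scheduler sends a payload with a "warmer" marker; that key and the helper names are hypothetical, so use whatever your scheduler or plugin actually sends.

```python
WARMER_KEY = "warmer"  # hypothetical marker; match what your scheduler sends


def handle_request(event):
    """Placeholder for the real business logic."""
    return {"statusCode": 200, "body": f"processed {event.get('path', '/')}"}


def handler(event, context):
    # Short-circuit scheduled pings: they keep the instance warm
    # without running the real work.
    if isinstance(event, dict) and event.get(WARMER_KEY):
        return {"warmed": True}
    return handle_request(event)
```

Keep in mind that a ping keeps warm only the instances that receive it, so this does not prevent the cold starts caused by scaling up.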

Use Containers

In the case of a low latency customer facing flow, Containers should be the default way to go. The best way to overcome trouble is not to get into it in the first place. 

Although running on Fargate involves a learning curve, there would be hardly any maintenance to do, and it still ensures a highly resilient system. If you already have expertise with Containers, go with it. If you don’t, don’t be afraid to use Containers. This is why I started this series with a warning not to be afraid of Containers. This is the use case where being afraid of them would lead you down the road to bad decisions and a bad product. As harsh as I sound now, do remember that this is a tradeoff, a consideration. The right solution for you and your current application state may be another one.