A Function’s interconnectedness with other services of your cloud provider is one of its beauties. It can save you a lot of unnecessary, repetitive coding of integrating with other infrastructure cloud services. The below use cases would have you better understand how to decouple applications, when and why it would be better to use a Function and one day to integrate with other custom infrastructure components. As it is embedded into the infrastructure layer, it is reusable between multiple applications. Taking advantage of this is a major trade off to consider.
It was 2010 when AWS first introduced Simple Notification Service (SNS), a fully managed notification service. Since then more and more services have started reporting changes to their internal state, better known as Events. On the other hand, more and more applications could have reacted according to these Events. Applications that are waking up from idle state due a reported change (Event) are called Event Driven, or reactive [more about Silo’s reactive system in a future series].
For example, in 2011 Auto Scaling started reporting Events to SNS when an EC2 instance failed to launch. You could have configured SNS to send your IT team an email every time that this specific Event had happened. In turn, it allowed the IT to respond in real time to this unexpected error.
Decoupling with S3
A more interesting Event was emitted from S3 (managed file storage system) in the end of 2014, one that notified that a new file has been uploaded to a bucket. Up until then, if you wanted to process incoming files you could have you basically had two options:
The first and simplest one is to have a web application with an upload form. Each upload would be a thread that would both store the file in S3 and then immediately process. A different solution that would allow later processing, is after the upload is complete a message would be sent to another thread who would process it in its own time. If the application were to crash the upload would be lost and no one would know. You’d have to wait for the customer, who is long gone by now, to start the upload again. The two roles, the upload and the processing, were tightly coupled together in the same application.
The second option would be to decouple the uploading and the processing. Have one web application with an upload form and another application continuously polling on S3 waiting for new files to arrive. Polling is problematic. If it’s too frequent (1 seconds) you’d be hitting your server for no good reason (you’re paying for each S3 API call) as a file may rarely arrive. If it’s not frequent enough (5 minutes), you’d cause lagging as it would take a while to detect that a file has indeed arrived.
The benefit of an Event from S3 that it could be indirectly sent to your application either via a push mechanism (SNS) or through the long poll (poll once and hold until an item is available) mechanism of Simple Queue Service (SQS). That was one huge step forward as it solved the constant polling problem, but it still was another running application mostly in idle state that needed to be maintained and scaled.
If you’ve read the series of Cloud Computing Evolution, you wouldn’t be surprised to know that at exactly the same day that S3 has started to report Events, Lambda was announced. Not a coincidence at all. S3 started asynchronously invoking a Lambda function with an Event. The event would pass to Lambda Runtime’s internal queue that would launch a function instance for processing.
When choosing between a Container and a Function, consider that Lambda can be asynchronously or synchronously invoked by many AWS services, including SNS, CloudWatch (to reach to alarms), Auto Scaling (to react to scaling failures), API Gateway and Application Load Balancer (to react to an incoming web requests).
These seamless OOTB integrations will save you a lot of coding time and as they are maintained by AWS themselves, they are highly resilient. I would like to further elaborate on two unique services, both frequently used message aggregators for async processing. Using Functions greatly would simplify integrating with then. [on the difference between a Stream, a Queue and a Topic in a future article]
Kinesis Stream – a Consumer per Shard
At Silo (2017) we had PoC with 30 devices based on Raspberry Pi. These devices reported Events that ended up at an ElasticSearch. That was the PoC of what later will become our Event Analysis [more on that in a future article]. I won’t dig into all the technicalities of the pipeline details but the Events went all the way through the device to ElasticSearch:
A Kinesis Stream is a sharded stream, the data is internally spread across partitions and each shard/partition could have no more than one consumer attached to it. You could have attached multiple listeners/applications types on a single stream. Kinesis throughput is scaled by increasing the number of shards, not the number of consumers.
Back then, the best practice was still to code a Kinesis Consumer and run it within a Container. An implemented sophisticated coordination between all shard consumer instances was already available in Java but the team was coding in Node.JS which required a MultiLangDaemon that was also already available. But even so, it was a nightmare. It took a total of two and half weeks to get it running in a Container and guess what? it mostly worked. Once a week something went completely wrong with the MultiLangDaemon and we had to manually restart the Container.
If you have multiple instances running and it does not have any effect on the system, automatically and expectidly restarting Containers once in a while is not a bad idea at all. Instead of coding a reset or a cleanup after each or several successful executions, have the application safely exit and set the container to restart upon exit. That would ensure that it is a 100% ready for the next execution, without being dependent on the previous ones. This actually increases the system’s resilience. But that of course would have needed to be coded into the MultiLangDaemon. Lambda natively does something similar as Function instances are destroyed every few minutes/executions and new ones are launched instead. [more on the “you had one job” rule of thumb in a future article]
In 2019, when we worked on the well designed pipeline, it took us 1 day to code the processor and one day to launch it with a Lambda function, to integrate it and test it with a Kinesis Stream. Never had any troubles during the few months it was working. It scales as you can easily add a Function instance per shard. Although a Lambda running for 24/7 costs about 20$ monthly, more than 3 times of highly available containers, it was definitely worth it.
Do note that Silo’s Kinesis Consumer was a stateless one. There are many use cases for Kinesis which would entail a stateful consumer.
Scaling SQS Consumers
Unlike the above use case of a Stream, a queue throughput is scaled by increasing the number of consuming instances. Consuming from Queues is done in batches, thus the ratio between numbers of messages and concurrency required is equal or lower than one. You won’t be paying for anything when the queue is empty. Functions are good exactly for this.
Coding an SQS consumer can take a while and some open source implementations can be found, but that’s not the effort being saved from you. Scaling any kind of application is not always an easy thing to do. Scaling Queue consumers is even more problematic.
Any kind of application scaling, or Auto Scaling, requires metrics and rules:
- Scale up when metrics are above X
- Scale down when metrics are below or equal X
The most basic one to start with, is the one who makes the most sense, CPU Usage:
- if not enough compute power is available → get more
- If too much compute power is available → release it
It is a sufficient metric to upon scale Queue consumers up and down but far from being an optimal one. Sub-optimal metrics lead to costs not saved. It may be 10$ a month for your small company and not worth the trouble at all, but it may be 1,000$ for another company or for the millions of AWS customers combined.
A far better metric for scaling would be the second derivative of the number of messages in the queue, the rate the number increases or decreases through time. That is an excellent predictor to how much processing power you would need in the next few minutes. I hope your math is good enough to code a time series of a second derivative, hopefully the real metric is no more complicated than that. Besides coding, that would be an application that continuously and frequently samples the number of messages and is running 24/7. Lambda Runtime does just that for you.
Crossing infra/app boundaries
Terminating a consuming application during scaling down is even more problematic. Let’s presume that you currently have 10 consumers running in Containers. Their aggregated CPU Usage is 60% and you wish to switch off two of them. Your Orchestrator which reacts to an Event from the Auto Scaling service randomly picked two out of 10. It has forcibly shut them down, alas these two were still processing messages. That may be fine as these messages will not be lost at all and will just be processed by another instance later on, depending on message visibility timeout. Later entails latency.
This can’t be completely solved with a shutdown hook, although it is a step in the right direction. The Orchestrator would notify the consumer to not to pull another batch, to not to further process incoming messages. Still the problem remains with the ones currently being processed.
Timeouts may be sufficient, but not airtight. The Orchestrator could give a grace period for the consumer to finish processing. A too short of a grace period and you haven’t solved the problem. A too long of a grace period and you’re paying for unutilized compute resources.
When a long grace period is set, The Orchestrator would not be able to distinguish between a timeout and a heavy duty batch that for some applicative reason takes longer to process (slowness in the DB e.g.). Setting up the exact timeout, or a dynamic one, would require the Orchestrator to know what the application does – and that’s crossing a boundary. The infrastructure layer is expected to remain agnostic to the application layer.
If timeouts are insufficient, you would consider an event driven one. Let the consuming application tell the Orchestrator when it is done processing, then either gracefully shutdown or let the Orchestrator terminate it. But that would be again crossing the boundary only in the other direction – your application would then need to be aware and to communicate with the infrastructure layer.
A Function instance will never be terminated during processing as it is invoked synchronously, thus no messages will be lost as there is no unexpected termination to avoid.If behind the scenes Lambda fails to scale down, that is not your problem as you are paying only for actual compute time used. Using Functions with SQS saves you from entirely addressing this issue.