
Storage of Truth: Persisting Events


In the previous chapter, we tied the last knots of our system architecture. We instructed all applications that wish to participate in the system or in an experience to send and receive {EVENT} messages through our messaging platform. By doing so, we infused it with Event-Driven Development and The Reactive Manifesto, making {EVENT} the canonical message of our Enterprise Messaging Architecture.

In turn, it transformed all of our applications into Event Handlers. The system architecture is agnostic to whether the applications are Actors, Agents, backend applications or any other kind. It is also agnostic to where they are deployed.

This infusion brings an interesting twist to why and how we store Events. It also uncovers a certain truth.

Past Sense

Events have some very interesting properties. Done correctly, an Event describes something that has already happened. Philosophically speaking, it might be the only thing that is not impermanent, maybe even against the very idea of The Force of Change. Or maybe it is a part of it. What’s in the past no longer Changes. Events are immutable.

For example, if customer 1 clicked button A at 01/01/2020 19:00:00, it would be forever true that this exact customer clicked that exact button at that exact time. It would also be forever true that right afterwards the light for customer 1 was turned on, because that’s what his physical device did.

However, a newer version is installed on customer 2’s device. In this version, clicking button A makes a quack sound. It would still be forever true that customer 2 clicked button A, and it would still be forever true that a quack sound was made.

Consequently, unless the sensory data was gathered and added to the Event, we shall never know what exactly happened. We’ll never know for how long the button was pressed, how bright the light was in lux and in what color, or how loud the quack sound was. That lack of knowledge will forever remain true.

Without it, the best we can do is guesswork. Some of it would be a good guess. If a button was clicked on our mobile application running on an iPhone XR, the screen resolution is known and constant, even without gathering it. In contrast, if the very same button was clicked on our web application running in a browser on a desktop computer, and we don’t gather the screen resolution at that exact moment, we’ll never know it. The same goes for which browser was used.
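To make the idea concrete, here is a minimal sketch of an Event that carries its sensory data and cannot be edited after the fact. The field names (`customer_id`, `occurred_at`, `context`), the context keys and the UTC timezone are assumptions made for illustration, not the actual {EVENT} schema.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class Event:
    """An immutable record of something that has already happened."""
    name: str
    customer_id: str  # hypothetical field name
    occurred_at: datetime
    # Sensory data gathered at emission time; whatever is missing here is lost for good.
    context: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Wrap a copy of the context in a read-only view so it cannot be edited later.
        object.__setattr__(self, "context", MappingProxyType(dict(self.context)))


click = Event(
    name="BUTTON_CLICKED",
    customer_id="customer-1",
    occurred_at=datetime(2020, 1, 1, 19, 0, 0, tzinfo=timezone.utc),  # timezone is assumed
    context={"button": "A", "press_duration_ms": 180, "screen_resolution": "1920x1080"},
)

try:
    click.name = "BUTTON_DOUBLE_CLICKED"  # the past cannot be rewritten
except FrozenInstanceError:
    print("events are immutable")
```

Anything not placed in `context` at emission time, such as the browser or the screen resolution, simply cannot be recovered from the Event later.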

Derived Present

Our device’s latest version now quacks. According to our system architecture it would also have a backend application reacting to Quack Events. It’s a Quack Counter Service. All it forever does is persist how many quacks each customer has made. It will be forever true that customer 1 had quacked 20 times and that customer 2 had quacked 30 times, in the entirety of 2022.

Then our lovely product manager came up with a new feature, Quack Achievements. Whoever quacks 10 times earns a Junior Quacker badge, and whoever quacks 20 times earns a Master Quacker badge. A few weeks later, it turns out we give away too many badges. From now on it takes 15 quacks to earn a Junior, 25 quacks to earn a Master, and 45 quacks to earn a Grand Quacker badge. We coded something that grants badges retroactively, but unfortunately it also takes badges away from current achievers.

Customer 1 has lost his Master Quacker badge, although his count remained the same. That is because the count is based solely on the sensory data of Quack Events, while badges are only derived from Quack Events, through rules that Change.

The history of an action made and the history of its sensory data will forever remain true. Whatever is derived from them may not be forever true, but it is true for a while, until Changed. It would be forever true that for a fixed period in time, customer 1 had a Master Quacker badge. And it would be forever true that his badge was removed at another point in time.

Having a badge or not having one are two sides of the same coin; both are temporary states. What defines those states and the transitions between them is what we call business logic, or better yet a User Journey, a product/customer experience, or a Flow. All can be modeled as Actors emitting Events, even for actions such as BADGE_GRANTED and BADGE_REMOVED. It would be forever true that both had happened at their respective times. It’s only the present state derived from them that may Change.
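The difference between a recorded fact and a derived state can be sketched in a few lines. The thresholds below are the two rule sets from the story; the event name `QUACK`, the dictionary shape of an Event and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# (minimum quacks, badge) pairs in ascending order, before and after the Change.
RULES_V1: List[Tuple[int, str]] = [(10, "Junior Quacker"), (20, "Master Quacker")]
RULES_V2: List[Tuple[int, str]] = [
    (15, "Junior Quacker"),
    (25, "Master Quacker"),
    (45, "Grand Quacker"),
]


def count_quacks(events: Iterable[dict]) -> Counter:
    """Fold Quack Events into a per-customer count; past Events never change it."""
    counts: Counter = Counter()
    for event in events:
        if event["name"] == "QUACK":  # hypothetical event name
            counts[event["customer"]] += 1
    return counts


def derive_badge(quacks: int, rules: List[Tuple[int, str]]) -> Optional[str]:
    """Return the highest badge whose threshold is met, or None."""
    earned = [badge for threshold, badge in rules if quacks >= threshold]
    return earned[-1] if earned else None


events = (
    [{"name": "QUACK", "customer": "customer-1"}] * 20
    + [{"name": "QUACK", "customer": "customer-2"}] * 30
)
counts = count_quacks(events)

for customer, quacks in sorted(counts.items()):
    print(customer, quacks, derive_badge(quacks, RULES_V1), derive_badge(quacks, RULES_V2))
```

The counts stay at 20 and 30 under both rule sets. Only the badge derived for customer 1 differs, dropping from Master Quacker to Junior Quacker once the second rule set applies.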

Sourcing of Truth

Not all of Silo’s buttons used to quack. One of them removed a food label, as part of the Labeling and Food Management User Journeys. It would emit two Events, BUTTON_CLICKED and LABEL_REMOVED. Through our messaging platform, the LABEL_REMOVED Event would be routed and fanned out to the backend tuples of our Actors, each of which would react to it in its own way.

One of Silo’s Actors, the Alexa Skill, had a dedicated view stored in a dedicated database. In order to allow textual queries over the existing Food Inventory, it was stored in ElasticSearch. For it, consuming the LABEL_REMOVED Event meant removing a food item from the database.

In contrast, the physical device managed only a portion of the Food Inventory: what’s inside Silo food containers. For it, LABEL_REMOVED did not mean removing a food item from a database, but Changing the state of a container to UNLABELED.

For the mobile application, it was a combination of the two above. It managed the entirety of the existing Food Inventory, but only some of it was inside food containers. For this, it had its own dedicated view on DynamoDB. A view the Alexa Skill was completely unaware of.
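A minimal sketch of that fan-out follows, with plain dictionaries standing in for ElasticSearch, the device state and DynamoDB. The payload fields (`food`, `container`), the handler names and the exact behavior of the mobile application on LABEL_REMOVED are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# In-memory stand-ins for each Actor's dedicated view.
alexa_view: Dict[str, dict] = {"rice": {}, "milk": {}}
device_view: Dict[str, str] = {"container-1": "LABELED"}
mobile_view = {
    "foods": {"rice": {"container": "container-1"}, "milk": {"container": None}},
    "containers": {"container-1": "LABELED"},
}


def alexa_handler(event: dict) -> None:
    # The Alexa Skill only answers inventory queries, so the food item is removed.
    alexa_view.pop(event["food"], None)


def device_handler(event: dict) -> None:
    # The device only knows its containers, so the container becomes UNLABELED.
    if event["container"] in device_view:
        device_view[event["container"]] = "UNLABELED"


def mobile_handler(event: dict) -> None:
    # The mobile application does both, on its own view.
    mobile_view["foods"].pop(event["food"], None)
    if event["container"] in mobile_view["containers"]:
        mobile_view["containers"][event["container"]] = "UNLABELED"


HANDLERS: Dict[str, List[Callable[[dict], None]]] = {
    "LABEL_REMOVED": [alexa_handler, device_handler, mobile_handler],
}


def fan_out(event: dict) -> None:
    """Route an Event to every handler subscribed to its name."""
    for handler in HANDLERS.get(event["name"], []):
        handler(event)


fan_out({"name": "LABEL_REMOVED", "food": "rice", "container": "container-1"})
print(alexa_view, device_view, mobile_view)
```

One Event goes in, and three different derived states come out. None of the handlers needs to know that the others exist.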

Each of the three used to hold only a portion of the data, and in a dedicated view. Each was tightly coupled to the experience provided, and to the limitations of the user interface. According to our system architecture, future Services will not be restricted from doing the same, choosing their own databases and views. If no one holds it, then where is the source of truth?

The funny thing is, no database of a Service can be a source of “The Truth”. Each one only stores a derived state anyhow, one based on business logic which Changes through time as well.

Not coincidentally, we already have the Events themselves, which hold a certain truth. And as they already pass through our messaging platform, all we need to do is add another Event Handler that puts them all into a database, as is. We called it The Event Store, a fusion of a Messaging Store and Event Sourcing. But if each database has a dedicated view, what would the Event Store’s view be for?

Replaying Truths

You may have noticed Food Inventory and Food Management, no matter for which Actor and which database, only persist the current/existing food a customer has, and not everything they have ever had. And for good reasons.

The first is that it was not a part of our MVP. As time was of the essence, we decided it would be added further down our Roadmap. That even included figuring out what “it” is, so for the time being we just named it Food History. Only once we figure it out product-wise can we come up with a design. And we already knew that no matter what it would be, our system architecture would not Restrict it, or us, from designing a Reliable solution.

The second reason was that even after we implement whatever Food History is, it would never be as prime a function as Food Management. Knowing what food you stored 2 years ago is just not as important as knowing what you have stored now.

It would be an entirely new Module in our physical device and an entirely new tab/micro-service in our mobile/web application. It would also be an independent backend application. All of them would be as isolated as possible from the others, to continue to ensure the Reliability of the prime functions.

The only problem we would face is Food History’s data, as it is currently stored nowhere. But today we’ve built Food Inventory from the Events of today. Next year we can just build Food History from the past year’s Events. To get it done, we’d need to do an Event Replay from our Event Store.

Today, Food Inventory consumes and processes Events one after the other, in the order of their occurrence. Our consuming logic does so in real time during normal operations, but no one said it must. Theoretically speaking, we can code a new consumer and make it consume historical Events instead of the ones currently being emitted. Once it’s done, we switch it to consume the real time Events.
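A minimal sketch of such a consumer: it first replays historical Events in order of occurrence, then switches to the live stream. The unique `id` on every Event, the integer `occurred_at` and the in-memory set used to skip Events already seen during the replay are all assumptions made for illustration; a real handoff would need a more careful cutover.

```python
from typing import Callable, Iterable, List, Set, Tuple


def replay_then_live(
    historical: Iterable[dict],
    live: Iterable[dict],
    handle: Callable[[dict], None],
) -> None:
    """Consume historical Events in order of occurrence, then switch to live Events."""
    seen: Set[str] = set()
    for event in sorted(historical, key=lambda e: e["occurred_at"]):
        handle(event)
        seen.add(event["id"])
    # The switch: from here on, the very same logic consumes real time Events.
    for event in live:
        if event["id"] in seen:
            continue  # emitted around the cutover and already replayed
        handle(event)


# A hypothetical Food History view, built by the same handler in both phases.
food_history: List[Tuple[int, str]] = []


def record(event: dict) -> None:
    if event["name"] == "LABEL_REMOVED":
        food_history.append((event["occurred_at"], event["food"]))


historical = [
    {"id": "e2", "name": "LABEL_REMOVED", "food": "rice", "occurred_at": 200},
    {"id": "e1", "name": "LABEL_REMOVED", "food": "milk", "occurred_at": 100},
]
live = [
    {"id": "e2", "name": "LABEL_REMOVED", "food": "rice", "occurred_at": 200},
    {"id": "e3", "name": "LABEL_REMOVED", "food": "oats", "occurred_at": 300},
]

replay_then_live(historical, iter(live), record)
print(food_history)
```

The handler itself does not know whether an Event is historical or live, which is what makes the switch possible.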

To do so is to replay only the relevant historical Events while maintaining their order of occurrence: only certain Events by name, and only those within a time window. These are the requirements of our Event Store and its dedicated view. It has to answer exactly one SQL query:

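SELECT *
FROM EventStore
WHERE (EVENT_NAME = 'A' OR EVENT_NAME = 'B')
        AND EVENT_TIME BETWEEN :window_start AND :window_end  -- EVENT_TIME: assumed column name
ORDER BY EVENT_TIME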

And if there’s exactly one, there is no need for the flexibility of an SQL engine. If we can store Events efficiently in files and correctly sharded folders, we may not even need a database at all. If we also recall that Events are immutable, we don’t need to perform all CRUD operations on our storage, only Create and Read. We assume less of this view and store.
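As a rough illustration of how little is required, here is a sketch of a file-based store that only appends and replays. The folder layout (`name=.../day=...`), the JSON-lines format and the ISO-8601 UTC timestamp strings are assumptions made for this example; they are not how our Event Store was laid out on S3.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Set


class FileEventStore:
    """Append-only store: one JSON-lines file per event name per day."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def append(self, event: dict) -> None:
        """Create: write the Event as is; nothing is ever updated or deleted."""
        day = event["occurred_at"][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        shard = self.root / f"name={event['name']}" / f"day={day}"
        shard.mkdir(parents=True, exist_ok=True)
        with open(shard / "events.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self, names: Set[str], start: str, end: str) -> List[dict]:
        """Read: Events with the given names inside [start, end], in order of occurrence."""
        matched: List[dict] = []
        for name in names:
            name_dir = self.root / f"name={name}"
            if not name_dir.is_dir():
                continue
            for shard in name_dir.glob("day=*"):
                day = shard.name[len("day="):]
                if not (start[:10] <= day <= end[:10]):
                    continue  # skip whole folders outside the time window
                with open(shard / "events.jsonl", encoding="utf-8") as f:
                    for line in f:
                        event = json.loads(line)
                        # Same-format UTC timestamps compare correctly as strings.
                        if start <= event["occurred_at"] <= end:
                            matched.append(event)
        return sorted(matched, key=lambda e: e["occurred_at"])


with tempfile.TemporaryDirectory() as tmp:
    store = FileEventStore(Path(tmp))
    store.append({"name": "LABEL_REMOVED", "occurred_at": "2020-01-02T08:00:00Z", "food": "rice"})
    store.append({"name": "BUTTON_CLICKED", "occurred_at": "2020-01-01T19:00:00Z", "button": "A"})
    store.append({"name": "QUACK", "occurred_at": "2020-01-01T20:00:00Z"})
    replayed = store.replay(
        {"BUTTON_CLICKED", "LABEL_REMOVED"},
        "2020-01-01T00:00:00Z",
        "2020-01-31T23:59:59Z",
    )

print([event["name"] for event in replayed])
```

Sharding by name and day means a replay never opens files it does not need, which is the job a query engine would otherwise do with an index or a partition.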

What’s left to do is to flush all the Events running through our messaging platform into storage. As we were on AWS, we connected our SNS message broker to Kinesis Data Firehose with a little coding. It flushed all our Events into very large and unoptimized CSV files in S3, an Object Storage.

Offline and once in a while, the files would go through a process that shards and compacts them into Parquet files, stored again on S3. Those would later be queried by Amazon Athena, which can perform SQL queries over Parquet files on S3 very efficiently. With Amazon Athena, you pay only for the queries you run, and those were to be infrequent. A few days later, the Parquet files would be moved to Amazon S3 Glacier, a very cold storage that saves major costs.

Although it seems technically complicated, it took us less time than designing and implementing Food History would. We actually hadn’t done the last three steps above, as there was still no need to query the Event Store. All we knew for sure was that once needed, we could get it done. It was an eventually beneficial decision, and executing it only partially was beneficial as well.

[Note: coding a bridge between SNS and Kinesis is no longer required, as SNS can now deliver messages to Kinesis Data Firehose directly.]

It allowed us the flexibility to postpone decisions further down the road, which allowed us to be an efficient company in both the present and the future. Postponing decisions is also beneficial because it gives us the time to know better. It turned out we had a fault in this design, one that we hadn’t executed yet.

Our plan was to code a single Event Handler in NodeJS that would process both the Event Replay and the real time Events. Once switched, it would simply start acting as a Service. Years later, I met with Nir Gazit, Fiverr’s chief architect. He said that everything was exactly as it should be, except for the last bit: coding one single Event Handler. He said that no database would be able to sustain billions of operations on it once we had billions of Events to consume.

Instead, the right way is to replay those very same billions of Events into EMR or Spark, which would build the database’s binary files directly and efficiently. My team over at RapidAPI (2021) said the same, and we did exactly that with RapidAPI’s Source of Truth, through which we rebuilt our ClickHouse database.

Event Replay is not limited to launching a new Service. It can easily be reused for Forward Replay, complete disaster recoveries, short time span delta recoveries and easier data migrations. But most of all, it kept us technically decoupled from AWS as a vendor, with no lock-in. Migrating away from DynamoDB is hard, but copying files from one cloud Object Storage to another is fairly easy.

The entire Event Store barely has any Cause to Change. It has only one single purpose, to do Event Replays. It may have some future Technical Causes to Change, such as replacing Athena with Presto, which will eventually occur. But it will definitely not have Behavioral Causes to Change, because it is not a Product and not customer facing. And until a Cause arises out of nowhere, the entire Event Store is a reliable component frozen in time.

That was not the only data pipeline we had at Silo. There was another one with a much bigger impact. More on that in the next chapter.
