
Full Picture: A System Architecture

Reading Time: 10 minutes

In the previous chapter, we went through Silo's customer-driven requirements, derived from their expectations of an eventually consistent experience, along with extremely high Reliability of critical and primary functions.

In the chapters before that, we came up with a candidate for a system architecture. One to answer most of the customer-driven requirements by focusing on Cohesion and Reliability. We ended up with a few knots left to tie, specifically in the canonical message format.

In this chapter, we're going to tie it all together: the product requirements and the technical architecture. By tying up its loose ends, we'll no longer have a candidate, but finally something we can call a system architecture. Hopefully.

Recap

If we recall, Food Management is a closed and abstract set of User Journeys decomposed into multiple independent applications. It relies on Food Inventory, data currently persisted as a table inside a database. That data might be directly shared between all of our backend tuples, or indirectly through an encapsulating microservice.

Operationally speaking, while Food Inventory is down, whether the microservice or its database, the Food Management experiences for both Actors will no longer function. That negates our Cohesion and Reliability, and our goal of making sure Silo's physical device will be as Reliable as a kitchen appliance.

Throughout this series, we've tried overcoming this issue with component reusability: giving each Service its own instance of the microservice, and using messaging to maintain consistency between them. It turns out microservices and persistence combined are extremely expensive, which creates a conundrum where it's cheaper and more maintainable for it to be a Service instead.

From the Change perspective, the above design contains a non-beneficial split between the Services and the microservices. If we look at the Change Directions for Behavioral Causes, we'll see any Change going all the way to the data encapsulated in the database, through the microservice. Eventually that leads to Instabilities affecting multiple Actors together, against our customers' expectations.

This will happen frequently, because the Food Management experience relies on the Food Inventory data: a Change to the experience is a Change to the data structure. Redundant work and deployment of incorrectly split applications is something we call fragmentation, and it is a source of Inefficiency. It is somewhat expected, because it's an outcome of something shared between mutually exclusive applications.

The above design is also missing something critical, as it actually requires several database instances, at least one per Service. Coded into Services, experiences and User Journeys not only rely on data but have their own data, a state to persist. As an application should be stateless in favor of scaling, the state of the experience has to be persisted somewhere, most likely a database.

Dedication

From before, we do have our candidate for a system architecture: a messaging-based solution enabling our applications to share nothing, without Restricting each one's design. It avoids and overcomes not only the above issues, but all the faults in designing for multiple Actors. It is extremely important to notice we are not blindly using it. The only reason we can is because our customers' expectations are set to an eventually consistent experience between our Actors. Let's apply it now.

As a few seconds' delay in consistency between Actors is expected by our customers, we can safely add a messaging platform in between them. As for the expectation of a smooth experience while a customer interfaces with a single Actor, it is achieved as well. Each Actor-backend tuple will have its own dedicated database, with hard consistency if its design requires it.

As our system architecture does not Restrict the use of any kind of database, each tuple can have the one which suits its needs best. That is a requirement we actually have: because each Actor has its own unique user interface, each requires its own dedicated view of Food Inventory. Each Service can also combine that view with the User Journeys' state, and it would still remain a dedicated view.
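To make the idea of a dedicated view concrete, here is a minimal sketch. The message and view shapes are assumptions made up for illustration, not Silo's actual code: two backend tuples consume the same inventory message, and each persists only the view its own Actor needs.

```typescript
// Hypothetical shapes, for illustration only.
interface InventoryChanged {
  containerId: string;
  foodName: string;
  weightGrams: number;
  updatedAt: string; // ISO timestamp
}

// The device tuple only needs what its small screen renders.
interface DeviceInventoryView {
  containerId: string;
  label: string;
}

// The mobile tuple keeps a richer view for browsing and sorting.
interface MobileInventoryView {
  containerId: string;
  foodName: string;
  weightGrams: number;
  lastUpdated: Date;
}

// Each tuple writes to its own dedicated database; Maps stand in for them here.
const deviceDb = new Map<string, DeviceInventoryView>();
const mobileDb = new Map<string, MobileInventoryView>();

export function onChangedForDevice(msg: InventoryChanged): void {
  deviceDb.set(msg.containerId, { containerId: msg.containerId, label: msg.foodName });
}

export function onChangedForMobile(msg: InventoryChanged): void {
  mobileDb.set(msg.containerId, {
    containerId: msg.containerId,
    foodName: msg.foodName,
    weightGrams: msg.weightGrams,
    lastUpdated: new Date(msg.updatedAt),
  });
}
```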

When it comes to scaling, our messaging platform will scale independently on its own. Scaling our storage, either in size or in query throughput, is something a database already takes care of for us. Our databases will take care of persistence for us, exactly what messaging does not do.

Our backend applications will have no Cause to Change together. Our Actors shouldn't Change together, and being physically deployed to different planes they can not Change together. As such, neither will their backend tuples. The Force of Change in action: one entity's Change is another's Cause.

Although our system architecture does not Restrict it, there is no apparent need for an additional layer of microservice applications to encapsulate the data. That is one less application to Change and deploy. As Inefficiencies were avoided in our development workflow, and because we are better aligned with the Change Stream, it is an eventually beneficial system architecture.

It is also a system architecture which avoids the Microservices Death Star and avoids a big ball of mud. Specifically, it avoids the need for an engineer to locally launch an entire system made of tens and hundreds of applications. Only one portion of it at a time, maybe even just one application at a time. Another Inefficiency avoided in the development workflow.

Canonical Events

Unfortunately, we've seen our applications are not entirely decoupled. The {CHANGED} messages themselves still couple each Actor's dedicated data view to one another. Our system architecture does allow us to overcome it, by fanning out our Actor's {REQUEST} message to our multiple backend applications. The only remaining problem is that {REQUEST} and {CHANGED} are two completely different messages. How we aligned them came from somewhere else entirely.

At Silo, during the PoC phase, we also needed to learn how our customers were interacting with the physical device's interface. We had to do so without being able to watch them at all, for the simple reason that we are in Israel and the devices are inside US-based households.

For example, we needed to know how many times the label button was touched. From that, we could learn whether they use and enjoy the experience we provide. From our experience with mobile and web analytics, we thought our physical device should send an Event to our backend for later processing.

Physical devices, almost since forever, have been modeled and coded as state machines. First, because Products, Behavioral Applications, are state machines. But more so, because hardware Events need to be ignored when a user incorrectly clicks a button. For example, during the idle state, the OFF BUTTON will turn off the device, something we wouldn't want to happen during the labeling experience. So if the OFF BUTTON is touched during labeling, the Event would be ignored because the machine is not in the idle state.
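As a rough sketch of such a state machine (the state and Event names here are illustrative assumptions, not the device's real ones):

```typescript
// Minimal state-machine sketch: the same hardware Event is handled or ignored
// depending on the current state. Names are illustrative assumptions.
type DeviceState = "IDLE" | "LABELING";
type HardwareEvent = "OFF_BUTTON_TOUCHED" | "LABEL_BUTTON_TOUCHED" | "LABELING_DONE";

let state: DeviceState = "IDLE";

export function onHardwareEvent(event: HardwareEvent): void {
  switch (state) {
    case "IDLE":
      if (event === "OFF_BUTTON_TOUCHED") powerOff();
      if (event === "LABEL_BUTTON_TOUCHED") state = "LABELING";
      break;
    case "LABELING":
      // OFF_BUTTON_TOUCHED is deliberately ignored here: the machine is not idle.
      if (event === "LABELING_DONE") state = "IDLE";
      break;
  }
}

function powerOff(): void {
  console.log("powering off");
}
```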

In a way, we had no need to code new Events for usage tracking, as they were already there, just "waiting" to be sent to our servers. And here's the catch: the application on the physical device itself is reacting to its own Events. Sent to our servers, they allow our backend applications to react to them as well.

The thought led us to The Reactive Manifesto, and to the world of Event-Driven Architecture. Both were infused into the canonical message of our Enterprise Messaging Architecture. The result is further decoupling of our applications, and aligning both the {CHANGED} and {REQUEST} messages into a single canonical structure: an Event.

An Event which includes all the available sensory data is an independent message, one each backend application can independently consume, for each to individually transform it into its own dedicated view. The {CHANGED} messages are no longer coupled to the data and to the database it is stored in.
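A minimal sketch of what such a canonical Event envelope might look like; the field names here are our assumption for illustration, not a documented contract:

```typescript
// A hypothetical canonical Event envelope; every field name is an assumption.
interface SiloEvent<TPayload = unknown> {
  name: string;       // e.g. "LABEL_BUTTON_TOUCHED"
  producer: string;   // which application emitted it
  occurredAt: string; // ISO timestamp
  payload: TPayload;  // all available sensory data, so consumers never need to ask back
}

// Any backend application can consume the same Event independently
// and transform it into its own dedicated view.
export function consume(event: SiloEvent): void {
  console.log(`${event.producer} emitted ${event.name} at ${event.occurredAt}`);
}
```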

On the other hand, not all Events need to be fanned out. Some should only reach one application, just like an HTTP request. And the {REQUEST} and {REPLY} are not Events but HTTP payloads. With an enterprise messaging pattern called Request-Reply, they were replaced with a tuple of Events named {XXX_REQUESTED} and {XXX_REPLIED}. As our messaging platform allows all kinds of applications to communicate with one another, with some supported routing these Events reach only the one application they were addressed to.
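The sketch below shows one possible shape of such a Request-Reply tuple of Events, assuming a correlation id and a routing key; both, and the names used, are our assumptions for illustration.

```typescript
import { randomUUID } from "crypto";

// Hypothetical Request-Reply Events; field names and the routing key are assumptions.
interface RequestedEvent<T> {
  name: `${string}_REQUESTED`;
  correlationId: string; // lets the requester match the reply to its request
  replyTo: string;       // routes the Event to the one application it is addressed to
  payload: T;
}

interface RepliedEvent<T> {
  name: `${string}_REPLIED`;
  correlationId: string; // copied from the request
  payload: T;
}

// The requester emits a *_REQUESTED Event and later correlates the *_REPLIED one.
export function foodInventoryRequested(household: string): RequestedEvent<{ household: string }> {
  return {
    name: "FOOD_INVENTORY_REQUESTED",
    correlationId: randomUUID(),
    replyTo: "food-inventory-service",
    payload: { household },
  };
}
```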

Unfortunately, a major problem arose. If we look carefully, an Event produced by Actor I will be asynchronously but directly consumed by three backend applications, not only by its tuple. That wouldn't have been such a great concern, except that Actor I is Silo's physical device, whose gradual Rollout could take months and years. If for some reason we deployed a version with an Instability and broke the Event contract, all three backend tuples would be blasted with it. And as the Rollout is slow, it would take a long time to recover from it.

Instead, we made the backend applications completely unaware of the fact that each serves a different Actor. We can say the backend encapsulates or hides the Actor behind it. Our backend applications react only to each other, and each one to its own tupled Actor.
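A small sketch of that encapsulation, with hypothetical Event names and fields: only the device's own tuple consumes the device's raw Event, and it re-emits a backend-owned Event whose contract it controls.

```typescript
// Hypothetical Events; names, fields, and the mapping are assumptions for illustration.
interface DeviceLabelTouched {        // raw Event from the slowly rolled-out device
  name: "LABEL_BUTTON_TOUCHED";
  rawSensorData: Record<string, number>;
}

interface FoodLabeled {               // backend-owned Event with a stable contract
  name: "FOOD_LABELED";
  containerId: string;
  foodName: string;
}

// Only the device's tuple translates the raw Event. If a bad device version breaks
// the raw contract, the breakage is contained here instead of hitting all three tuples.
export function onDeviceEvent(
  raw: DeviceLabelTouched,
  publish: (out: FoodLabeled) => void
): void {
  const { containerId, foodName } = interpret(raw.rawSensorData);
  publish({ name: "FOOD_LABELED", containerId, foodName });
}

// The actual sensor interpretation is out of scope; a stub stands in for it.
function interpret(_sensors: Record<string, number>): { containerId: string; foodName: string } {
  return { containerId: "container-1", foodName: "rice" };
}
```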

Extensibility

We've talked a lot about the difference between design and architecture. If we look carefully, the change made above to our design has not changed the system architecture at all. It still remains a Publish/Subscribe pattern implemented with a messaging platform, with an Event as the canonical message. Exactly what is expected of a system architecture: to not Restrict designs.

From our messaging platform's perspective, as long as an application produces and consumes Events, it can participate in whatever experience or User Journey we make. The platform is agnostic to what the application does and to which physical plane it is deployed to. It makes each and every application an Event Handler, whether an Actor or a backend application.

[It was an idea somewhat based on Martin Fowler's thoughts on Event Collaboration, which a year or two later also emerged in the Data Mesh Architecture of Zhamak Dehghani.]

Actor II, our mobile application, was coded in React Native. Our future web application Actor would have been coded in React. Both adhere to The Reactive Manifesto and emit Events as well. To make our backend applications easier to develop, we coded a framework named EventHandler. It encapsulated every Event-driven concern, and made coding the consumption logic almost entirely similar to processing an incoming HTTP request.
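The real framework is internal, so the sketch below is only a guess at the kind of API it could expose; the point is that registering for an Event reads much like registering an HTTP route.

```typescript
// A guessed, minimal EventHandler-style API; not the actual framework.
type Handler = (payload: unknown) => Promise<void>;

export class EventHandlerApp {
  private handlers = new Map<string, Handler>();

  // Subscribing to an Event name feels like declaring an HTTP route.
  on(eventName: string, handler: Handler): void {
    this.handlers.set(eventName, handler);
  }

  // In reality this would be wired to the messaging platform; here it is called directly.
  async dispatch(eventName: string, payload: unknown): Promise<void> {
    const handler = this.handlers.get(eventName);
    if (handler) await handler(payload);
  }
}

// Usage: the consumption logic reads like an HTTP controller.
const app = new EventHandlerApp();
app.on("FOOD_LABELED", async (payload) => {
  console.log("update this tuple's dedicated view with", payload);
});
```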

Not all applications are state machines or reactive. One such was our third Actor, the Alexa Skill, which we made send and receive Events. It was a Lambda function directly invoked by an incoming HTTP request from the 3rd party Alexa servers. It took a little effort to transform that invocation into an ALEXA_SKILL_INVOKED Event emitted into our EventHandler framework. And after fulfilling the HTTP request, an ALEXA_SKILL_ENDED Event was emitted as well.
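Roughly, the adaptation could look like the sketch below, assuming the handler is given an emit function by the framework; the Event names come from the text, everything else is an assumption.

```typescript
// A hedged sketch of wrapping the Alexa Skill Lambda with Events.
type Emit = (eventName: string, payload: unknown) => void;

export function makeAlexaHandler(emit: Emit) {
  // Returns a Lambda-style handler invoked by the incoming HTTP request.
  return async (request: unknown): Promise<unknown> => {
    emit("ALEXA_SKILL_INVOKED", { request });

    // Fulfill the HTTP request; the actual skill logic is out of scope.
    const response = { speech: "Your pantry has three labeled containers." };

    emit("ALEXA_SKILL_ENDED", { response });
    return response;
  };
}
```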

We had a few more backend applications, serving other Actors and business requirements. For example, one was called Household Service. Its responsibility was to manage the entity relationships between physical objects such as containers, devices, and mobiles, and the customers in the household who own them. Its main usage was a mobile tab to view the products you own. As it was expected to be infrequently used, we chose AWS Aurora as its database.
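For illustration, the entity relationships could be modeled roughly as below; these names and fields are assumptions, not the Household Service's actual schema.

```typescript
// Hypothetical entity model for the Household Service; names are assumptions.
interface Customer { id: string; name: string; }
interface Mobile { id: string; customerId: string; }     // a customer's phone
interface Device { id: string; serialNumber: string; }   // the physical Silo device
interface Container { id: string; label?: string; }      // a vacuum container

// A household owns the physical objects and groups the customers who share them.
interface Household {
  id: string;
  members: Customer[];
  mobiles: Mobile[];
  devices: Device[];
  containers: Container[];
}
```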

We had a future requirement in mind, for our product/sales analysts, who may perform as-yet-unknown SQL queries for reporting once in a while. Would they query the encapsulating Household Service, or perform the queries directly on its database? Would we provide them a dedicated read replica to keep them from being a noisy neighbor? Or would it be an entirely new Service and database, suited exactly for their needs? Whichever we choose in the future, our system architecture Restricts no design from happening.

Freezer

Our abstract set of User Journeys, Food Management, will grow to be more than just one User Journey and experience, because it is also an entire business domain. One experience within it, Labeling, was more critical than querying the Food Inventory.

There is nothing preventing Labeling from being its own mutually exclusive Module in each Actor, and nothing preventing its tupled backend application from being mutually exclusive from the others as well. As a result, we'd have a closed and small set of Modules and applications that make up the critical function. In these we can deeply invest our efforts towards Reliability, while the others require less. For example, only the former would have multiple concurrent instances running for high availability; for the others, a single instance would be enough. Costs reduced.

When an entire User Journey is decoupled from all others and from its abstract parent, everything participating in it has the potential to be frozen for much longer. Eventually, the limited User Journey itself will no longer have a Cause to Change, because there would be nothing new to it. Once that point is reached, they would remain frozen for a long, long time, gaining an even higher Reliability as barely any Change will fuel their evolutionary processes.

We had two more Event Handlers, completely different from the others. One was for tracing and monitoring; the other answered a very important question: if each and every Service holds a dedicated view of the truth, which of them is the source of truth? Or better yet, what is the truth? On this, in the next chapter.
