A few broad rules about microservices

I know, we don’t like microservices anymore, and they are out of fashion. However, I still think about how I was trying to build them until the sudden change in industry fashion convinced me that building modular monoliths was entirely different from how monoliths were meant to be built in the past.

In this post, I want to reflect on a few broad roles I follow(ed) when building microservices. I believe broad rules and flexible heuristics are appropriate when making architectural decisions because context (e.g., business, technical, human) is more critical than (blindly) following rules.

  1. What is a microservice?
  2. Rule 1 – Teams don’t share microservices
  3. Rule 2 – Microservices don’t share (private) data storage
    1. Public vs private data storage
    2. Avoiding resource contention
  4. Rule 3 – Avoid distributing transactions through the network
  5. Rule 4 – Network latency adds up
  6. Rule 5 – No Service to Service calls between services
  7. Conclusions

What is a microservice?

We should first qualify what a microservice represents.

Most people equate a microservice with a single service deployable. For example, my microservice is a REST API web server receiving traffic on port 8080. This is not the right way of thinking about microservices because microservices weren’t invented as a technical concept but as a sociotechnical one, a solution to the organisational scalability problem (in a technical context, nonetheless).

The rules that define microservice boundaries cannot be purely technical (albeit this is a crucial factor); they also need to incorporate aspects like team structure, cognitive load, product boundaries, geographical boundaries, etc.

Therefore, it becomes easier to define what a microservice is not:

  • A microservice is not a single process (e.g., a single JVM running in a container).
  • A microservice is not a single deployable (e.g., a single Docker image).
  • A microservice is not defined in a single repository (e.g., Github).

On the contrary, the following things are perfectly possible:

  • One microservice runs as multiple processes because, in its Kubernetes pod, we also have sidecar containers performing roles like Service Mesh, Ambassador, etc.
  • Multiple deployables working in coordination define one microservice. For example, in a CQRS or Event Sourcing architecture, you might separate your command handler and/or writer model producer from your query handler and/or event consumer.
  • If we choose to use a company-wide mono repo or product mono repos (all parts of a given product), one repository contains multiple microservices.

To summarise these rules, we can say:

A Microservice is an application (i.e., a logical boundary for behavior represented by code and state resources) that is developed, deployed and maintain together.

Rule 1 – Teams don’t share microservices

In other words, every microservice has only one team that owns it.

This principle doesn’t rule out any of the following:

  • Multiple teams (and their products) depending on a given microservice. This is fine because it doesn’t change the fact that only one team owns the microservice.
  • Multiple teams contribute to the microservice code. This, too, is okay, provided the team that owns the microservice is happy to support some kind of coordinated contribution process (like the Inner Source model).

For reasons of accountability, only one team must own a service. We trust teams to build services and run them in production; in exchange, we must empower them to make the right decisions (within whatever architectural framework the company has adopted).

This empowerment is voided if it doesn’t come with the necessary accountability for the consequences of those decisions. “Sharing” ownership, at a minimum, deludes accountability; at worst, it completely prevents it.

Rule 2 – Microservices don’t share (private) data storage

Private data storage is where a single microservice stores information for execution. This can be master data for lookups, historical payments data for de-duplication and/or state validation, etc.

Microservices must be able to change the schema of their data storage freely; after all, it is their private concern.

We achieve this by ensuring that two microservices never share the same private data storage.

Public vs private data storage

It is important to differentiate what private data storage is and when we see it as public.

Private data storage is created to support the functionality of a microservice; when the microservice logic changes, the storage (might) also change. Private data storages contain private data designed to be consumed by the microservice but not shared or available to other microservices. Private data storages don’t make any promises regarding schema changes (e.g., backward compatibility, forward compatibility) beyond what the single microservice requires. To sum up, private data is an implementation concern.

Public data storage is the opposite of what is described above: it is designed for sharing, uses schemas with compatibility guarantees, and is easy to consume. In other words, public data is an API.

The following table contains some examples of public and private data storage:

StorageTypeDescription
Private Kafka TopicsPrivateAkin to tables in a private database.

For an explanation of what a “private topic” means, see: https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#visibility
Shared/External Kafka TopicsPublicAkin to REST API endpoints.

For an explanation of what a “shared/external topic” means, see: https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#visibility
Private Blob Storage (S3/Azure Blob Storage) foldersPrivateOnly used by a single microservice and/or application
Public Blob Storage (S3/Azure Blob Storage) foldersPublicProduced by one microservice, available for others to read.
Relational databasesPrivateOnly used by a single microservice and/or application.
NoSQL databasesPrivateOnly used by a single microservice and/or application.
Like-for-Like shared/external Kafka topics sunk into databasesPublicThis is a particular case of the above.

If a topic producer decides to offer a “queryable” version of the same data as a (SQL/NoSQL) database and it is captured as a like-for-like sink
of a shared/external topic, it is public because:

– Its schema will follow the same compatibility rules as the Kafka
topic(s).

– Its database (and tables/containers) are provisioned exclusively for
sharing purposes, not as a private concern of any specific
microservice.

Avoiding resource contention

Another reason for separating microservices databases is to avoid resource contention.

In a scenario where two microservices share the same database (or other infrastructure resources), it can run into Noisy Neighbour antipattern problems:

  1. Application A receives a spike of traffic/load and starts accessing the shared resources more intensely.
  2. Application B starts randomly failing when it cannot access the shared resources (or it takes longer than it can tolerate, leading to timeouts).

Ensuring every microservice accesses independent resources guarantees we don’t suffer these problems.

This principle can lead to increased infrastructure costs. For that reason, it is perfectly reasonable to consider the following exceptions:

  • Reuse underlying infrastructure in environments before the production environment, where the consequences of occasional resource contention are not particularly worrying.
  • Reuse underlying infrastructure between services whose volume is expected to be low. As long as the microservices aren’t coupled at the logical level (i.e., the data itself, not the storage infrastructure), it is relatively easy to “separate” their infrastructure in the future if required (compared to separating them if coupled at the logical schema level).

For the last point, I would advise against doing this with microservices shared by somewhat distant “organizationally” teams (e.g., crossing departments or division boundaries, minimum timezone overlap, or any other barrier that prevents fluid communication).

Rule 3 – Avoid distributing transactions through the network

I always recommend considering DDD heuristics to drive your microservice design. I use the DDD “Aggregate Root” concept to help me model microservices and their responsibilities. DDD defines “Aggregate Root” as follows:

An Aggregate Root in Domain Driven Design (DDD) is a design concept where an entity or object serves as an entry point to a collection of related objects, or aggregates. The aggregate root guarantees the consistency of changes being made within the aggregate by forbidding external objects from holding references to its members. This means all modifications within the aggregate are controlled and coordinated by the aggregate root, ensuring the aggregate as a whole remains in a consistent state. This concept helps enforce business rules and simplifies the model by limiting relationships between objects.

An aggregate root should always have one single “source of truth”, i.e., one microservice that manages its state (and modifications). We want this because it means we avoid (as much as possible) distributing transactions over multiple services (through the network).

The alternative (i.e., distributed transactions) suffers from a variety of problems:

  • Performance problems when leveraging Distributed Transaction Coordination technology like XA Transactions or Microsoft DTC (i.e., 2-phase commits).
  • Complexity when using patterns like Saga pattern and/or Compensating Transaction pattern.

Designing your Aggregate Roots perfectly doesn’t guarantee you won’t need some of those patterns. However, it will minimise how often you need them.

In summary, if your microservice setup splits an aggregate root, you are doing it wrong; you should “merge” those two services.

Rule 4 – Network latency adds up

Crossing the network is one of the slowest operations. It also introduces massive uncertainty and new failure scenarios compared to running a single process in the same memory space. Jonas Boner has a fantastic talk about the “dangers” of networks’ non-deterministic behaviour compared to the “consistency” one can expect from in-memory communication.

This is true when you call other microservices (e.g., directly via REST or indirectly via asynchronous communication) and when talking to external infrastructure dependencies like databases.

When considering “dividing” your system into multiple microservices, consider the impact on end-to-end latency against any non-functional requirements for latency (e.g., 99th percentile latency).

Rule 5 – No Service to Service calls between services

This rule only applies if you are following a strict “Event Driven Architecture”. Even if that scenario, there will be cases where S2S calls will be “necessary” to avoid unnecessary complexity.

One of the benefits of microservice architecture is the decoupling that we get from services that
depend on each other indirectly. In a monolith, all modules live and fail together, causing a large “blast radius” when something goes wrong (i.e., the whole thing fails).

In traditional microservices (e.g., sync communication based via REST/HTTP or gRPC), there is a decoupling in “space” (i.e., the services don’t share the same hardware). However, they are still coupled “in time” (i.e., to an extent, they all need to be healthy for the system to perform). Some patterns, like circuit breakers, aim to mitigate this risk.

Avoiding S2S calls breaks the couple “in time” by introducing a middleware (e.g., message broker, distributed log) that guarantees producers and consumers don’t need to be online simultaneously, only the middleware. This middleware software tends to be designed to be highly available and resilient to network and software failure. For example, Kafka has some parts verified using TLA+.

To sum up, “forcing” microservices to communicate asynchronously causes teams to consider their architecture in terms of:

  • Eventual consistency
  • Asynchronous communication
  • Commands and events exchanged between them

This leads to more resilient, highly available systems in exchange for (potential) complexity. If you follow the principles of the Reactive Manifesto, you’ll consider this a staple. However, it might feel technically challenging if you are used to n-tier monoliths sitting on an extensive Oracle/SQLServer database.

Conclusions

There are a few hard rules that one must always follow in anything related to building software. It is such a contextual activity that, for every question, there is almost always an “It depends” answer. That said, having a target architecture, a north star that the team collectively agrees to aim for, is good. When it is not followed, some analysis (ideally recorded for the future) should be done about why a decision was made against the “ideal” design.

In this post, I proposed a few rules I tend to follow (and recommend) when building microservices. Sometimes, it will make sense to break them; however, if you find yourself breaking them “all the time”, you might not be doing microservices in anything other than the name (and that, too, could be okay, but just call it what it is :))

Software engineering trends that are reverting (I)

When I entered the software industry a long time ago, people who had been part of it warned me that software trends came and went and eventually returned. “This thing that you call ‘new’, I have seen it before”. I refused to believe it. Like a wannabe Barney Stinson, I thought ‘new’ was always the way.

I have been around long enough to see this phenomenon with my own eyes. In this series of posts, I want to call out a few examples of “trends” (i.e., new things) that a) aren’t new anymore and b) people are walking away from. The series starts with one of the most “controversial” trends in the last 10-15 years: microservices!

Microservices are so 2010s

Microservices are dead! I’m joking. They are not dead, but they are not the default option anymore. We are back to… monoliths.

While there have always been people who thought microservices weren’t a good idea, the inflexion point was the (in)famous blog post from Amazon Prime video about replacing their serverless architecture with a good, old monolith (FaaS is just an “extreme” version of microservices).

Why was this more significant than the thousands of posts claiming microservices were unnecessary complexity, talking about distributed monoliths and criticising an architectural approach that came from FAANG and only suited FAANG? Well, because… it came from FAANG. The haters could claim that even a FAANG company had realised microservices weren’t a good idea (“We won!”).

Realistically, this would have been anecdotal if it weren’t for something more important than a bunch of guys finding a way to save money when they serve millions of daily viewers (do YOU have THAT problem?).

It’s the economy, stupid

US FED interest rate since 1990

The image above shows the US FED official interest rates. Historically, interest rates have been pretty high (about 5%, according to a recent interview with Nassim Taleb on Bloomberg). From 2008 to post-COVID 2022, we experienced an anomaly: close to 0% rates for almost 15 years. Investors desperate to find good returns for their money poured billions on tech companies, hoping to land the next Google or Facebook/Meta.

Source: https://goingdigital.oecd.org/en/indicator/35

Lots of startups with huge rounding funds started to cosplay as future members of the FAANG club: copy their HR policies, copy their lovely offices, and, of course, copy their architectural solutions because, you know, we are going to be so great that we need to be ready, or we might die of success.

We all built Cloud Native systems with Share-Nothing architectures that followed every principle in the Reactive Manifesto and were prepared to scale… to the moon! 🚀 Microservices were the standard choice unless you were more adventurous and wanted to go full AWS Lambda (or a similar FaaS offering) and embrace FinOps to its purest form.

The only drawback is that it was expensive (let’s ignore complexity for now, shall we?). That didn’t matter when the money was flowing, but now the music has stopped, and everybody is intensely staring at their cloud provider bill and wondering what they can do to pay a fraction of it.

What is next?

Downsizing all things.

BeforeAfterComment
Microservices/FaaSMonolith(s)“Collapse” multiple codebases into one and deploy as a single unit.

The hope is that teams have become more disciplined at modularising (unlikely) and “build systems” have become more efficient in managing large codebases (possibly).
Messaging (Kafka et al)Avoid middleware as much as possibleMiddleware is expensive technology. With monoliths, there will be fewer network calls that require it.

Direct communication (e.g., HTTP, gRPC) will be the standard (again) when necessary. Chuckier monoliths will reduce network traffic compared to microservices
NoSQLRelationalMany NoSQL databases optimise for high throughput / low latency / high durability, which will happily be sacrificed for cost savings. Relational databases are easier to operate and run yourself (i.e., self-host), which is the cheapest option (some NoSQL, like CosmosDB or DynamoDB, can’t be self-hosted).

On the complexity side, relational databases are seen as easier for developers to understand (until you see things like this).
Stream ProcessingGone except for truly big dataStream Processing is expensive and complex. Most businesses won’t care enough about latency to pay for it, nor will have volumes that require it.
KubernetesCloud-specific container solutionsWe should see a transition towards more “Heroklu-like” execution platforms. It will be a tradeoff between flexibility (with K8S offers bucketloads) and cost/simplicity.

Sometimes, containers will be ditched too and replaced by language-specific solutions (like Azure Spring Apps) to raise the abstraction bar even higher.
Multi-region / Multi-AZ deploymentsNo multi-region unless compliance requirement.
Fewer multi-AZ deployments
Elon has proved that a semi-broken Twitter is still good enough, so why wouldn’t companies building less critical software aim for 3-5 9s?
Event-Driven ArchitectureHere to stayThis approach isn’t more or less expensive than Batch Processing (if anything, it’s cheaper) and still models business flows more accurately.

What are we gaining and losing?

Microservices are neither the silver bullet nor the worst idea ever. As with most things, they have PROs and CONs. If we ditch them (or push back harder against their adoption), we will win things and lose things.

What do we win?

  • It is easier to develop against a single codebase.
  • Local testing is simpler because running a single service in your machine is more straightforward than running ten. Remote testing is also more accessible, as hitting one API is less complicated than hitting many across the network.
  • It is also easier to deploy a single service than many.
  • Easier maintainability/evolvability. When a business process has been incorrectly modelled, it is easier to fix on a monolith (with, ideally, single data storage) than across many services with public APIs and different data storages.

What do we lose?

  • Once a codebase is large enough, it is tough to work against it. Software is fractal, which is also valid for “build systems”: you want to divide and conquer.
  • Deploying a single service can be more challenging if multiple people (or, even worse, teams) need to release changes simultaneously. More frequent deployments can alleviate the problem, but most companies don’t go from a dev branch to PROD in hours but days/weeks.
  • The blast radius for incorrect changes will be higher. Systems are more resilient when they are appropriately compartmentalized.
  • Organisations growing (are there any left?) will struggle to increase their team’s productivity linearly with the headcount when the monolith becomes the bottleneck for all software engineering activities.
  • FinOps and general cost observability against business value will massively suffer. A single monolith will lump everything together. With multiple teams involved, it will be harder to understand who is making good implementation decisions and who isn’t, as the cost will be amalgamated into a single data point.

Summary

Microservices are not dead. However, they are suspicious because they are expensive in terms of infrastructure cost and, indirectly, engineering hours due to their increased complexity. However, they are also crucial to unlocking organisational productivity as the engineering team grows beyond a bunch of guys sitting together.

As the industry turns its back to FAANG practices and we sacrifice various “-ilities” on the altar of cost savings, the future of microservices will be decided based on how often we identify when they are the absolute right solution and how well we articulate its case. When in doubt, the answer will be (and perhaps it should have always been) ‘NO’.

As a parting thought, I have been involved in 3 large-scale monolith refactors/rewrites to microservices. All these projects were incredibly complex, significantly delayed and more of a failure than a success (some never entirely completed). Starting with a monolith is, most of the time, the correct answer. However, delaying a transition to smaller, independent services is almost always as bad (if not worse) than starting with microservices would have been in the first place. We are entering a new era where short-time thinking will be even more prevalent than before.