Naming Kafka objects (I) – Topics

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

I started working with Apache Kafka in 2015 when I joined Funding Circle. We were early adopters of the technology (and the architectural approach that comes with it) and had the pleasure (not really) to work with pre-1.0 versions. Since then, I have worked with Kafka in every single company I have been part of.

Fast-forward almost 10 years, and I think I have accrued enough experience to take the risk of proposing solutions to one of the hardest problems in software engineering: naming things.

In this first post, I propose a naming convention for the most important object in a Kafka system: Kafka topics. In subsequent posts, I’ll cover other objects like consumer groups, Kafka Streams topologies, and Kafka Connect connectors.

  1. DDD all things
  2. Topic naming convention
    1. [environment]
    2. [visibility]
    3. [topic-type]
    4. [domain].[subdomain(s)]
    5. [record-name]
    6. [key-name]
    7. [topic-version]
  3. Conclusions

DDD all things

I asked ChatGPT for a brief definition of what DDD is (emphasis mine):

Domain-Driven Design (DDD) is a software development approach that focuses on modeling software based on the real-world domain it represents, emphasizing collaboration between technical experts and domain experts to create a shared understanding and ensure that the software reflects the core business logic and needs.

In other words, when applying DDD, you adopt the vocabulary of the business and structure your system based on how the company itself is structured and collaborates. This is very aligned with a sociotechnical approach to software architecture.

I like leveraging DDD to structure systems and communicate intent. Naming things is the cornerstone for both. Names (and namespaces) organise concepts (i.e., structure) while revealing what they are “about” (i.e., intent).

Topic naming convention

Kafka topics contain the information stored and streamed in Apache Kafka. In that sense, they are not too different from other data containers like database tables, messaging queues, S3 buckets, files, etc.

With that in mind, there are three factors to take into account:

  • A topic is named after its content (not after any given consumer or producer). If you think about it, you would never name a database table based on what reads from or writes to it. Nor should you with Kafka topics.
  • The topic names must explicitly convey the topic content (i.e., what data is stored in there). One of Kafka’s selling points is the ability for producers and consumers to exchange information without coupling; topic names that clearly define data content underpin this exchange.
  • Your naming convention must avoid naming collisions. It should let you choose the right name for a topic without worrying that the name might already be taken.

The following convention meets all three requirements:


[environment].[visibility].[topic type].[domain].[subdomain(s)].[record name]-by-[key name]-[topic version]

Let’s break down each component.
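To make the template concrete, here is a small Python sketch (not from the original post) that assembles a topic name from its components:

```python
from typing import Optional, Sequence

def topic_name(
    environment: str,
    visibility: str,
    topic_type: str,
    domain: str,
    subdomains: Sequence[str],
    record_name: str,
    key_name: str,
    version: Optional[int] = None,  # [topic-version] is the only optional component
) -> str:
    """Assemble a topic name following the convention:
    [environment].[visibility].[topic-type].[domain].[subdomain(s)].[record-name]-by-[key-name][-vN]
    """
    parts = [environment, visibility, topic_type, domain, *subdomains]
    leaf = f"{record_name}-by-{key_name}"
    if version is not None:
        leaf += f"-v{version}"
    return ".".join(parts + [leaf])

# Reproduces the example used later in the post:
print(topic_name("prod", "shared", "event", "payments", ["reconciliation"],
                 "payment-confirmed", "payment-id"))
# prod.shared.event.payments.reconciliation.payment-confirmed-by-payment-id
```

A helper like this also gives you a single place to enforce the convention (e.g., rejecting unknown visibility values) instead of relying on every team remembering it.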

[environment]

Mandatory: yes

The environment describes the “logical” environment that a topic belongs to.

Take into account that Kafka clusters can be expensive, whether self-hosted (e.g., K8S via Strimzi Operator) or via SaaS (e.g., AWS MSK, Confluent Cloud). If you have a low-volume environment like most pre-production environments, it makes economic sense to “reuse” the same physical cluster for multiple logical environments (e.g., DEV, QA, Sandbox, UAT, Pre-PROD), particularly in an environment where cost savings are king (see https://javierholguera.com/2024/07/02/software-engineering-trends-that-are-reverting-i/).

It also supports ephemeral environments if you achieve the necessary CI/CD maturity.

[visibility]

Mandatory: yes

David Parnas enunciated the principle of “information hiding” in 1972. It’s 2024, and we still haven’t found a better way to reduce coupling and complexity than “hiding what you don’t need to see”; in other words, encapsulation and abstraction are still king to reduce cognitive overload and improve maintainability. This is true for data as well, not just code/behaviour.

With that in mind, it makes sense to categorise topics depending on “who should have access to this data”. Kafka doesn’t have any such taxonomy built-in; instead, you can “play” with roles and ACLs to limit/grant access. However, that is more of an InfoSec approach than naturally categorising topics (and data) based on their nature. One can argue that if topics are properly categorised in terms of visibility, we are one step away from automating ACL access that enforces such visibility at the producer/consumer level.
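As an illustration of how visibility could drive ACL automation, here is a hedged Python sketch; the group and principal names are hypothetical placeholders, and a real setup would translate the result into Kafka ACL bindings (e.g., via the Admin API or `kafka-acls`):

```python
def allowed_readers(topic: str, same_domain_apps: list, owner_app: str) -> list:
    """Derive which principals may Read a topic from its visibility component.
    A sketch only; the group/principal names are hypothetical placeholders."""
    visibility = topic.split(".")[1]  # [environment].[visibility].[topic-type]...
    if visibility == "external":
        return ["external-partners"]  # hypothetical group negotiated with third parties
    if visibility == "shared":
        return ["all-internal-apps"]  # hypothetical group: any internal consumer
    if visibility == "internal":
        return list(same_domain_apps)  # apps within the same bounded context
    if visibility == "private":
        return [owner_app]  # only the owning application
    raise ValueError(f"unknown visibility in topic name: {topic}")

print(allowed_readers(
    "prod.private.entity.payments.ledger.ledger-entry-by-entry-id",
    same_domain_apps=[], owner_app="payments-ledger-svc"))
# ['payments-ledger-svc']
```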

Another strong reason to categorise visibility is incompatible changes (backward and/or forward). When we know they apply to a topic with limited visibility, we can more easily manage the blast radius of such changes.

I propose the following visibility options.

| Value | Description |
|---|---|
| External | For topics used to exchange data with systems outside yours (e.g., third parties, SaaS, customers) if they support a Kafka-based interface. These topics require strong backward and forward compatibility (unless otherwise negotiated with the other parties). |
| Shared | Topics that are visible to all other components of the system, across system areas (similar to internal REST APIs). These topics require strong backward and forward compatibility. |
| Internal | Topics that should only be accessed within the same DDD domain or bounded context. Ideally, that would be a limited number of well-known applications that can be maintained and deployed together. |
| Private | Topics that are owned by a single application. They are similar to a private database/table that supports the functioning of a single microservice. Compatibility is decided on a per-application basis by the team maintaining it. |

[topic-type]

Mandatory: yes

Not all messages are born equal. Clemens Vasters has a good introduction to different types of messages in his “Events, Data points and messages” talk from 2017. While Kafka is not the best option for some of these types of messages, we all end up “abusing” it to an extent, if anything, because having multiple messaging technologies in your stack might be worse than stretching one of them beyond its optimal use case.

With that in mind, and without going to the level of granularity that Clemens proposes, the table below covers the “most common” types. Choosing the right type conveys a lot of meaning about the nature of the data stored in the topic.

| Message Type | topic-type value | Description |
|---|---|---|
| Commands | command | A command conveys the intention to change the state of the system (e.g., “ConfirmPayment”). |
| Events | event | An event captures something that has happened and is being broadcast for other systems to consider (e.g., “Payment Confirmed”). |
| Entities | entity | An entity groups fields that represent a single, consistent unit of information. When you design event-first architectures, events are modelled first (to support business processes), and entities “emerge” from them to group information, generally around DDD boundaries (for example, “Payment”, “Customer”, “Account”). |
| Change Data Capture | cdc | CDC topics define a stream of changes happening to the state of a given entity, commonly represented as a database table. In other words, they are a stream of CRUD changes to a table. Capturing the before/after of the affected entity simplifies downstream consumers. |
| Notifications | notification | Unlike a normal event, notifications don’t contain all the information that was captured as part of the event happening. Instead, they contain a link back to a place where the information was captured. They are useful when the payload would be “too big” to embed in the event, so the consumer has the option to “trace back” to independent storage like S3 for full details. |

[domain].[subdomain(s)]

Mandatory: yes

This part leverages DDD (or any other approach that you choose to organise your system) to guarantee there aren’t any naming collisions.

It is also quite powerful as a first layer for navigating the hundreds (or thousands) of topics in your system. If you combine visibility AND domain/subdomain(s), you can quickly identify the appropriate stream containing the information that your consumer requires.

For example, if your consumer needs to react to a payment being confirmed, the following topic would be quite easy to find.

prod.shared.event.payments.reconciliation.payment-confirmed-by-payment-id

You can filter by environment, visibility, and domain/subdomain, narrowing the list to a few dozen topics at most within the “reconciliation” subdomain before finding the right one.
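This kind of navigation is just component matching on the name. A sketch over a hypothetical topic list:

```python
def find_topics(topics, environment=None, visibility=None, domain=None, subdomain=None):
    """Filter topic names by individual components of the convention:
    [environment].[visibility].[topic-type].[domain].[subdomain(s)].[record...]"""
    matches = []
    for name in topics:
        parts = name.split(".")
        if len(parts) < 5:
            continue  # not following the convention
        env, vis, _topic_type, dom, *rest = parts
        if environment and env != environment:
            continue
        if visibility and vis != visibility:
            continue
        if domain and dom != domain:
            continue
        if subdomain and subdomain not in rest[:-1]:  # rest[:-1] = subdomain components
            continue
        matches.append(name)
    return matches

# Hypothetical topic list:
topics = [
    "prod.shared.event.payments.reconciliation.payment-confirmed-by-payment-id",
    "prod.private.entity.payments.ledger.ledger-entry-by-entry-id",
    "dev.shared.event.payments.reconciliation.payment-confirmed-by-payment-id",
]
print(find_topics(topics, environment="prod", visibility="shared", domain="payments"))
# ['prod.shared.event.payments.reconciliation.payment-confirmed-by-payment-id']
```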

[record-name]

Mandatory: yes

Choose a descriptive name for the data stored in the topic, much like you would name a database table.

The record name “grammar” will be influenced by the topic type.

| Topic type | Record name | Example |
|---|---|---|
| Commands | Present-tense verb + Noun | SendEmail, ApproveOrder |
| Events | Noun + Past-tense verb | PaymentReceived, AccountClosed |
| Entities | Noun | Order, Account, Payment, Customer |
| Change Data Capture | Table name | – |
| Notifications | Noun + Past-tense verb | Like ‘events’ |

[key-name]

Mandatory: yes

The name of the field that serves as the key for the record. It is VERY important to clearly identify what field in the message payload is used as the key to partition records in the topic.

Remember that in Kafka, strict ordering is only guaranteed within the partition (and not guaranteed across partitions in a given topic). Therefore, whatever key is used to partition will be the ordering criteria.

For example, if we have an “entities” topic partitioned by the entity’s natural key, we know all changes to that entity will be strictly ordered. If, for any reason, we chose a different key, changes to the same entity could land in different partitions and be consumed in a different order than they happened.
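A toy partitioner makes the ordering argument concrete. Kafka’s default partitioner hashes the serialized key with murmur2; this stand-in uses MD5 instead, but the property that matters is the same: one key always maps to one partition.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping (a stand-in for Kafka's
    default partitioner, which uses murmur2 on the serialized key)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every record keyed by the same payment-id lands in the same partition,
# so its changes are strictly ordered relative to each other:
assert partition_for("payment-123", 6) == partition_for("payment-123", 6)

# A different key may land elsewhere; Kafka guarantees no ordering
# across partitions:
print(partition_for("payment-123", 6), partition_for("payment-456", 6))
```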

[topic-version]

Mandatory: no

The final field is only here because incompatible changes DO happen despite our best intentions to avoid them.

When we face such a change, we normally need to follow a process like this:

  1. Create a second version of the topic (i.e., -v2).
  2. Start a migration topology that continuously upcasts messages from -v1 to -v2.
  3. Point all our consumers to the new -v2 topic.
  4. Once the consumers are up-to-date, migrate all our producers to the -v2 topic.
  5. Remove -v1 topic.

Since this situation shouldn’t happen for most of your topics, this name component is optional (until you need it).
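Step 2’s upcasting is typically a small, mechanical transformation. Below is a sketch with a hypothetical v1-to-v2 schema change (the field names are invented for illustration; a real deployment would run this inside a Kafka Streams topology or a consumer/producer pair):

```python
def upcast_v1_to_v2(record_v1: dict) -> dict:
    """Hypothetical upcast: v2 split v1's single 'name' field in two.
    A migration topology would apply this to every message read from
    the -v1 topic and write the result to the -v2 topic."""
    first, _, last = record_v1["name"].partition(" ")
    record_v2 = {k: v for k, v in record_v1.items() if k != "name"}
    record_v2["first_name"] = first
    record_v2["last_name"] = last
    return record_v2

print(upcast_v1_to_v2({"customer_id": "c-1", "name": "Ada Lovelace"}))
# {'customer_id': 'c-1', 'first_name': 'Ada', 'last_name': 'Lovelace'}
```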

Conclusions

It is not a simple naming convention, and although it is unlikely, you must be careful not to overrun Kafka’s topic name length limit (249 characters).

However, if you follow it thoroughly, you will find that what would otherwise be a mess of topics where it is impossible to find the right information becomes a structured, well-organised, self-service data platform.
