In a previous post in this “naming” series, we discussed how to name Kafka topics. The intention was to give them semantically meaningful names while avoiding collisions and ambiguity. The focus was on the “nature” of the topics’ data.
This post will discuss the two main “clients” connected to those topics: producers who write data into them and consumers who read data from them. The focus will move away from the “data” towards the applications involved in the data flow.
Producers first
Nothing can be consumed unless it is produced first; therefore, let’s start with producers.
They do a very “simple” job:
- Pull metadata from the cluster to understand which brokers take the “leader” role for which topics/partitions.
- Serialise the data into byte arrays.
- Send the data to the appropriate broker.
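The three steps above can be sketched in a few lines. This is a hypothetical illustration, not the real Kafka client API; the topic name, broker names, and function names are made-up placeholders.

```python
import json

# Step 1 (sketch): metadata pulled from the cluster, mapping each
# topic/partition to the broker currently acting as its leader.
leaders = {
    ("payments.transaction", 0): "broker-1",
    ("payments.transaction", 1): "broker-2",
}

def serialise(event: dict) -> bytes:
    # Step 2 (sketch): turn the event into a byte array (JSON for simplicity).
    return json.dumps(event).encode("utf-8")

def route(topic: str, partition: int) -> str:
    # Step 3 (sketch): pick the broker leading that topic/partition.
    return leaders[(topic, partition)]

payload = serialise({"amount": 10})
broker = route("payments.transaction", 1)
```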
In reality, it is much more complicated than this, but this is a good enough abstraction. Out of the dozens of configuration settings producers support, only two settings accept a “naming convention”.
client.id
The Kafka documentation defines client.id as follows:
An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging.
We want a naming convention that makes mapping Producer applications to domains and teams easy. Furthermore, these names should be descriptive enough to understand what the Producer application aims to achieve.
Organising your JMX metrics
There is also an extra role that client.id plays that people tend to forget: it namespaces observability metrics. For example, producers emit metrics under the following JMX MBean namespaces:
kafka.producer:type=producer-metrics,client-id={clientId}
kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+)
kafka.producer:type=producer-topic-metrics,client-id={clientId},topic={topic}
Notice how all of them use clientId as part of the namespace name. Therefore, if we don’t assign meaningful values to client.id, we won’t be able to distinguish the appropriate metrics when multiple producers consolidate their metrics into a single metrics system (like Prometheus), especially if they come from the same application (i.e., 1 application using N producers).
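To make this concrete, here is a small sketch of how client.id ends up embedded in the MBean name, so two producers inside the same application remain distinguishable. The company (“acme”) and domain names are made-up placeholders.

```python
# MBean name pattern from the producer metrics namespace (first of the
# three listed above), with client.id substituted in.
MBEAN = "kafka.producer:type=producer-metrics,client-id={client_id}"

# Two producers in one hypothetical app, disambiguated by entity name.
client_ids = [
    "prod-com.acme.payments.checkout.payment",
    "prod-com.acme.payments.checkout.refund",
]
mbeans = [MBEAN.format(client_id=c) for c in client_ids]
```

With meaningless client.id values (e.g., the client library’s auto-generated defaults), both MBean names would be indistinguishable at a glance in your metrics system.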
client.id also regularly features in other observability components, like logs.
Naming convention
The proposed convention looks like this:
[environment]-com.[your-company].[domain].[subdomain(s)].[app-name].[entity/event-name]
| Component | Description |
|---|---|
| [environment] | (Logical) environment that the producer is part of. For more details, see https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#environment |
| com.[your-company] | Follows a “Java-like” namespacing approach to avoid collisions with other components emitting metrics to the centralised metric database |
| [domain].[subdomain(s)] | Leveraging DDD to organise your system “logically” based on business/domain components. Break down into a domain and subdomains as explained in https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#domain-subdomain-s |
| [app-name] | The app name should be specific enough to make it easy to find the codebase involved and the team that owns it. |
| [entity/event-name] | Describes what information the producer is sending. It doesn’t need to include the full topic name since the context is already clear (e.g., payment, transaction, account). This field is not mandatory. |
Why do we need an entity/event name?
When your application has multiple producers, client.id needs to be unique for each one. Therefore, the ‘entity/event’ in the last section of the client.id name disambiguates them. You don’t need to define an entity/event name if you only use one producer for the application.
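A small helper makes the convention (and the optional entity/event part) concrete. This is a sketch; the company, domain, and app names are invented for illustration.

```python
def client_id(env, company, domain, subdomains, app, entity=None):
    # Builds a client.id following the proposed convention:
    # [environment]-com.[your-company].[domain].[subdomain(s)].[app-name].[entity/event-name]
    parts = [domain, *subdomains, app] + ([entity] if entity else [])
    return f"{env}-com.{company}." + ".".join(parts)

# Single producer: the entity/event part can be omitted.
single = client_id("prod", "acme", "payments", ["cards"], "authoriser")

# Multiple producers in one app: the entity disambiguates them.
multi = client_id("prod", "acme", "payments", ["cards"], "authoriser", "refund")
```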
Don’t we need a ‘version’ part?
Other naming conventions define a ‘version’ as part of their respective names. This is only necessary when the client is related to state; for example, Consumers and Streams apps must store committed offsets.
Producers, on the other hand, are completely stateless. Adding a ‘version’ part would only make sense if we keep multiple Producer application versions running side-by-side. Even then, one could argue that versioning the application itself would be a better strategy than versioning the Producer’s client.id.
transactional.id
The Kafka documentation defines transactional.id as follows:
The TransactionalId to use for transactional delivery. This enables reliability semantics which span multiple producer sessions since it allows the client to guarantee that transactions using the same TransactionalId have been completed prior to starting any new transactions. If no TransactionalId is provided, then the producer is limited to idempotent delivery. If a TransactionalId is configured, enable.idempotence is implied
There are a few “small” differences between client.id and transactional.id:
- client.id doesn’t need to be unique (but I strongly recommend it). transactional.id MUST be unique.
- client.id is more “visible” towards developers (through O11Y). transactional.id is mostly opaque, operating behind the scenes in the transaction management subsystem.
- client.id can change, although it would make your O11Y information very confusing. transactional.id MUST be stable between restarts.
Other than that, there is nothing special about transactional.id, so I recommend using the same naming convention that I have proposed for client.id in the section above.
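For example, a transactional producer’s configuration could reuse the client.id value as the base of its transactional.id. The “-0” instance suffix here is my own assumption (one way to keep the value unique per instance yet stable across restarts, e.g., a StatefulSet ordinal); the post’s convention covers only the base name.

```python
# Sketch of a producer config using the same convention for both IDs.
# All names are placeholders; "-0" is an assumed stable instance suffix.
base = "prod-com.acme.payments.cards.authoriser"

config = {
    "client.id": base,
    # Must be unique across all producer instances AND stable between
    # restarts of the same instance.
    "transactional.id": f"{base}-0",
}
```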
Consumers second
We have sorted producers, and they are happily producing data. It’s time to look at the other side: consumers.
They too do a very “simple” job:
- Get a bunch of topic/partitions assigned as part of the consumer group partition assignment process.
- Connect to the brokers acting as leaders for those topic/partitions.
- Regularly attempt to pull new data from the assigned topic/partitions.
- When there is something available, read it (as byte arrays) through the connection.
- When it arrives in the application space, deserialise the data into actual objects.
A few configuration settings play a role in this process.
group.id
The Kafka documentation defines group.id as follows:
A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy.
We want a naming convention that makes mapping Consumer applications to domains and teams easy. Furthermore, these names should be descriptive enough to understand what the Consumer application aims to achieve.
The proposed naming convention is as follows:
[environment]-com.[your-company].[domain].[subdomain(s)].[app-name].[entity/event-name]-[version]
| Component | Description |
|---|---|
| [environment] | (Logical) environment that the consumer is part of. For more details, see https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#environment |
| com.[your-company] | Follows a “Java-like” namespacing approach to avoid collisions with other components emitting metrics to the centralised metric database |
| [domain].[subdomain(s)] | Leveraging DDD to organise your system “logically” based on business/domain components. Break down into a domain and subdomains as explained in https://javierholguera.com/2024/08/20/naming-kafka-objects-i-topics/#domain-subdomain-s |
| [app-name] | The app name should be specific enough to make it easy to find the codebase involved and the team that owns it. |
| [entity/event-name] | Describes what information the consumer is processing. It doesn’t need to include the full topic name since the context is already clear (e.g., payment, transaction, account). This field is not mandatory. |
| [version] | Only introduce or change this value if you need to run side-by-side versions of the app or simply start from scratch. Format: vXY (e.g., ‘v01’, ‘v14’). This field is not mandatory. |
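As with client.id, a small helper shows how the parts compose, including the optional entity and version. All names are invented placeholders.

```python
def group_id(env, company, domain, subdomains, app, entity=None, version=None):
    # Builds a group.id following the proposed convention:
    # [environment]-com.[your-company].[domain].[subdomain(s)].[app-name].[entity/event-name]-[version]
    gid = f"{env}-com.{company}." + ".".join([domain, *subdomains, app])
    if entity:
        gid += f".{entity}"      # disambiguates multiple consumers in one app
    if version:
        gid += f"-{version}"     # only set when resetting / running side-by-side
    return gid

gid = group_id("prod", "acme", "payments", ["settlement"], "settler",
               "transaction", "v01")
```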
Why do we need an entity/event name?
When your application has multiple consumers, it needs a unique group.id for every one of them. Therefore, the ‘entity/event’ in the last section of the group.id name should disambiguate between them, and it becomes mandatory.
You don’t need to define an entity/event name if you only use one consumer for the application.
Why versioning the group.id value?
The Kafka Consumer uses group.id to define a consumer group for multiple instances of the same application. Those instances collaborate within the group, sharing partitions, picking up partitions from failed instances and committing offsets so other instances don’t process records that another instance has processed already.
Offsets are committed under the group.id name. Therefore, it is critical to use the same group.id value across application deployments to guarantee that the application continues consuming from where it left off.
However, there are times when we might want to reset the consumer. The easiest way to do that is to change the group.id. In this case, we can use the ‘version’ part to create a new consumer group that ignores where the previous deployment’s instances got up to and falls back on auto.offset.reset to decide where to start consuming.
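An illustrative reset, using the made-up group.id from earlier: bumping only the version suffix yields a brand-new consumer group, so the offsets committed under the old name are ignored and auto.offset.reset decides the starting point.

```python
# Old and new group.ids differ only in the trailing version part.
old = "prod-com.acme.payments.settlement.settler.transaction-v01"

# Replace the last "-vXY" segment to form the reset group name.
new = old.rsplit("-", 1)[0] + "-v02"
```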
If I’m versioning my application, should I use it for the ‘version’ value?
Short answer: NO
Longer answer: you probably are (loosely) semantic versioning your application; every merged PR will represent a new version. You don’t want to change your group.id every time your application version changes. The ‘version’ mentioned in the group.id is very specific to the consumer group and how it manages offsets. Don’t mix the two together.
group.instance.id
The Kafka documentation defines group.instance.id as follows:
A unique identifier of the consumer instance provided by the end user. Only non-empty strings are permitted. If set, the consumer is treated as a static member, which means that only one instance with this ID is allowed in the consumer group at any time. This can be used in combination with a larger session timeout to avoid group rebalances caused by transient unavailability (e.g. process restarts). If not set, the consumer will join the group as a dynamic member, which is the traditional behavior.
In other words, while group.id identifies one or more instances that belong to a consumer group, group.instance.id uniquely identifies each instance.
The main purpose of group.instance.id is to enable static membership in the consumer group. This helps reduce group rebalances when instances become briefly unavailable. The assumption is that it is better to delay processing of the partitions consumed by the temporarily missing instance than to rebalance the whole group, affecting all other instances.
I recommend using the same group.id naming convention PLUS a suffix that identifies the instance uniquely and is stable between restarts.
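Concretely, that could look like the sketch below. The group name is the made-up example from earlier, and the “settler-0” suffix is an assumed stable per-instance identifier (e.g., a StatefulSet pod ordinal); any value that is unique per instance and survives restarts works.

```python
# group.id built with the proposed convention (placeholder names).
group = "prod-com.acme.payments.settlement.settler.transaction-v01"

# Assumed restart-stable instance identifier, e.g. a pod ordinal.
instance = "settler-0"

# Static membership: group.id PLUS a unique, stable instance suffix.
group_instance_id = f"{group}-{instance}"
```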
client.id
client.id serves the exact same purpose in consumers as in producers. Therefore, I will refer you to the earlier section on the producer’s client.id for a naming convention proposal. See https://javierholguera.com/2024/09/12/naming-kafka-objects-i-producers-and-consumers/#client-id
Conclusions
Naming is difficult and requires care. However, investing in good naming conventions reduces accidental complexity, helps with debugging and diagnosing your system, and supports development through its end-to-end lifecycle.
In this post, I proposed multiple naming conventions that aim to be semantically meaningful, allow you to organize your system into sensible components, and support your system’s incremental evolution.