Posted to dev@kafka.apache.org by "Viktor Somogyi-Vass (Jira)" <ji...@apache.org> on 2022/10/05 11:33:00 UTC

[jira] [Created] (KAFKA-14281) Multi-level rack awareness

Viktor Somogyi-Vass created KAFKA-14281:
-------------------------------------------

Summary: Multi-level rack awareness
Key: KAFKA-14281
URL: https://issues.apache.org/jira/browse/KAFKA-14281
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 3.4.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass

h1. Motivation

With replication services, data can be replicated across independent Kafka clusters in multiple data centers. In addition, many customers need "stretch clusters" - a single Kafka cluster that spans multiple data centers. This architecture has the following useful characteristics:
- Data is natively replicated into all data centers by Kafka topic replication.
- No data is lost when 1 DC is lost, and no configuration change is required - the design implicitly relies on native Kafka replication.
- From an operational point of view, such a topology is much easier to configure and operate than a replication scenario via MirrorMaker 2 (MM2).

Kafka should provide "native" support for stretch clusters, covering the special operational aspects of running a stretch cluster.

h2. Multi-level rack awareness

Stretch clusters are implemented using the rack awareness feature, where each DC is represented as a rack. This ensures that replicas are spread evenly across DCs. Unfortunately, there are cases where this is too limiting: if there are actual racks inside the DCs, we cannot specify them. Consider 3 DCs with 2 racks each:

/DC1/R1, /DC1/R2
/DC2/R1, /DC2/R2
/DC3/R1, /DC3/R2

If we were to use DC1, DC2, DC3 as the racks, we would lose the rack-level information of the setup. This means that with RF=6, the 2 replicas assigned to DC1 could both end up in the same rack.

If we were to use /DC1/R1, /DC1/R2, etc. as the racks, then with RF=3 it is possible that 2 replicas end up in the same DC, e.g. /DC1/R1, /DC1/R2, /DC2/R1.
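
The RF=3 case above can be checked with a small Python sketch. The helper names `dc_of` and `replicas_per_dc` are hypothetical and not part of Kafka; the sketch only counts how many replicas of the example placement land in each DC.

```python
from collections import Counter


def dc_of(rack_id: str) -> str:
    """Return the top-level component (the DC) of a rack ID such as '/DC1/R1'."""
    return rack_id.strip("/").split("/")[0]


def replicas_per_dc(replica_racks):
    """Count replicas per DC for a list of rack IDs, one entry per replica."""
    return Counter(dc_of(rack) for rack in replica_racks)


# The RF=3 placement from the example: every replica sits on a distinct flat
# "rack", yet two of them share a data center and DC3 holds none.
placement = ["/DC1/R1", "/DC1/R2", "/DC2/R1"]
counts = replicas_per_dc(placement)
```

A flat rack-aware assignment is satisfied with this placement because all three rack IDs differ, even though losing DC1 would take out two of the three replicas.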

Because of this, Kafka should support "multi-level" racks, which means that rack IDs should be able to describe a hierarchy. With this feature, brokers should be able to:
# spread replicas evenly based on the top level of the hierarchy (i.e. first, between DCs)
# then inside a top-level unit (DC), if there are multiple replicas, they should be spread evenly among lower-level units (i.e. between racks, then between physical hosts, and so on)
## repeat for all levels
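
The steps above can be sketched in Python as a recursive split of the replica count. This is a minimal illustration, not Kafka's assignment algorithm: the names `parse_rack` and `spread` are hypothetical, every rack ID is assumed to have the same depth, and ties are broken by sorted name rather than by rotating the starting point per partition as a real assignor would.

```python
from collections import defaultdict


def parse_rack(rack_id: str) -> tuple:
    """Split a hierarchical rack ID such as '/DC1/R1' into ('DC1', 'R1')."""
    return tuple(rack_id.strip("/").split("/"))


def spread(brokers, n, level=0):
    """Pick n broker IDs from `brokers`, a list of (broker_id, rack_path)
    pairs, spreading them evenly at every level of the hierarchy.
    Assumption: all rack paths have the same depth."""
    if n > len(brokers):
        raise ValueError("not enough brokers for the requested replica count")
    depth = len(brokers[0][1]) if brokers else 0
    if level == depth:
        # Below the last level, brokers are treated as interchangeable.
        return [broker_id for broker_id, _ in brokers[:n]]

    # Group brokers by their component at this level (DC first, then rack, ...).
    groups = defaultdict(list)
    for broker_id, path in brokers:
        groups[path[level]].append((broker_id, path))

    # Hand out replicas one at a time, round-robin over the groups, never
    # giving a group more replicas than it has brokers.
    quota = {name: 0 for name in groups}
    remaining = n
    while remaining > 0:
        for name in sorted(groups):
            if remaining > 0 and quota[name] < len(groups[name]):
                quota[name] += 1
                remaining -= 1

    # Repeat the same procedure inside each group for the next level.
    chosen = []
    for name in sorted(groups):
        chosen.extend(spread(groups[name], quota[name], level + 1))
    return chosen


# The 3 DC x 2 rack layout from the example, with one broker per rack.
racks = ["/DC1/R1", "/DC1/R2", "/DC2/R1", "/DC2/R2", "/DC3/R1", "/DC3/R2"]
brokers = [(i + 1, parse_rack(r)) for i, r in enumerate(racks)]

rf3 = spread(brokers, 3)  # one replica per DC
rf6 = spread(brokers, 6)  # two replicas per DC, one per rack
```

With RF=3 each DC receives exactly one replica, and with RF=6 the two replicas in each DC land on different racks, which are the two properties the flat rack configurations could not guarantee together.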
