Posted to dev@kafka.apache.org by GitBox <gi...@apache.org> on 2021/01/22 15:41:41 UTC

[GitHub] [kafka-site] miguno opened a new pull request #324: KAFKA-8930: MirrorMaker v2 documentation

miguno opened a new pull request #324:
URL: https://github.com/apache/kafka-site/pull/324


   This adds a new user-facing "Geo-replication (Cross-Cluster Data Mirroring)" section covering MirrorMaker v2 to the Kafka Operations documentation.





[GitHub] [kafka-site] bbejeck commented on pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
bbejeck commented on pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#issuecomment-768404929


   > We should also add these docs to kafka/docs repo
   
   @omkreddy, yes a PR for kafka/docs is coming soon





[GitHub] [kafka-site] ryannedolan commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
ryannedolan commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563180323



##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters" class="anchor-link"></a><a href="
   <p>
   It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.3 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></h3>
+
+  <h4 class="anchor-heading"><a id="georeplication-overview" class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication Overview</a></h4>
+
+  <p>
+    Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+  </p>
+
+  <ul>
+    <li>Geo-replication</li>
+    <li>Disaster recovery</li>
+    <li>Feeding edge clusters into a central, aggregate cluster</li>
+    <li>Physical isolation of clusters (such as production vs. testing)</li>
+    <li>Cloud migration or hybrid cloud deployments</li>
+    <li>Legal and compliance requirements</li>
+  </ul>
+
+  <p>
+    Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+  </p>
+
+  <ul>
+    <li>Replicates topics (data plus configurations)</li>
+    <li>Replicates consumer groups including offsets to migrate applications between clusters</li>
+    <li>Replicates ACLs</li>
+    <li>Preserves partitioning</li>
+    <li>Automatically detects new topics and partitions</li>
+    <li>Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters</li>
+    <li>Fault-tolerant and horizontally scalable operations</li>
+  </ul>
+
+  <p>
+  <em>Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's <a href="#replication">intra-cluster replication</a>, which replicates data within the same Kafka cluster.</em>
+  </p>
+
+  <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+  <p>
+    With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.

Review comment:
       "with sink connectors" is not true at the moment, since I don't think we have a sink connector yet. And even when we do, it would usually be sufficient to use source _or_ sink connector. There are certainly cases where this sentence is true, but I think it's misleading as a general statement.
   
   Maybe "In a nutshell, MirrorMaker uses Connectors to consume from source clusters and produce to target clusters" or something like that.

##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters" class="anchor-link"></a><a href="
   <p>
   It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.3 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></h3>
+
+  <h4 class="anchor-heading"><a id="georeplication-overview" class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication Overview</a></h4>
+
+  <p>
+    Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+  </p>
+
+  <ul>
+    <li>Geo-replication</li>
+    <li>Disaster recovery</li>
+    <li>Feeding edge clusters into a central, aggregate cluster</li>
+    <li>Physical isolation of clusters (such as production vs. testing)</li>
+    <li>Cloud migration or hybrid cloud deployments</li>
+    <li>Legal and compliance requirements</li>
+  </ul>
+
+  <p>
+    Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+  </p>
+
+  <ul>
+    <li>Replicates topics (data plus configurations)</li>
+    <li>Replicates consumer groups including offsets to migrate applications between clusters</li>
+    <li>Replicates ACLs</li>
+    <li>Preserves partitioning</li>
+    <li>Automatically detects new topics and partitions</li>
+    <li>Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters</li>
+    <li>Fault-tolerant and horizontally scalable operations</li>
+  </ul>
+
+  <p>
+  <em>Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's <a href="#replication">intra-cluster replication</a>, which replicates data within the same Kafka cluster.</em>
+  </p>
+
+  <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+  <p>
+    With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.
+  </p>
+
+  <p>
+    These directional flows from source to target clusters are called replication flows. They are defined with the format <code>{source_cluster}->{target_cluster}</code> in the MirrorMaker configuration file as described later. Administrators can create complex replication topologies based on these flows.
+  </p>
+
+  <p>
+    Here are some example patterns:
+  </p>
+
+  <ul>
+    <li>Active/Active high availability deployments: <code>A->B, B->A</code></li>
+    <li>Active/Passive or Active/Standby high availability deployments: <code>A->B</code></li>
+    <li>Aggregation (e.g., from many clusters to one): <code>A->K, B->K, C->K</code></li>
+    <li>Fan-out (e.g., from one to many clusters): <code>K->A, K->B, K->C</code></li>
+    <li>Forwarding: <code>A->B, B->C, C->D</code></li>
+  </ul>
+
+  <p>
+    By default, a flow replicates all topics and consumer groups. However, each replication flow can be configured independently. For instance, you can define that only specific topics or consumer groups are replicated from the source cluster to the target cluster.
+  </p>
+
+  <p>
+    Here is a first example of how to configure data replication from a <code>primary</code> cluster to a <code>secondary</code> cluster (an active/passive setup):
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Basic settings
+clusters = primary, secondary
+primary.bootstrap.servers = broker3-primary:9092
+secondary.bootstrap.servers = broker5-secondary:9092
+
+# Define replication flows
+primary->secondary.enabled = true
+primary->secondary.topics = foobar-topic, quux-.*
+</code></pre>
+
+
+  <h4 class="anchor-heading"><a id="georeplication-mirrormaker" class="anchor-link"></a><a href="#georeplication-mirrormaker">Configuring Geo-Replication</a></h4>
+
+  <p>
+    The following sections describe how to configure and run a dedicated MirrorMaker cluster. If you want to run MirrorMaker within an existing Kafka Connect cluster or other supported deployment setups, please refer to <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0">KIP-382: MirrorMaker 2.0</a> and be aware that the names of configuration settings may vary between deployment modes.
+  </p>
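+
+  <p>
+    For reference, a dedicated MirrorMaker cluster is typically started by passing its configuration file to the <code>connect-mirror-maker.sh</code> script that ships with Kafka (the file name below is just the conventional default used throughout this section):
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Start a dedicated MirrorMaker process with the given configuration file
+bin/connect-mirror-maker.sh connect-mirror-maker.properties
+</code></pre>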
+
+  <p>
+    Beyond what's covered in the following sections, further examples and information on configuration settings are available at:
+  </p>
+
+  <ul>
+	  <li><a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMakerConfig.java">MirrorMakerConfig</a>, <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java">MirrorConnectorConfig</a></li>
+	  <li><a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultTopicFilter.java">DefaultTopicFilter</a> for topics, <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultGroupFilter.java">DefaultGroupFilter</a> for consumer groups</li>
+	  <li>Example configuration settings in <a href="https://github.com/apache/kafka/blob/trunk/config/connect-mirror-maker.properties">connect-mirror-maker.properties</a>, <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0">KIP-382: MirrorMaker 2.0</a></li>
+  </ul>
+
+  <h5 class="anchor-heading"><a id="georeplication-config-syntax" class="anchor-link"></a><a href="#georeplication-config-syntax">Configuration File Syntax</a></h5>
+
+  <p>
+    The MirrorMaker configuration file is typically named <code>connect-mirror-maker.properties</code>. You can configure a variety of components in this file:
+  </p>
+
+  <ul>
+    <li>MirrorMaker settings: global settings including cluster definitions (aliases), plus custom settings per replication flow</li>
+    <li>Kafka Connect and connector settings</li>
+    <li>Kafka producer, consumer, and admin client settings</li>
+  </ul>
+
+  <p>
+    Example: Define MirrorMaker settings (explained in more detail later).
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Global settings
+# Define the cluster aliases
+clusters = us-west, us-east
+us-west.bootstrap.servers = broker3-west:9092
+us-east.bootstrap.servers = broker5-east:9092
+
+# Replicate all topics by default
+topics = .*
+
+# Specific replication flow settings (here: flow from us-west to us-east)
+us-west->us-east.enabled = true
+# Override the default above for this flow
+us-west->us-east.topics = foo.*, bar.*
+</code></pre>
+
+  <p>
+    MirrorMaker is based on the Kafka Connect framework. Any Kafka Connect, source connector, and sink connector settings as described in the <a href="#connectconfigs">documentation chapter on Kafka Connect</a> can be used directly in the MirrorMaker configuration, without having to change or prefix the name of the configuration setting.
+  </p>
+
+  <p>
+    Example: Define custom Kafka Connect settings to be used by MirrorMaker.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Setting Kafka Connect defaults for MirrorMaker
+tasks.max = 5
+</code></pre>
+
+  <p>
+  Most of the default Kafka Connect settings work well for MirrorMaker out-of-the-box, with the exception of <code>tasks.max</code>. In order to evenly distribute the workload across more than one MirrorMaker process, it is recommended to set <code>tasks.max</code> to at least <code>2</code> (preferably higher) depending on the available hardware resources and the total number of topic-partitions to be replicated.
+  </p>
+
+  <p>
+  You can further customize MirrorMaker's Kafka Connect settings <em>per source or target cluster</em> (more precisely, you can specify Kafka Connect worker-level configuration settings "per connector"). Use the format of <code>{cluster}.{config_name}</code> in the MirrorMaker configuration file.
+  </p>
+
+  <p>
+    Example: Define custom connector settings for the <code>us-west</code> cluster.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># us-west custom settings
+us-west.offset.storage.topic = my-mirrormaker-offsets
+</code></pre>
+
+  <p>
+    MirrorMaker internally uses the Kafka producer, consumer, and admin clients. Custom settings for these clients are often needed. To override the defaults, use the following format in the MirrorMaker configuration file:
+  </p>
+
+  <ul>
+    <li><code>{source}.consumer.{consumer_config_name}</code></li>
+    <li><code>{target}.producer.{producer_config_name}</code></li>
+    <li><code>{source_or_target}.admin.{admin_config_name}</code></li>
+  </ul>
+
+  <p>
+    Example: Define custom producer, consumer, admin client settings.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># us-west cluster (from which to consume)
+us-west.consumer.isolation.level = read_committed
+us-west.admin.bootstrap.servers = broker57-primary:9092
+
+# us-east cluster (to which to produce)
+us-east.producer.compression.type = gzip
+us-east.producer.buffer.memory = 32768
+us-east.admin.bootstrap.servers = broker8-secondary:9092
+</code></pre>
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-create" class="anchor-link"></a><a href="#georeplication-flow-create">Creating and Enabling Replication Flows</a></h5>
+
+  <p>
+    To define a replication flow, you must first define the respective source and target Kafka clusters in the MirrorMaker configuration file.
+  </p>
+
+  <ul>
+    <li><code>clusters</code> (required): comma-separated list of Kafka cluster "aliases"</li>
+    <li><code>{clusterAlias}.bootstrap.servers</code> (required): connection information for the specific cluster; comma-separated list of "bootstrap" Kafka brokers</li>
+  </ul>
+
+  <p>
+    Example: Define two cluster aliases <code>primary</code> and <code>secondary</code>, including their connection information.
+  </p>
+
+<pre class="line-numbers"><code class="language-text">clusters = primary, secondary
+primary.bootstrap.servers = broker10-primary:9092,broker-11-primary:9092
+secondary.bootstrap.servers = broker5-secondary:9092,broker6-secondary:9092
+</code></pre>
+
+  <p>
+    Secondly, you must explicitly enable individual replication flows with <code>{source}->{target}.enabled = true</code> as needed. Remember that flows are directional: if you need two-way (bidirectional) replication, you must enable flows in both directions.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Enable replication from primary to secondary
+primary->secondary.enabled = true
+</code></pre>
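+
+  <p>
+    As a sketch, a two-way (active/active) setup between the same two clusters simply enables both directions:
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Enable replication in both directions for an active/active setup
+primary->secondary.enabled = true
+secondary->primary.enabled = true
+</code></pre>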
+
+  <p>
+    By default, a replication flow will replicate all but a few special topics and consumer groups from the source cluster to the target cluster, and automatically detect any newly created topics and groups. The names of replicated topics in the target cluster will be prefixed with the name of the source cluster (see section further below). For example, the topic <code>foo</code> in the source cluster <code>us-west</code> would be replicated to a topic named <code>us-west.foo</code> in the target cluster <code>us-east</code>.
+  </p>
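+
+  <p>
+    For illustration, an application connected to the target cluster could then read the replicated data from the remote topic, e.g. with the console consumer (the bootstrap address reuses the example hostnames above):
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Read the replicated topic us-west.foo from the target cluster us-east
+bin/kafka-console-consumer.sh --bootstrap-server broker5-east:9092 --topic us-west.foo --from-beginning
+</code></pre>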
+
+  <p>
+    The subsequent sections explain how to customize this basic setup according to your needs.
+  </p>
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-configure" class="anchor-link"></a><a href="#georeplication-flow-configure">Configuring Replication Flows</a></h5>
+
+  <p>
+The configuration of a replication flow is a combination of top-level default settings (e.g., <code>topics</code>), on top of which flow-specific settings, if any, are applied (e.g., <code>us-west->us-east.topics</code>). To change the top-level defaults, add the respective top-level setting to the MirrorMaker configuration file. To override the defaults for a specific replication flow only, use the syntax format <code>{source}->{target}.{config.name}</code>.
+  </p>
+
+  <p>
+    The most important settings are:
+  </p>
+
+  <ul>
+    <li><code>topics</code>: list of topics or a regular expression that defines which topics in the source cluster to replicate (default: <code>topics = .*</code>)
+    <li><code>topics.exclude</code>: list of topics or a regular expression to subsequently exclude topics that were matched by the <code>topics</code> setting (default: <code>topics.exclude = .*[\-\.]internal, .*\.replica, __.*</code>)
+    <li><code>groups</code>: list of consumer groups or a regular expression that defines which consumer groups in the source cluster to replicate (default: <code>groups = .*</code>)
+    <li><code>groups.exclude</code>: list of consumer groups or a regular expression to subsequently exclude consumer groups that were matched by the <code>groups</code> setting (default: <code>groups.exclude = console-consumer-.*, connect-.*, __.*</code>)
+    <li><code>{source}->{target}.enabled</code>: set to <code>true</code> to enable the replication flow (default: <code>false</code>)
+  </ul>
+
+  <p>
+    Example:
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Custom top-level defaults that apply to all replication flows
+topics = .*
+groups = consumer-group1, consumer-group2
+
+# Don't forget to enable a flow!
+us-west->us-east.enabled = true
+
+# Custom settings for specific replication flows
+us-west->us-east.topics = foo.*
+us-west->us-east.groups = bar.*
+us-west->us-east.emit.heartbeats.enabled = false
+</code></pre>
+
+  <p>
+    Additional configuration settings are supported, some of which are listed below. In most cases, you can leave these settings at their default values. See <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMakerConfig.java">MirrorMakerConfig</a> and <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java">MirrorConnectorConfig</a> for further details.
+  </p>
+
+  <ul>
+    <li><code>refresh.topics.enabled</code>: whether to check for new topics in the source cluster periodically (default: true)
+    <li><code>refresh.topics.interval.seconds</code>: frequency of checking for new topics in the source cluster; lower values than the default may lead to performance degradation (default: 600, every ten minutes)
+    <li><code>refresh.groups.enabled</code>: whether to check for new consumer groups in the source cluster periodically (default: true)
+    <li><code>refresh.groups.interval.seconds</code>: frequency of checking for new consumer groups in the source cluster; lower values than the default may lead to performance degradation (default: 600, every ten minutes)
+    <li><code>sync.topic.configs.enabled</code>: whether to replicate topic configurations from the source cluster (default: true)
+    <li><code>sync.topic.acls.enabled</code>: whether to sync ACLs from the source cluster (default: true)
+    <li><code>emit.heartbeats.enabled</code>: whether to emit heartbeats periodically (default: true)
+    <li><code>emit.heartbeats.interval.seconds</code>: frequency at which heartbeats are emitted (default: 5, every five seconds)
+    <li><code>heartbeats.topic.replication.factor</code>: replication factor of MirrorMaker's internal heartbeat topics (default: 3)
+    <li><code>emit.checkpoints.enabled</code>: whether to emit MirrorMaker's consumer offsets periodically (default: true)
+    <li><code>emit.checkpoints.interval.seconds</code>: frequency at which checkpoints are emitted (default: 60, every minute)
+    <li><code>checkpoints.topic.replication.factor</code>: replication factor of MirrorMaker's internal checkpoints topics (default: 3)
+    <li><code>sync.group.offsets.enabled</code>: whether to periodically write the translated offsets of replicated consumer groups (in the source cluster) to the <code>__consumer_offsets</code> topic in the target cluster, as long as no active consumers in that group are connected to the target cluster (default: false)
+    <li><code>sync.group.offsets.interval.seconds</code>: frequency at which consumer group offsets are synced (default: 60, every minute)
+    <li><code>offset-syncs.topic.replication.factor</code>: replication factor of MirrorMaker's internal offset-sync topics (default: 3)
+  </ul>
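+
+  <p>
+    For example, a few of the settings listed above could be tuned for a specific replication flow as follows (the values are shown for illustration only):
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Tune selected defaults for the us-west to us-east flow only
+us-west->us-east.sync.topic.acls.enabled = false
+us-west->us-east.emit.heartbeats.interval.seconds = 10
+us-west->us-east.emit.checkpoints.interval.seconds = 30
+</code></pre>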
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-secure" class="anchor-link"></a><a href="#georeplication-flow-secure">Securing Replication Flows</a></h5>
+
+  <p>
+    MirrorMaker supports the same <a href="#connectconfigs">security settings as Kafka Connect</a>, so please refer to the linked section for further information.
+  </p>
+
+  <p>
+    Example: Encrypt communication between MirrorMaker and the <code>us-east</code> cluster.
+  </p>
+
+<pre class="line-numbers"><code class="language-text">us-east.security.protocol=SSL
+us-east.ssl.truststore.location=/path/to/truststore.jks
+us-east.ssl.truststore.password=my-secret-password
+us-east.ssl.keystore.location=/path/to/keystore.jks
+us-east.ssl.keystore.password=my-secret-password
+us-east.ssl.key.password=my-secret-password
+</code></pre>
+
+  <h5 class="anchor-heading"><a id="georeplication-topic-naming" class="anchor-link"></a><a href="#georeplication-topic-naming">Custom Naming of Replicated Topics in Target Clusters</a></h5>
+
+  <p>
+    Replicated topics in a target cluster—sometimes called <em>remote</em> topics—are renamed according to a replication policy. MirrorMaker uses this policy to ensure that events (aka records, messages) from different clusters are not written to the same topic-partition. By default as per <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror-client/src/main/java/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.java">DefaultReplicationPolicy</a>, the names of replicated topics in the target clusters have the format <code>{source}.{source_topic_name}</code>:

Review comment:
       I think "records" is more prevalent in Kafka docs vs "events". Maybe verify that and stick with whatever the rest of the docs use. 







[GitHub] [kafka-site] bbejeck merged pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
bbejeck merged pull request #324:
URL: https://github.com/apache/kafka-site/pull/324


   





[GitHub] [kafka-site] miguno commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562309



##########
File path: 27/ops.html
##########
+  <h5 class="anchor-heading"><a id="georeplication-topic-naming" class="anchor-link"></a><a href="#georeplication-topic-naming">Custom Naming of Replicated Topics in Target Clusters</a></h5>
+
+  <p>
+    Replicated topics in a target cluster—sometimes called <em>remote</em> topics—are renamed according to a replication policy. MirrorMaker uses this policy to ensure that events (aka records, messages) from different clusters are not written to the same topic-partition. By default as per <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror-client/src/main/java/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.java">DefaultReplicationPolicy</a>, the names of replicated topics in the target clusters have the format <code>{source}.{source_topic_name}</code>:

Review comment:
       The use of "events" has become more prevalent, see e.g. the updated introduction page (https://kafka.apache.org/intro) and quickstart (https://kafka.apache.org/quickstart). Hence I prefer to leave the text as-is. The included "(aka records, messages)" parenthesis makes the nomenclature sufficiently clear to the reader, imho.







[GitHub] [kafka-site] miguno commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562851



##########
File path: 27/ops.html
##########
+  <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+  <p>
+    With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.

Review comment:
       Thanks, @ryannedolan. Text updated.







[GitHub] [kafka-site] miguno commented on pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
miguno commented on pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#issuecomment-768416513


   kafka/docs PR is up at https://github.com/apache/kafka/pull/9983





[GitHub] [kafka-site] omkreddy commented on pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
omkreddy commented on pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#issuecomment-768372236


   We should also add these docs to `kafka/docs` repo





[GitHub] [kafka-site] miguno commented on a change in pull request #324: KAFKA-8930: MirrorMaker v2 documentation

Posted by GitBox <gi...@apache.org>.
miguno commented on a change in pull request #324:
URL: https://github.com/apache/kafka-site/pull/324#discussion_r563562309



##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters" class="anchor-link"></a><a href="
   <p>
   It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.3 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></h3>
+
+  <h4 class="anchor-heading"><a id="georeplication-overview" class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication Overview</a></h4>
+
+  <p>
+    Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+  </p>
+
+  <ul>
+    <li>Geo-replication</li>
+    <li>Disaster recovery</li>
+    <li>Feeding edge clusters into a central, aggregate cluster</li>
+    <li>Physical isolation of clusters (such as production vs. testing)</li>
+    <li>Cloud migration or hybrid cloud deployments</li>
+    <li>Legal and compliance requirements</li>
+  </ul>
+
+  <p>
+    Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+  </p>
+
+  <ul>
+    <li>Replicates topics (data plus configurations)</li>
+    <li>Replicates consumer groups including offsets to migrate applications between clusters</li>
+    <li>Replicates ACLs</li>
+    <li>Preserves partitioning</li>
+    <li>Automatically detects new topics and partitions</li>
+    <li>Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters</li>
+    <li>Fault-tolerant and horizontally scalable operations</li>
+  </ul>
+
+  <p>
+  <em>Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's <a href="#replication">intra-cluster replication</a>, which replicates data within the same Kafka cluster.</em>
+  </p>
+
+  <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+  <p>
+    With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.
+  </p>
+
+  <p>
+    These directional flows from source to target clusters are called replication flows. They are defined with the format <code>{source_cluster}->{target_cluster}</code> in the MirrorMaker configuration file as described later. Administrators can create complex replication topologies based on these flows.
+  </p>
+
+  <p>
+    Here are some example patterns:
+  </p>
+
+  <ul>
+    <li>Active/Active high availability deployments: <code>A->B, B->A</code></li>
+    <li>Active/Passive or Active/Standby high availability deployments: <code>A->B</code></li>
+    <li>Aggregation (e.g., from many clusters to one): <code>A->K, B->K, C->K</code></li>
+    <li>Fan-out (e.g., from one to many clusters): <code>K->A, K->B, K->C</code></li>
+    <li>Forwarding: <code>A->B, B->C, C->D</code></li>
+  </ul>
+
+  <p>
+    By default, a flow replicates all topics and consumer groups. However, each replication flow can be configured independently. For instance, you can define that only specific topics or consumer groups are replicated from the source cluster to the target cluster.
+  </p>
+
+  <p>
+    Here is a first example on how to configure data replication from a <code>primary</code> cluster to a <code>secondary</code> cluster (an active/passive setup):
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Basic settings
+clusters = primary, secondary
+primary.bootstrap.servers = broker3-primary:9092
+secondary.bootstrap.servers = broker5-secondary:9092
+
+# Define replication flows
+primary->secondary.enable = true
+primary->secondary.topics = foobar-topic, quux-.*
+</code></pre>
+
+
+  <h4 class="anchor-heading"><a id="georeplication-mirrormaker" class="anchor-link"></a><a href="#georeplication-mirrormaker">Configuring Geo-Replication</a></h4>
+
+  <p>
+    The following sections describe how to configure and run a dedicated MirrorMaker cluster. If you want to run MirrorMaker within an existing Kafka Connect cluster or other supported deployment setups, please refer to <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0">KIP-382: MirrorMaker 2.0</a> and be aware that the names of configuration settings may vary between deployment modes.
+  </p>
+
+  <p>
+    Beyond what's covered in the following sections, further examples and information on configuration settings are available at:
+  </p>
+
+  <ul>
+	  <li><a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMakerConfig.java">MirrorMakerConfig</a>, <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java">MirrorConnectorConfig</a></li>
+	  <li><a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultTopicFilter.java">DefaultTopicFilter</a> for topics, <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/DefaultGroupFilter.java">DefaultGroupFilter</a> for consumer groups</li>
+	  <li>Example configuration settings in <a href="https://github.com/apache/kafka/blob/trunk/config/connect-mirror-maker.properties">connect-mirror-maker.properties</a>, <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0">KIP-382: MirrorMaker 2.0</a></li>
+  </ul>
+
+  <h5 class="anchor-heading"><a id="georeplication-config-syntax" class="anchor-link"></a><a href="#georeplication-config-syntax">Configuration File Syntax</a></h5>
+
+  <p>
+    The MirrorMaker configuration file is typically named <code>connect-mirror-maker.properties</code>. You can configure a variety of components in this file:
+  </p>
+
+  <ul>
+    <li>MirrorMaker settings: global settings including cluster definitions (aliases), plus custom settings per replication flow</li>
+    <li>Kafka Connect and connector settings</li>
+    <li>Kafka producer, consumer, and admin client settings</li>
+  </ul>
+
+  <p>
+    Example: Define MirrorMaker settings (explained in more detail later).
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Global settings
+# Define cluster aliases
+clusters = us-west, us-east
+us-west.bootstrap.servers = broker3-west:9092
+us-east.bootstrap.servers = broker5-east:9092
+
+# Replicate all topics by default
+topics = .*
+
+# Specific replication flow settings (here: flow from us-west to us-east)
+us-west->us-east.enabled = true
+# Override the top-level 'topics' default above for this flow only
+us-west->us-east.topics = foo.*, bar.*
+</code></pre>
+
+  <p>
+    MirrorMaker is based on the Kafka Connect framework. Any Kafka Connect, source connector, and sink connector settings as described in the <a href="#connectconfigs">documentation chapter on Kafka Connect</a> can be used directly in the MirrorMaker configuration, without having to change or prefix the name of the configuration setting.
+  </p>
+
+  <p>
+    Example: Define custom Kafka Connect settings to be used by MirrorMaker.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Setting Kafka Connect defaults for MirrorMaker
+tasks.max = 5
+</code></pre>
+
+  <p>
+  Most of the default Kafka Connect settings work well for MirrorMaker out-of-the-box, with the exception of <code>tasks.max</code>. In order to evenly distribute the workload across more than one MirrorMaker process, it is recommended to set <code>tasks.max</code> to at least <code>2</code> (preferably higher) depending on the available hardware resources and the total number of topic-partitions to be replicated.
+  </p>
+
+  <p>
+  You can further customize MirrorMaker's Kafka Connect settings <em>per source or target cluster</em> (more precisely, you can specify Kafka Connect worker-level configuration settings "per connector"). Use the format of <code>{cluster}.{config_name}</code> in the MirrorMaker configuration file.
+  </p>
+
+  <p>
+    Example: Define custom connector settings for the <code>us-west</code> cluster.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># us-west custom settings
+us-west.offset.storage.topic = my-mirrormaker-offsets
+</code></pre>
+
+  <p>
+    MirrorMaker internally uses the Kafka producer, consumer, and admin clients. Custom settings for these clients are often needed. To override the defaults, use the following format in the MirrorMaker configuration file:
+  </p>
+
+  <ul>
+    <li><code>{source}.consumer.{consumer_config_name}</code></li>
+    <li><code>{target}.producer.{producer_config_name}</code></li>
+    <li><code>{source_or_target}.admin.{admin_config_name}</code></li>
+  </ul>
+
+  <p>
+    Example: Define custom producer, consumer, admin client settings.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># us-west cluster (from which to consume)
+us-west.consumer.isolation.level = read_committed
+us-west.admin.bootstrap.servers = broker57-west:9092
+
+# us-east cluster (to which to produce)
+us-east.producer.compression.type = gzip
+us-east.producer.buffer.memory = 32768
+us-east.admin.bootstrap.servers = broker8-east:9092
+</code></pre>
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-create" class="anchor-link"></a><a href="#georeplication-flow-create">Creating and Enabling Replication Flows</a></h5>
+
+  <p>
+    To define a replication flow, you must first define the respective source and target Kafka clusters in the MirrorMaker configuration file.
+  </p>
+
+  <ul>
+    <li><code>clusters</code> (required): comma-separated list of Kafka cluster "aliases"</li>
+    <li><code>{clusterAlias}.bootstrap.servers</code> (required): connection information for the specific cluster; comma-separated list of "bootstrap" Kafka brokers
+  </ul>
+
+  <p>
+    Example: Define two cluster aliases <code>primary</code> and <code>secondary</code>, including their connection information.
+  </p>
+
+<pre class="line-numbers"><code class="language-text">clusters = primary, secondary
+primary.bootstrap.servers = broker10-primary:9092,broker11-primary:9092
+secondary.bootstrap.servers = broker5-secondary:9092,broker6-secondary:9092
+</code></pre>
+
+  <p>
+    Secondly, you must explicitly enable individual replication flows with <code>{source}->{target}.enabled = true</code> as needed. Remember that flows are directional: if you need two-way (bidirectional) replication, you must enable flows in both directions.
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Enable replication from primary to secondary
+primary->secondary.enabled = true
+</code></pre>
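+
+  <p>
+    For example, in a two-way (active/active) setup between the same two clusters, you would additionally enable the reverse flow. This is a sketch only; whether bidirectional replication is appropriate depends on your deployment:
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Additionally enable replication in the reverse direction
+secondary->primary.enabled = true
+</code></pre>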
+
+  <p>
+    By default, a replication flow will replicate all but a few special topics and consumer groups from the source cluster to the target cluster, and automatically detect any newly created topics and groups. The names of replicated topics in the target cluster will be prefixed with the name of the source cluster (see the section on topic naming further below). For example, the topic <code>foo</code> in the source cluster <code>us-west</code> would be replicated to a topic named <code>us-west.foo</code> in the target cluster <code>us-east</code>.
+  </p>
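+
+  <p>
+    Because a replicated topic is a regular Kafka topic, applications connected to the target cluster can read it like any other topic. As an illustrative sketch (reusing the hypothetical clusters and topic above), a console consumer on <code>us-east</code> could read the replicated data as follows:
+  </p>
+
+<pre class="line-numbers"><code class="language-bash">$ bin/kafka-console-consumer.sh --bootstrap-server broker5-east:9092 \
+    --topic us-west.foo --from-beginning
+</code></pre>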
+
+  <p>
+    The subsequent sections explain how to customize this basic setup according to your needs.
+  </p>
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-configure" class="anchor-link"></a><a href="#georeplication-flow-configure">Configuring Replication Flows</a></h5>
+
+  <p>
+The configuration of a replication flow is a combination of top-level default settings (e.g., <code>topics</code>), on top of which flow-specific settings, if any, are applied (e.g., <code>us-west->us-east.topics</code>). To change the top-level defaults, add the respective top-level setting to the MirrorMaker configuration file. To override the defaults for a specific replication flow only, use the syntax format <code>{source}->{target}.{config.name}</code>.
+  </p>
+
+  <p>
+    The most important settings are:
+  </p>
+
+  <ul>
+    <li><code>topics</code>: list of topics or a regular expression that defines which topics in the source cluster to replicate (default: <code>topics = .*</code>)
+    <li><code>topics.exclude</code>: list of topics or a regular expression to subsequently exclude topics that were matched by the <code>topics</code> setting (default: <code>topics.exclude = .*[\-\.]internal, .*\.replica, __.*</code>)
+    <li><code>groups</code>: list of consumer groups or a regular expression that defines which consumer groups in the source cluster to replicate (default: <code>groups = .*</code>)
+    <li><code>groups.exclude</code>: list of consumer groups or a regular expression to subsequently exclude consumer groups that were matched by the <code>groups</code> setting (default: <code>groups.exclude = console-consumer-.*, connect-.*, __.*</code>)
+    <li><code>{source}->{target}.enabled</code>: set to <code>true</code> to enable the replication flow (default: <code>false</code>)
+  </ul>
+
+  <p>
+    Example:
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Custom top-level defaults that apply to all replication flows
+topics = .*
+groups = consumer-group1, consumer-group2
+
+# Don't forget to enable a flow!
+us-west->us-east.enabled = true
+
+# Custom settings for specific replication flows
+us-west->us-east.topics = foo.*
+us-west->us-east.groups = bar.*
+us-west->us-east.emit.heartbeats.enabled = false
+</code></pre>
+
+  <p>
+    Additional configuration settings are supported, some of which are listed below. In most cases, you can leave these settings at their default values. See <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMakerConfig.java">MirrorMakerConfig</a> and <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorConnectorConfig.java">MirrorConnectorConfig</a> for further details.
+  </p>
+
+  <ul>
+    <li><code>refresh.topics.enabled</code>: whether to check for new topics in the source cluster periodically (default: true)
+    <li><code>refresh.topics.interval.seconds</code>: frequency of checking for new topics in the source cluster; lower values than the default may lead to performance degradation (default: 600, every ten minutes)
+    <li><code>refresh.groups.enabled</code>: whether to check for new consumer groups in the source cluster periodically (default: true)
+    <li><code>refresh.groups.interval.seconds</code>: frequency of checking for new consumer groups in the source cluster; lower values than the default may lead to performance degradation (default: 600, every ten minutes)
+    <li><code>sync.topic.configs.enabled</code>: whether to replicate topic configurations from the source cluster (default: true)
+    <li><code>sync.topic.acls.enabled</code>: whether to sync ACLs from the source cluster (default: true)
+    <li><code>emit.heartbeats.enabled</code>: whether to emit heartbeats periodically (default: true)
+    <li><code>emit.heartbeats.interval.seconds</code>: frequency at which heartbeats are emitted (default: 5, every five seconds)
+    <li><code>heartbeats.topic.replication.factor</code>: replication factor of MirrorMaker's internal heartbeat topics (default: 3)
+    <li><code>emit.checkpoints.enabled</code>: whether to emit MirrorMaker's consumer offsets periodically (default: true)
+    <li><code>emit.checkpoints.interval.seconds</code>: frequency at which checkpoints are emitted (default: 60, every minute)
+    <li><code>checkpoints.topic.replication.factor</code>: replication factor of MirrorMaker's internal checkpoints topics (default: 3)
+    <li><code>sync.group.offsets.enabled</code>: whether to periodically write the translated offsets of replicated consumer groups (in the source cluster) to the <code>__consumer_offsets</code> topic in the target cluster, as long as no active consumers in that group are connected to the target cluster (default: false)
+    <li><code>sync.group.offsets.interval.seconds</code>: frequency at which consumer group offsets are synced (default: 60, every minute)
+    <li><code>offset-syncs.topic.replication.factor</code>: replication factor of MirrorMaker's internal offset-sync topics (default: 3)
+  </ul>
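+
+  <p>
+    Like the settings above, these can be applied as top-level defaults or overridden per replication flow. The following is an illustrative sketch only; the values shown are examples, not recommendations:
+  </p>
+
+<pre class="line-numbers"><code class="language-text"># Top-level default: do not sync ACLs from any source cluster
+sync.topic.acls.enabled = false
+
+# Flow-specific override: check for new topics on this flow less frequently
+us-west->us-east.refresh.topics.interval.seconds = 1800
+</code></pre>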
+
+  <h5 class="anchor-heading"><a id="georeplication-flow-secure" class="anchor-link"></a><a href="#georeplication-flow-secure">Securing Replication Flows</a></h5>
+
+  <p>
+    MirrorMaker supports the same <a href="#connectconfigs">security settings as Kafka Connect</a>, so please refer to the linked section for further information.
+  </p>
+
+  <p>
+    Example: Encrypt communication between MirrorMaker and the <code>us-east</code> cluster.
+  </p>
+
+<pre class="line-numbers"><code class="language-text">us-east.security.protocol=SSL
+us-east.ssl.truststore.location=/path/to/truststore.jks
+us-east.ssl.truststore.password=my-secret-password
+us-east.ssl.keystore.location=/path/to/keystore.jks
+us-east.ssl.keystore.password=my-secret-password
+us-east.ssl.key.password=my-secret-password
+</code></pre>
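+
+  <p>
+    Authentication is configured in the same per-cluster manner. The following is a sketch only, assuming the <code>us-west</code> cluster requires SASL/PLAIN over TLS; adjust the security protocol, SASL mechanism, and credentials to your environment:
+  </p>
+
+<pre class="line-numbers"><code class="language-text">us-west.security.protocol=SASL_SSL
+us-west.sasl.mechanism=PLAIN
+us-west.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="mirrormaker" password="my-secret-password";
+</code></pre>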
+
+  <h5 class="anchor-heading"><a id="georeplication-topic-naming" class="anchor-link"></a><a href="#georeplication-topic-naming">Custom Naming of Replicated Topics in Target Clusters</a></h5>
+
+  <p>
+    Replicated topics in a target cluster—sometimes called <em>remote</em> topics—are renamed according to a replication policy. MirrorMaker uses this policy to ensure that events (aka records, messages) from different clusters are not written to the same topic-partition. By default as per <a href="https://github.com/apache/kafka/blob/trunk/connect/mirror-client/src/main/java/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.java">DefaultReplicationPolicy</a>, the names of replicated topics in the target clusters have the format <code>{source}.{source_topic_name}</code>:

Review comment:
       The use of "events" has become more prevalent, see e.g. the updated introduction page (https://kafka.apache.org/intro) and quickstart (https://kafka.apache.org/quickstart). Hence I prefer to leave the text as-is. The included "(aka records, messages)" parenthesis makes the nomenclature sufficiently clear to the reader, imho.

##########
File path: 27/ops.html
##########
@@ -553,7 +539,558 @@ <h3 class="anchor-heading"><a id="datacenters" class="anchor-link"></a><a href="
   <p>
   It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
 
-  <h3 class="anchor-heading"><a id="config" class="anchor-link"></a><a href="#config">6.3 Kafka Configuration</a></h3>
+  <h3 class="anchor-heading"><a id="georeplication" class="anchor-link"></a><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></h3>
+
+  <h4 class="anchor-heading"><a id="georeplication-overview" class="anchor-link"></a><a href="#georeplication-overview">Geo-Replication Overview</a></h4>
+
+  <p>
+    Kafka administrators can define data flows that cross the boundaries of individual Kafka clusters, data centers, or geo-regions. Such event streaming setups are often needed for organizational, technical, or legal requirements. Common scenarios include:
+  </p>
+
+  <ul>
+    <li>Geo-replication</li>
+    <li>Disaster recovery</li>
+    <li>Feeding edge clusters into a central, aggregate cluster</li>
+    <li>Physical isolation of clusters (such as production vs. testing)</li>
+    <li>Cloud migration or hybrid cloud deployments</li>
+    <li>Legal and compliance requirements</li>
+  </ul>
+
+  <p>
+    Administrators can set up such inter-cluster data flows with Kafka's MirrorMaker (version 2), a tool to replicate data between different Kafka environments in a streaming manner. MirrorMaker is built on top of the Kafka Connect framework and supports features such as:
+  </p>
+
+  <ul>
+    <li>Replicates topics (data plus configurations)</li>
+    <li>Replicates consumer groups including offsets to migrate applications between clusters</li>
+    <li>Replicates ACLs</li>
+    <li>Preserves partitioning</li>
+    <li>Automatically detects new topics and partitions</li>
+    <li>Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters</li>
+    <li>Fault-tolerant and horizontally scalable operations</li>
+  </ul>
+
+  <p>
+  <em>Note: Geo-replication with MirrorMaker replicates data across Kafka clusters. This inter-cluster replication is different from Kafka's <a href="#replication">intra-cluster replication</a>, which replicates data within the same Kafka cluster.</em>
+  </p>
+
+  <h4 class="anchor-heading"><a id="georeplication-flows" class="anchor-link"></a><a href="#georeplication-flows">What Are Replication Flows</a></h4>
+
+  <p>
+    With MirrorMaker, Kafka administrators can replicate topics, topic configurations, consumer groups and their offsets, and ACLs from one or more source Kafka clusters to one or more target Kafka clusters, i.e., across cluster environments. In a nutshell, MirrorMaker consumes data from the source cluster with source connectors, and then replicates the data by producing to the target cluster with sink connectors.

Review comment:
       Thanks, @ryannedolan. Text updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org