Posted to users@kafka.apache.org by Ananya Sen <an...@gmail.com> on 2020/07/11 06:37:56 UTC

Mirror Maker 2.0 Queries

Hi

I was exploring MirrorMaker 2.0. I read through the KIP-382 documentation
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)
and I have a few questions.

   1. For running MirrorMaker as a dedicated MirrorMaker cluster, the
   documentation specifies a config file and a starter script. Is this
   MirrorMaker process distributed?
   2. I could not find any port configuration for the above MirrorMaker
   process. Can we configure MirrorMaker itself to run as a cluster, i.e.
   run process instances across multiple servers to avoid downtime due
   to a server crash?
   3. If we can run MirrorMaker as a distributed process, does that mean
   that topic and consumer offset replication will be shared among those
   MirrorMaker processes?
   4. What is the default port of this MirrorMaker process and how can we
   override it?

Looking forward to your reply.


Thanks & Regards
Ananya Sen

Re: Mirror Maker 2.0 Queries

Posted by Ryanne Dolan <ry...@gmail.com>.
Ananya, yes, the driver is distributed, but each worker only communicates
via Kafka. The workers do not listen on any ports.

Ryanne

On Sat, Jul 11, 2020, 11:28 AM Ananya Sen <an...@gmail.com> wrote:

> Hi
>
> I was exploring MirrorMaker 2.0. I read through the KIP-382 documentation
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)
> and I have a few questions.
>
>    1. For running MirrorMaker as a dedicated MirrorMaker cluster, the
>    documentation specifies a config file and a starter script. Is this
>    MirrorMaker process distributed?
>    2. I could not find any port configuration for the above MirrorMaker
>    process. Can we configure MirrorMaker itself to run as a cluster, i.e.
>    run process instances across multiple servers to avoid downtime due
>    to a server crash?
>    3. If we can run MirrorMaker as a distributed process, does that mean
>    that topic and consumer offset replication will be shared among those
>    MirrorMaker processes?
>    4. What is the default port of this MirrorMaker process and how can we
>    override it?
>
> Looking forward to your reply.
>
>
> Thanks & Regards
> Ananya Sen
>

Re: Mirror Maker 2.0 Queries

Posted by Ananya Sen <an...@gmail.com>.
Any help here would be greatly appreciated.

On Sat, Aug 8, 2020, 12:13 PM Ananya Sen <an...@gmail.com> wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> MirrorMaker 2.0 is based on the Kafka Connect framework. In Kafka
> Connect we have multiple workers and each worker has some assigned tasks.
> To map this to MirrorMaker 2.0, a MirrorMaker driver will have some
> workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Is every topic partition given a new task?
> 4) Is every consumer group - topic pair given a new task for replicating
> offsets?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster, each
> topic holds a large amount of data, and new data is being written at high
> throughput. Now I want to set up MirrorMaker 2.0 on this cluster to
> replicate all the old data (which is retained in the topics) as well as the
> new incoming data to a backup cluster. How can I scale up the MirrorMaker
> instance so that I can have very little lag?

Re: Mirror Maker 2.0 Queries

Posted by Ananya Sen <an...@gmail.com>.
Thanks a lot Ryanne. That was really very helpful.

On Thu, Aug 20, 2020, 11:49 PM Ryanne Dolan <ry...@gmail.com> wrote:

> > Can we configure tasks.max for each of these connectors separately?
>
> I don't believe that's currently possible. If you need fine-grained control
> over each Connector like that, you might consider running MM2's Connectors
> manually on a bunch of Connect clusters. This requires more effort to set
> up, but enables you to control the configuration of each Connector using
> the Connect REST API.
>
> Ryanne

Re: Mirror Maker 2.0 Queries

Posted by Ryanne Dolan <ry...@gmail.com>.
> Can we configure tasks.max for each of these connectors separately?

I don't believe that's currently possible. If you need fine-grained control
over each Connector like that, you might consider running MM2's Connectors
manually on a bunch of Connect clusters. This requires more effort to set
up, but enables you to control the configuration of each Connector using
the Connect REST API.
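
As a rough illustration of that approach, here is a minimal Python sketch that prepares one request per connector for the Connect REST API (`PUT /connectors/{name}/config`), each with its own 'tasks.max'. The connector class names are the ones in the `org.apache.kafka.connect.mirror` package; the worker URL, connector names, cluster aliases, and broker addresses are made-up placeholders, and the helper `build_connector_request` is hypothetical. The requests are only built, not sent.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Settings shared by all three connectors (aliases and hosts are placeholders).
base_config = {
    "source.cluster.alias": "A",
    "target.cluster.alias": "B",
    "source.cluster.bootstrap.servers": "a-broker:9092",
    "target.cluster.bootstrap.servers": "b-broker:9092",
}

# Connector name -> (connector class, tasks.max), using the 3/5/1 split asked about.
connectors = {
    "mm2-source": ("org.apache.kafka.connect.mirror.MirrorSourceConnector", 3),
    "mm2-checkpoint": ("org.apache.kafka.connect.mirror.MirrorCheckpointConnector", 5),
    "mm2-heartbeat": ("org.apache.kafka.connect.mirror.MirrorHeartbeatConnector", 1),
}

pending = [
    build_connector_request(
        "http://localhost:8083",  # assumed address of a Connect worker
        name,
        {**base_config, "connector.class": cls, "tasks.max": str(tasks_max)},
    )
    for name, (cls, tasks_max) in connectors.items()
]

for req in pending:
    print(req.get_method(), req.full_url)
```

To actually apply the configs you would pass each request to `urllib.request.urlopen`, pointed at a running Connect cluster.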

Ryanne

On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen <an...@gmail.com> wrote:

> Thanks, Ryanne. That answers my questions. I was actually missing this
> "tasks.max" property. Thanks for pointing that out.
>
> Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
> connectors in a Mirror Maker Cluster:
>
>    1. MirrorSourceConnector - focus on replicating topic partitions
>    2. MirrorCheckpointConnector - focus on replicating consumer groups
>    3. MirrorHeartbeatConnector - focus on checking cluster availability
>
> *Can we configure tasks.max for each of these connectors separately? That
> is, can I have 3 tasks for MirrorSourceConnector, 5
> for MirrorCheckpointConnector, and 1 for MirrorHeartbeatConnector?*
>
>
>
> Regards
> Ananya Sen

Re: Mirror Maker 2.0 Queries

Posted by Ananya Sen <an...@gmail.com>.
Thanks, Ryanne. That answers my questions. I was actually missing this
"tasks.max" property. Thanks for pointing that out.

Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
connectors in a Mirror Maker Cluster:

   1. MirrorSourceConnector - focus on replicating topic partitions
   2. MirrorCheckpointConnector - focus on replicating consumer groups
   3. MirrorHeartbeatConnector - focus on checking cluster availability

*Can we configure tasks.max for each of these connectors separately? That
is, can I have 3 tasks for MirrorSourceConnector, 5
for MirrorCheckpointConnector, and 1 for MirrorHeartbeatConnector?*



Regards
Ananya Sen

On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan <ry...@gmail.com> wrote:

> Ananya, see responses below.
>
> > Can this number of workers be configured?
>
> The number of workers is not exactly configurable, but you can control it
> by spinning up drivers and using the '--clusters' flag. A driver instance
> without '--clusters' will run one worker for each A->B replication flow. So
> e.g. if you've got two clusters being replicated bidirectionally, you'll
> have an A->B worker and a B->A worker on each MM2 driver.
>
> You can use the '--clusters' flag to limit which clusters are targeted by a
> given driver, which is useful in many ways, including to limit the number
> of workers for a given driver. So e.g. if you've got 10 clusters all being
> replicated in a full mesh, you can run a driver with '--clusters A' and it
> will have only 9 workers, one for each of the other clusters.
>
> Also note that there is a configuration property 'tasks.max' that controls
> the number of tasks available to workers. Each A->B flow is replicated by a
> Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> default, 'tasks.max' is one, which means there will only be one task for
> each Herd, regardless of how many drivers and workers you spin up. You
> definitely want to change this property. You can tweak this for each A->B
> replication flow independently to strike the right balance. If 'tasks.max'
> is the same as or more than the total number of topic-partitions being
> replicated, each topic-partition will be replicated in a dedicated
> task, which is probably not an efficient use of resources.
>
> > Is every topic partition given a new task?
>
> No, topic-partitions are spread out across tasks. Each topic's partitions
> are divided round-robin among available tasks. However, keep in mind that
> if 'tasks.max' is too high, you could end up with one topic-partition in
> each task.
>
> > Is every consumer group - topic pair given a new task for replicating
> > offsets?
>
> No, consumer-groups are also spread out across tasks. As with
> topic-partitions, 'tasks.max' applies.
>
> > How can I scale up the mirror maker instance so that I can have very
> > little lag?
>
> Tweak 'tasks.max' and spin up more driver instances.
>
> Ryanne

Re: Mirror Maker 2.0 Queries

Posted by Ryanne Dolan <ry...@gmail.com>.
Ananya, see responses below.

> Can this number of workers be configured?

The number of workers is not exactly configurable, but you can control it
by spinning up drivers and using the '--clusters' flag. A driver instance
without '--clusters' will run one worker for each A->B replication flow. So
e.g. if you've got two clusters being replicated bidirectionally, you'll
have an A->B worker and a B->A worker on each MM2 driver.

You can use the '--clusters' flag to limit which clusters are targeted by a
given driver, which is useful in many ways, including to limit the number
of workers for a given driver. So e.g. if you've got 10 clusters all being
replicated in a full mesh, you can run a driver with '--clusters A' and it
will have only 9 workers, one for each of the other clusters.
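
To make the arithmetic concrete, here is a small Python sketch that counts replication flows (and so workers per driver) in a full mesh, with and without a target restriction. It is only a model of the counting described above, not MM2 code; the function name `replication_flows` and the cluster aliases A..J are made up for the example, and the `targets` argument stands in for the '--clusters' flag.

```python
from itertools import permutations


def replication_flows(clusters, targets=None):
    """Return the (source, target) flows a single driver would handle.

    `targets` plays the role of '--clusters': when given, only flows whose
    target cluster is in that list are kept.
    """
    flows = list(permutations(clusters, 2))  # full mesh: every ordered pair
    if targets is not None:
        flows = [(source, target) for source, target in flows if target in targets]
    return flows


clusters = [chr(ord("A") + i) for i in range(10)]  # ten aliases, A..J

print(len(replication_flows(["A", "B"])))       # 2: A->B and B->A
print(len(replication_flows(clusters)))         # 90: every flow in the mesh
print(len(replication_flows(clusters, ["A"])))  # 9: only flows into A
```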

Also note that there is a configuration property 'tasks.max' that controls
the number of tasks available to workers. Each A->B flow is replicated by a
Herd of Workers (in Connect terminology), and Herds work on Tasks. By
default, 'tasks.max' is one, which means there will only be one task for
each Herd, regardless of how many drivers and workers you spin up. You
definitely want to change this property. You can tweak this for each A->B
replication flow independently to strike the right balance. If 'tasks.max'
is the same as or more than the total number of topic-partitions being
replicated, each topic-partition will be replicated in a dedicated
task, which is probably not an efficient use of resources.
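
As a sketch of what such a configuration could look like, the Python helper below renders the text of a properties file for the dedicated driver. The keys `clusters`, `<alias>.bootstrap.servers`, and `A->B.enabled` follow KIP-382; writing the per-flow override as `A->B.tasks.max` is an assumption based on the same `A->B.` prefix convention, so please verify it against your Kafka version. The helper name `render_mm2_properties` and the broker addresses are hypothetical.

```python
def render_mm2_properties(bootstrap, flows, default_tasks_max=1):
    """Render a dedicated-driver config as properties text.

    bootstrap: dict of cluster alias -> bootstrap servers string
    flows: dict of (source, target) -> tasks.max override, or None for the default
    """
    lines = ["clusters = " + ", ".join(bootstrap)]
    for alias, servers in bootstrap.items():
        lines.append(f"{alias}.bootstrap.servers = {servers}")
    # Top-level value applies to every flow without an override.
    lines.append(f"tasks.max = {default_tasks_max}")
    for (source, target), tasks_max in flows.items():
        lines.append(f"{source}->{target}.enabled = true")
        if tasks_max is not None:
            # Per-flow override (assumed key format, see the note above).
            lines.append(f"{source}->{target}.tasks.max = {tasks_max}")
    return "\n".join(lines) + "\n"


text = render_mm2_properties(
    {"A": "a-broker:9092", "B": "b-broker:9092"},  # placeholder addresses
    {("A", "B"): 16, ("B", "A"): None},
    default_tasks_max=4,
)
print(text)
```

The resulting text would be saved to a file such as mm2.properties and passed to connect-mirror-maker.sh, optionally together with '--clusters'.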

> Is every topic partition given a new task?

No, topic-partitions are spread out across tasks. Each topic's partitions
are divided round-robin among available tasks. However, keep in mind that
if 'tasks.max' is too high, you could end up with one topic-partition in
each task.
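
Here is a minimal Python sketch of that kind of round-robin split, to show how 'tasks.max' changes the picture. It is a simplified model rather than the actual MM2 assignment code; the topic names and the function `assign_round_robin` are made up, and capping the task count at the number of topic-partitions is an assumption of the model.

```python
def assign_round_robin(topic_partitions, tasks_max):
    """Deal topic-partitions out to tasks in round-robin order."""
    # Never create more tasks than there are partitions to hand out.
    num_tasks = max(1, min(tasks_max, len(topic_partitions)))
    tasks = [[] for _ in range(num_tasks)]
    for index, topic_partition in enumerate(topic_partitions):
        tasks[index % num_tasks].append(topic_partition)
    return tasks


# Two hypothetical topics with three partitions each.
partitions = [(topic, p) for topic in ("orders", "payments") for p in range(3)]

for tasks_max in (1, 4, 10):
    sizes = [len(task) for task in assign_round_robin(partitions, tasks_max)]
    print(tasks_max, sizes)
```

With 'tasks.max' at 1 a single task carries all six partitions, at 4 the load is split 2/2/1/1, and at 10 each partition ends up alone in its own task.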

> Is every consumer group - topic pair given a new task for replicating
> offsets?

No, consumer-groups are also spread out across tasks. As with
topic-partitions, 'tasks.max' applies.

> How can I scale up the mirror maker instance so that I can have very
> little lag?

Tweak 'tasks.max' and spin up more driver instances.

Ryanne

On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <an...@gmail.com> wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> MirrorMaker 2.0 is based on the Kafka Connect framework. In Kafka
> Connect we have multiple workers and each worker has some assigned tasks.
> To map this to MirrorMaker 2.0, a MirrorMaker driver will have some
> workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Is every topic partition given a new task?
> 4) Is every consumer group - topic pair given a new task for replicating
> offsets?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster, each
> topic holds a large amount of data, and new data is being written at high
> throughput. Now I want to set up MirrorMaker 2.0 on this cluster to
> replicate all the old data (which is retained in the topics) as well as the
> new incoming data to a backup cluster. How can I scale up the MirrorMaker
> instance so that I can have very little lag?

Re: Mirror Maker 2.0 Queries

Posted by Ananya Sen <an...@gmail.com>.
Thank you Ryanne for the quick response. 
I further want to clarify a few points.

MirrorMaker 2.0 is based on the Kafka Connect framework. In Kafka Connect we have multiple workers and each worker has some assigned tasks. To map this to MirrorMaker 2.0, a MirrorMaker driver will have some workers.

1) Can this number of workers be configured?
2) What is the default value of this worker configuration?
3) Is every topic partition given a new task?
4) Is every consumer group - topic pair given a new task for replicating offsets?

Also, consider a case where I have 1000 topics in a Kafka cluster, each topic holds a large amount of data, and new data is being written at high throughput. Now I want to set up MirrorMaker 2.0 on this cluster to replicate all the old data (which is retained in the topics) as well as the new incoming data to a backup cluster. How can I scale up the MirrorMaker instance so that I can have very little lag?
