Posted to users@kafka.apache.org by Alex Melville <am...@g.hmc.edu> on 2015/02/22 01:18:10 UTC

Best Way To Verify MirrorMaker Copy

Howdy Kafka Team,


We are trying to aggregate every topic from several geographically separate
clusters into one central Kafka cluster. We have the guarantee that the
number of partitions for a given topic will be the same on the source and
target clusters. Due to our particular use case, we need to make sure that
the events in any given partition on a source cluster appear in exactly the
same order on the corresponding partition in the target cluster.

So far we've used our custom producer to push messages with a String key
and a byte[] message type to the source cluster. But when we use
MirrorMaker to copy from the source to the target cluster with the same
partitioner that our custom producer uses, we get an error saying "[B
cannot be cast to java.lang.String". We understand this to mean that the MM
consumer is trying to partition the source cluster's data using a String
key, but since the message residing on the source cluster is in byte[]
form, using a String key makes no sense. However, we need the producer that
pushes to the target cluster to use the exact same partitioning scheme our
custom producer used, so that the ordering on the source and target
partitions is exactly the same. How can we ensure this?


Once we have correctly mirrored exactly ordered partitions, what is the
best way to verify that the source and target partitions do store messages
in the exact same order? Right now we are thinking about writing a
SimpleConsumer that iterates through the logs of the source and target
partitions, comparing them to each other as it goes, but it'd be nice if
there were an existing tool for doing this, or if we could have some
guarantee that the MM will retain partition ordering by default.


Cheers,


Alex Melville

Re: Best Way To Verify MirrorMaker Copy

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
If you are using the old producer for MirrorMaker, you can specify a custom
partitioner for the MirrorMaker producer that uses exactly the same logic to
partition messages as your custom producer does. If you are using the new
Java producer, there is currently no way to do this. We are working on
adding a message handler to MirrorMaker; once that is available, you can use
the message handler to specify which partition each message is sent to.
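
As a rough illustration of the custom-partitioner route, the sketch below
shows one way to make the same partitioning logic accept both the String
key seen by the original producer and the byte[] key seen by MirrorMaker.
It is plain Java with no Kafka dependency. The class name, the method
names, the hash-modulo scheme and the UTF-8 key encoding are all
assumptions made for the example; the real scheme and encoding of the
custom producer would need to be substituted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names are illustrative and not part of Kafka.
class ByteKeyPartitionerSketch {

    // Stand-in for the custom producer's String-key logic. Hash modulo
    // partition count is an assumption; replace with the real scheme.
    static int partitionForStringKey(String key, int numPartitions) {
        // Mask the sign bit instead of Math.abs, which stays negative
        // for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Accepts the key as a String (original producer) or as a byte[]
    // (raw bytes read from the source cluster) and routes both the same way.
    static int partition(Object key, int numPartitions) {
        String stringKey;
        if (key instanceof byte[]) {
            // Assumes keys were serialized as UTF-8 on the way in.
            stringKey = new String((byte[]) key, StandardCharsets.UTF_8);
        } else {
            stringKey = (String) key;
        }
        return partitionForStringKey(stringKey, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        int fromString = partition("user-42", numPartitions);
        int fromBytes = partition(
                "user-42".getBytes(StandardCharsets.UTF_8), numPartitions);
        System.out.println("same partition: " + (fromString == fromBytes));
    }
}
```

Decoding the bytes before hashing, rather than hashing the byte[] directly,
is what keeps the result identical to the String-keyed producer; in an
actual partitioner class the body of partition() would carry this logic.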

In terms of verification, I think consuming all the messages and comparing
them is probably still necessary for a strong guarantee. I don't think we
have any tools available for data verification.
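
A minimal sketch of that consume-and-compare check, assuming the messages
of one source partition and the matching target partition have already been
read into two iterators (for example by a SimpleConsumer). The Message class
and firstMismatch method are hypothetical names used only for illustration.
The comparison is by position rather than by offset, since offsets on the
target cluster need not match those on the source.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: names are illustrative and not part of Kafka.
class PartitionOrderCheckSketch {

    // Minimal stand-in for a consumed message: raw key and payload bytes.
    static final class Message {
        final byte[] key;
        final byte[] value;

        Message(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Walks both streams in lockstep. Returns -1 if they match message for
    // message, otherwise the zero-based position of the first difference
    // (which includes one stream ending before the other).
    static long firstMismatch(Iterator<Message> source, Iterator<Message> target) {
        long position = 0;
        while (source.hasNext() && target.hasNext()) {
            Message s = source.next();
            Message t = target.next();
            if (!Arrays.equals(s.key, t.key) || !Arrays.equals(s.value, t.value)) {
                return position;
            }
            position++;
        }
        return (source.hasNext() || target.hasNext()) ? position : -1;
    }
}
```

Running this once per partition, after mirroring has caught up, reports
either a clean match or the first position at which the ordering diverges.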

-Jiangjie (Becket) Qin
