Posted to users@kafka.apache.org by Jiri Humpolicek <Ji...@seznam.cz> on 2018/01/11 08:46:12 UTC
Does MirrorMaker ensure exactly-once delivery across clusters?
Hi Everyone,
Since Kafka 0.11.x supports exactly-once semantics, I want to be sure
that it is possible to achieve exactly-once delivery across Kafka
clusters using MirrorMaker.
We have two locations, each with a "primary" cluster, and for each
location we have one "aggregation" cluster which mirrors data from
all the primary clusters.
Currently we deduplicate messages with a separate YARN application
after copying the data from the aggregation Kafka to HDFS, but the
duplicates remain in the aggregation Kafka itself. So I want to ensure
that there are no duplicates and no data loss within Kafka as well; in
that case our deduplication YARN application would no longer be needed.
If it is possible, how should MirrorMaker be configured to achieve
exactly-once delivery across the primary and aggregation clusters?
Thanks and have a nice day, Jiri Humpolicek
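[Editor's note: not part of the thread. MirrorMaker passes a standard producer properties file, so one partial mitigation is to make the mirroring producer idempotent. This only deduplicates broker-side retries within the target cluster; it is not end-to-end exactly-once. A sketch of the relevant producer properties:]

```properties
# Idempotent producer: deduplicates duplicates caused by producer
# retries on the target cluster. Does NOT deduplicate re-reads of
# the source after a MirrorMaker restart.
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection=5
```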
Re: Does MirrorMaker ensure exactly-once delivery across clusters?
Posted by "Matthias J. Sax" <ma...@confluent.io>.
From a transaction point of view, yes.
However, the MirrorMaker consumer must know to read its offsets from the
target cluster instead of the source cluster, and this is quite
unnatural for a consumer... So it's a little bit trickier than just
piggybacking commits on the producer...
-Matthias
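[Editor's note: the offsets-in-the-target-cluster idea can be illustrated with a toy, in-memory model. No real Kafka APIs are used; every name below is made up for illustration. The key point: the mirrored record and the source offset are committed to the target in one atomic step, so a restart resumes exactly after the last committed record.]

```python
# Toy model: exactly-once mirroring by storing the source offset
# atomically with the mirrored record in the *target* log.
# All classes and names are illustrative, not real Kafka APIs.

class TargetCluster:
    def __init__(self):
        self.log = []      # mirrored records
        self.offsets = {}  # "special topic": source partition -> next offset to read

    def commit_atomically(self, partition, offset, record):
        # One transaction: the record and the offset land together or not at all.
        self.log.append(record)
        self.offsets[partition] = offset + 1

def mirror(source, target, crash_after=None):
    """Mirror records, resuming from the offsets stored in the target."""
    start = target.offsets.get("p0", 0)
    for i, record in enumerate(source[start:]):
        if crash_after is not None and i >= crash_after:
            return  # simulate a crash; nothing partial was committed
        target.commit_atomically("p0", start + i, record)

source = ["a", "b", "c", "d"]
target = TargetCluster()
mirror(source, target, crash_after=2)  # crash after mirroring "a" and "b"
mirror(source, target)                 # restart: resumes at offset 2
print(target.log)                      # -> ['a', 'b', 'c', 'd'] (no duplicates)
```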
On 1/11/18 6:45 PM, Stephane Maarek wrote:
> One could refactor MirrorMaker to commit the source cluster's offsets in the target cluster instead (in a special topic).
> This would technically allow achieving exactly-once using the Transactional API.
>
> But there's work associated with that.
> Let me know if I'm missing something.
Re: Does MirrorMaker ensure exactly-once delivery across clusters?
Posted by Stephane Maarek <st...@simplemachines.com.au>.
One could refactor MirrorMaker to commit the source cluster's offsets in the target cluster instead (in a special topic).
This would technically allow achieving exactly-once using the Transactional API.
But there's work associated with that.
Let me know if I'm missing something.
On 12/1/18, 6:15 am, "Matthias J. Sax" <ma...@confluent.io> wrote:
No.
Transactions are designed to work within a single cluster, not across
clusters, i.e., for a read-process-write pattern similar to what
Kafka Streams does.
-Matthias
Re: Does MirrorMaker ensure exactly-once delivery across clusters?
Posted by "Matthias J. Sax" <ma...@confluent.io>.
No.
Transactions are designed to work within a single cluster, not across
clusters, i.e., for a read-process-write pattern similar to what
Kafka Streams does.
-Matthias
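[Editor's note: the cross-cluster gap can be seen in a toy, in-memory model. No real Kafka APIs are used; all names are made up for illustration. A naive mirror produces to the target cluster and then commits its offset back to the source cluster as two separate steps, so a crash between the two steps replays records on restart.]

```python
# Toy model of the failure mode: produce-to-target and commit-to-source
# are two separate, non-atomic steps against two different clusters.

def naive_mirror(source_records, target_log, source_committed,
                 crash_before_commit=False):
    """Mirror from the last offset committed in the *source* cluster."""
    start = source_committed["p0"]
    for i, record in enumerate(source_records[start:]):
        target_log.append(record)                # step 1: write to target cluster
        if crash_before_commit and i == 0:
            return                               # crash between the two steps
        source_committed["p0"] = start + i + 1   # step 2: commit to source cluster

target_log = []
committed = {"p0": 0}
naive_mirror(["a", "b"], target_log, committed, crash_before_commit=True)
# "a" reached the target, but its offset was never committed to the source,
# so the restarted mirror re-reads it:
naive_mirror(["a", "b"], target_log, committed)
print(target_log)  # -> ['a', 'a', 'b']: a duplicate that single-cluster
                   #    transactions cannot prevent
```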