Posted to users@kafka.apache.org by Jiri Humpolicek <Ji...@seznam.cz> on 2018/01/11 08:46:12 UTC

Does MirrorMaker ensure exactly-once delivery across clusters?

Hi Everyone, 

Since Kafka 0.11.x supports exactly-once semantics, I want to confirm whether
it is possible to achieve exactly-once delivery across Kafka clusters using
MirrorMaker.

We have two locations with a "primary" cluster in each location, and for
each location we have one "aggregation" cluster which mirrors data from
all primary clusters.

Currently we deduplicate messages after copying data from the aggregation
Kafka cluster to HDFS, using a separate YARN application. But the duplicates
remain in the aggregation Kafka cluster. So I want to ensure that there are
no duplicates and no data loss in Kafka as well. In that case our
deduplication YARN application would no longer be needed.

If it is possible, how should MirrorMaker be configured to achieve
exactly-once delivery between the primary and aggregation clusters?


Thanks and have a nice day, Jiri Humpolicek 

Re: Does MirrorMaker ensure exactly-once delivery across clusters?

Posted by "Matthias J. Sax" <ma...@confluent.io>.
From a transaction point of view, yes.

However, the MirrorMaker consumer must know to read its offsets from the
target cluster instead of the source cluster, and this is quite
unnatural for a consumer. So it's a little bit trickier than just
piggybacking commits on the producer.


-Matthias

On 1/11/18 6:45 PM, Stephane Maarek wrote:
> One could refactor MirrorMaker to commit the source cluster's offsets in the target cluster instead (in a special topic).
> This would technically allow achieving exactly-once using the Transactional API.
>
> But there's work associated with that.
> Let me know if I’m missing something.


Re: Does MirrorMaker ensure exactly-once delivery across clusters?

Posted by Stephane Maarek <st...@simplemachines.com.au>.
One could refactor MirrorMaker to commit the source cluster's offsets in the target cluster instead (in a special topic).
This would technically allow achieving exactly-once using the Transactional API.

But there's work associated with that.
Let me know if I’m missing something.
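
To make the idea above concrete, here is a minimal sketch in Python. It is a
toy in-memory model, not MirrorMaker and not the Kafka client API: the names
TargetCluster, mirrored_offset, commit and mirror are all hypothetical. The one
assumption it encodes is that the mirrored records and the source offset are
stored on the target side and change together, as a transaction would
guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class TargetCluster:
    """Toy stand-in for the aggregation cluster (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)
    mirrored_offset: int = 0  # next source offset to read, stored on the target

    def commit(self, batch, next_offset):
        # Models one transaction: records and source offset change together.
        self.records.extend(batch)
        self.mirrored_offset = next_offset


def mirror(source, target, batch_size=2, crash_before_commit_at=None):
    """Copy source records to target, resuming from the offset kept on target."""
    offset = target.mirrored_offset  # progress is read from the target, not the source
    while offset < len(source):
        batch = source[offset:offset + batch_size]
        if crash_before_commit_at == offset:
            # The batch was never committed, so nothing becomes visible.
            raise RuntimeError("simulated crash before commit")
        target.commit(batch, offset + len(batch))
        offset += len(batch)


source = ["a", "b", "c", "d", "e"]
target = TargetCluster()
try:
    mirror(source, target, crash_before_commit_at=2)
except RuntimeError:
    pass  # first batch committed, second batch was never written
mirror(source, target)  # restart resumes at the offset stored on the target
print(target.records)
```

In this model a restart after the simulated crash picks up at the offset
recorded on the target, so each source record ends up in the target once.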

On 12/1/18, 6:15 am, "Matthias J. Sax" <ma...@confluent.io> wrote:

    No.
    
    Transactions are designed to work within a single cluster, not across
    clusters, i.e., if you have a read-process-write pattern similar to what
    Kafka Streams does.
    
    -Matthias
    



Re: Does MirrorMaker ensure exactly-once delivery across clusters?

Posted by "Matthias J. Sax" <ma...@confluent.io>.
No.

Transactions are designed to work within a single cluster, not across
clusters, i.e., if you have a read-process-write pattern similar to what
Kafka Streams does.

-Matthias
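
A small sketch of why the two-cluster case is harder, written in Python as a
toy in-memory model (not MirrorMaker code and not the Kafka client API; the
function mirror_at_least_once and the committed dictionary are hypothetical).
It assumes the write to the target and the offset commit on the source side
are two separate steps that cannot be made atomic together.

```python
def mirror_at_least_once(source, target, committed, crash_after_write_at=None):
    """Toy mirror loop: write to target, then commit the offset on the source side."""
    offset = committed["offset"]  # progress is stored with the source cluster
    while offset < len(source):
        target.append(source[offset])  # step 1: write to the target cluster
        if crash_after_write_at == offset:
            raise RuntimeError("simulated crash between write and offset commit")
        offset += 1
        committed["offset"] = offset   # step 2: commit the offset on the source side


source = ["a", "b", "c"]
target = []
committed = {"offset": 0}
try:
    mirror_at_least_once(source, target, committed, crash_after_write_at=1)
except RuntimeError:
    pass  # "b" reached the target, but its offset was never committed
mirror_at_least_once(source, target, committed)  # restart re-reads offset 1
print(target)  # ['a', 'b', 'b', 'c']
```

The record written just before the simulated crash is copied again after the
restart, which is the kind of duplicate described in the original question.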
