Posted to user@flume.apache.org by wenxing zheng <we...@gmail.com> on 2017/11/23 03:07:17 UTC

Can HDFS sink support exactly once delivery to avoid the duplication of data

Dear experts,

When using the HDFS sink with the KafkaChannel, we found that data
might be duplicated due to write timeouts.

Can the HDFS sink support writing the Flume events from Kafka
exactly once?

Any advice is appreciated.
Regards, Wenxing

Re: Can HDFS sink support exactly once delivery to avoid the duplication of data

Posted by Ferenc Szabo <fs...@cloudera.com>.
Dear Wenxing,

the current implementation of the HDFS sink provides at-least-once
delivery. Exactly-once delivery is a harder problem to solve, so I would
not expect a solution for that in the near future.
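
Since the sink is at-least-once, a common mitigation (not discussed in
this thread, offered here as a sketch) is to tag each event with a
unique ID at the source using Flume's bundled UUIDInterceptor, so a
downstream job can deduplicate on that header. The agent, source, and
header names below are assumptions for illustration:

```properties
# Assumption: agent "a1" with source "r1" already configured to feed the KafkaChannel.
# Attach a UUID header to every event so duplicates share the same ID downstream.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = eventId
a1.sources.r1.interceptors.i1.preserveExisting = true
```

For this to work end to end, the HDFS sink must persist the header,
e.g. by writing events with a serializer that keeps headers (such as
avro_event with hdfs.fileType = DataStream); a batch job can then keep
one row per eventId.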

Regards, Ferenc Szabo

On Thu, Nov 23, 2017 at 4:07 AM, wenxing zheng <we...@gmail.com>
wrote:

> Dear experts,
>
> When using the HDFS sink with the KafkaChannel, we found that data
> might be duplicated due to write timeouts.
>
> Can the HDFS sink support writing the Flume events from Kafka
> exactly once?
>
> Any advice is appreciated.
> Regards, Wenxing
>