Posted to dev@flume.apache.org by "Chris Horrocks (JIRA)" <ji...@apache.org> on 2016/07/23 16:14:20 UTC

[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume

    [ https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390753#comment-15390753 ] 

Chris Horrocks commented on FLUME-2173:
---------------------------------------

Is there not precedent for this in the way Kafka uses ZK to track offsets for the consumption of each topic? Could each Flume agent not register its sinks as ZK endpoints and update ZK with the offsets of messages (prepended to the header of Flume events) as they are pulled from the channel?
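
A minimal sketch of that offset idea, under stated assumptions: the OffsetStore class is an in-memory stand-in for ZooKeeper (no real ZK client is used), the "offset" header name and the sink path are hypothetical, and offsets are assumed to increase monotonically per sink. It only illustrates how a sink could skip replayed events by comparing their offset with the last committed one.

```python
from dataclasses import dataclass, field


@dataclass
class OffsetStore:
    """In-memory stand-in for ZooKeeper: one committed offset per sink path."""
    offsets: dict = field(default_factory=dict)

    def register(self, sink_path: str) -> None:
        # Comparable to creating a znode for the sink; -1 means nothing delivered yet.
        self.offsets.setdefault(sink_path, -1)

    def committed(self, sink_path: str) -> int:
        return self.offsets[sink_path]

    def commit(self, sink_path: str, offset: int) -> None:
        # Only move forward, so a replayed batch cannot rewind the position.
        if offset > self.offsets[sink_path]:
            self.offsets[sink_path] = offset


def deliver(store, sink_path, events):
    """Return the bodies of events not yet delivered, committing each offset."""
    delivered = []
    for event in events:
        offset = int(event["headers"]["offset"])  # hypothetical header name
        if offset <= store.committed(sink_path):
            continue  # at or below the committed offset: treat as a duplicate
        delivered.append(event["body"])
        store.commit(sink_path, offset)
    return delivered


SINK = "/flume/agent1/sinks/hdfs-sink"  # hypothetical path
store = OffsetStore()
store.register(SINK)
batch = [{"headers": {"offset": str(i)}, "body": f"event-{i}"} for i in range(3)]
first = deliver(store, SINK, batch)   # all three events are new
replay = deliver(store, SINK, batch)  # the same batch again is skipped
```

In a real deployment the write to the destination and the offset commit are separate steps, so a crash between them could still produce a duplicate or a gap; the sketch does not address that.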

> Exactly once semantics for Flume
> --------------------------------
>
>                 Key: FLUME-2173
>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>             Fix For: v2.0.0
>
>
> Currently Flume guarantees only at-least-once semantics. This JIRA is meant to track exactly-once semantics for Flume. My initial idea is to attach UUID event IDs to events at the original source (using a config option to mark a source as an original source) and to identify destination sinks. The destination sinks would use a unique ZK znode to track the events; if an event has already been seen (and deduplication is configured), the duplicate is dropped.
> This might need some refactoring, but my belief is that we can do this in a backward-compatible way.
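
The UUID approach in the description above can be sketched roughly as follows. This is an illustration under assumptions, not Flume code: the "eventId" header name, the tag_at_original_source helper, and the DedupSink class are all hypothetical, and a Python set stands in for the ZK znode that would track seen events.

```python
import uuid


def tag_at_original_source(body, headers=None):
    """Attach a UUID to an event unless an upstream source already did."""
    headers = dict(headers or {})
    # setdefault keeps an existing ID, so relaying agents do not re-tag events.
    headers.setdefault("eventId", str(uuid.uuid4()))  # hypothetical header name
    return {"headers": headers, "body": body}


class DedupSink:
    """Destination sink that drops events whose ID it has already seen."""

    def __init__(self, dedup_enabled=True):
        self.dedup_enabled = dedup_enabled
        self.seen = set()   # stand-in for the ZK znode tracking event IDs
        self.written = []   # stand-in for the real destination

    def process(self, event):
        event_id = event["headers"].get("eventId")
        if self.dedup_enabled and event_id is not None:
            if event_id in self.seen:
                return False  # duplicate: pull it out
            self.seen.add(event_id)
        self.written.append(event["body"])
        return True


event = tag_at_original_source("log line")
sink = DedupSink()
results = [sink.process(event), sink.process(event)]  # second call is a redelivery
retagged = tag_at_original_source("log line", event["headers"])
```

Events without an ID, or sinks with dedup_enabled set to False, fall back to the current at-least-once behaviour, which is one way the change could stay backward compatible.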



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)