Posted to dev@crunch.apache.org by "Micah Whitacre (JIRA)" <ji...@apache.org> on 2016/07/13 15:21:20 UTC

[jira] [Updated] (CRUNCH-611) Simplified Kafka Offset Management in HDFS

     [ https://issues.apache.org/jira/browse/CRUNCH-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre updated CRUNCH-611:
----------------------------------
    Attachment: CRUNCH-611.patch

This patch provides a basic API for reading and writing Kafka offsets, along with a simple implementation that reads and writes the values from HDFS.  In theory this should make regularly scheduled Crunch pipelines easier to implement with regard to offset management.
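
To illustrate the general shape of such an API (this is not the code in the attached patch), here is a minimal sketch.  All names in it (OffsetStore, DirectoryOffsetStore, the "<timestamp>.offsets" file layout, and the "topic-partition" string keys) are hypothetical, and it uses java.nio.file against a local directory as a stand-in for HDFS so that it runs without Hadoop or Kafka on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OffsetCheckpointSketch {

    /** Hypothetical read/write API; keys are "topic-partition" strings, values are offsets. */
    interface OffsetStore {
        void write(long asOfMillis, Map<String, Long> offsets) throws IOException;
        Map<String, Long> readLatest() throws IOException;
    }

    /** Stores one checkpoint file per run; a local directory stands in for an HDFS path. */
    static class DirectoryOffsetStore implements OffsetStore {
        private static final String SUFFIX = ".offsets";
        private final Path dir;

        DirectoryOffsetStore(Path dir) {
            this.dir = dir;
        }

        @Override
        public void write(long asOfMillis, Map<String, Long> offsets) throws IOException {
            Files.createDirectories(dir);
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Long> e : new TreeMap<>(offsets).entrySet()) {
                lines.add(e.getKey() + "\t" + e.getValue());
            }
            // Write to a temporary file, then rename, so readers never see a partial checkpoint.
            Path tmp = dir.resolve("." + asOfMillis + ".tmp");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            Files.move(tmp, dir.resolve(asOfMillis + SUFFIX), StandardCopyOption.REPLACE_EXISTING);
        }

        @Override
        public Map<String, Long> readLatest() throws IOException {
            if (!Files.isDirectory(dir)) {
                return Collections.emptyMap();
            }
            Path latest = null;
            long latestTs = Long.MIN_VALUE;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
                for (Path p : files) {
                    String name = p.getFileName().toString();
                    long ts = Long.parseLong(name.substring(0, name.length() - SUFFIX.length()));
                    if (ts > latestTs) {
                        latestTs = ts;
                        latest = p;
                    }
                }
            }
            if (latest == null) {
                return Collections.emptyMap();
            }
            Map<String, Long> offsets = new HashMap<>();
            for (String line : Files.readAllLines(latest, StandardCharsets.UTF_8)) {
                int tab = line.lastIndexOf('\t');
                offsets.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
            }
            return offsets;
        }
    }

    /** Writes two checkpoints to a temp directory and returns what the next run would read. */
    static Map<String, Long> roundTrip() {
        try {
            OffsetStore store = new DirectoryOffsetStore(Files.createTempDirectory("offsets"));
            Map<String, Long> first = new HashMap<>();
            first.put("events-0", 100L);
            first.put("events-1", 90L);
            store.write(1000L, first);
            Map<String, Long> second = new HashMap<>();
            second.put("events-0", 150L);
            second.put("events-1", 120L);
            store.write(2000L, second);
            return store.readLatest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new TreeMap<>(roundTrip()));
    }
}
```

A scheduled pipeline would call something like readLatest() at startup to decide where the source should begin reading, and write(...) only after the run succeeds, so that a failed run is retried from the previous checkpoint.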

I did add a few optional dependencies, so hopefully these won't conflict too badly with the Hadoop stack.  We aren't seeing a problem on our cluster, but I didn't check universally.  We also set our classpath first and run through Oozie, which changes classpath ordering as well.

> Simplified Kafka Offset Management in HDFS
> ------------------------------------------
>
>                 Key: CRUNCH-611
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-611
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-611.patch
>
>
> With the KafkaSource, offset management is the responsibility of the consumer.  With some simple APIs it is actually trivial to support reading and storing these offsets in an HDFS directory as checkpoints for the source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)