Posted to commits@samza.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2015/07/27 00:26:04 UTC

[jira] [Updated] (SAMZA-693) Add simple HDFS Producer system to Samza

     [ https://issues.apache.org/jira/browse/SAMZA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated SAMZA-693:
------------------------------
    Attachment: SAMZA-693-4.patch

This is a big refactor. Includes:

1) More (and better) unit tests
 
2) Pluggable HDFS formats, compression codecs, and path bucketing

3) Two default implementations of SequenceFile writers: one for binary data of any type, another for String output (text lines, JSON, whatever)

4) Fixes some post-review issues (see Apache RB for this ticket for details) 

...but does not remove the text job configuration properties files. Configuring the HDFS producer using real job props files has helped make the code more testable without loosening up encapsulation, and makes the tests more realistic. Hope that's OK.

It passes 'gradle clean test' and is ready to go. I'll upload an update to the RB issue as well. Sorry it took so long!

> Add simple HDFS Producer system to Samza
> ----------------------------------------
>
>                 Key: SAMZA-693
>                 URL: https://issues.apache.org/jira/browse/SAMZA-693
>             Project: Samza
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Trivial
>         Attachments: SAMZA-693-1.patch, SAMZA-693-2.patch, SAMZA-693-3.patch, SAMZA-693-4.patch
>
>
> Add a simple HDFS producer and related utilities. I've been using a version of this. Initial patch comes with a very basic test and its own subproject setup, which may or may not be what we want?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)