Posted to commits@samza.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2014/05/06 00:05:22 UTC

[jira] [Created] (SAMZA-263) Create SystemConsumer and SystemProducer for HDFS

Jakob Homan created SAMZA-263:
---------------------------------

             Summary: Create SystemConsumer and SystemProducer for HDFS
                 Key: SAMZA-263
                 URL: https://issues.apache.org/jira/browse/SAMZA-263
             Project: Samza
          Issue Type: Improvement
            Reporter: Jakob Homan
            Assignee: Jakob Homan


It would be nice to be able to read from and write to HDFS, particularly for bootstrapping purposes. A few points:

* Per the discussion [about leveldb|https://issues.apache.org/jira/browse/SAMZA-236?focusedCommentId=13985982&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13985982], this support should be separated into its own package and project (jar) for easy testing and severability.
* Similar to the Kafka RegexTopicGenerator, we can enumerate (recursively or not) the files in an HDFS directory during job startup.
* Connectivity with HCatalog would be interesting as well, but should be handled in a separate JIRA.
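
To make the directory enumeration point concrete, here is a minimal sketch of the "recursively or not" listing. It is only an illustration under stated assumptions: it uses java.nio.file against a local directory as a stand-in for HDFS, and the class and method names (DirectoryEnumerator, listFiles) are hypothetical, not existing Samza code. A real implementation would presumably go through Hadoop's FileSystem API instead, which offers a comparable listFiles(Path, boolean recursive) call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Samza.
// Assumption: a local directory stands in for an HDFS directory.
class DirectoryEnumerator {

    // Returns the regular files under dir in sorted order.
    // recursive == false lists only direct children; true descends into subdirectories.
    static List<Path> listFiles(Path dir, boolean recursive) throws IOException {
        // Depth 1 visits only the directory's immediate entries.
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> entries = Files.walk(dir, maxDepth)) {
            return entries
                .filter(Files::isRegularFile) // skip directories, including dir itself
                .sorted()                     // stable order across runs
                .collect(Collectors.toList());
        }
    }
}
```

At job startup, the resulting list could be turned into the set of inputs for the job, much as a regex over topic names is expanded into concrete topics. How the files would map onto Samza streams and partitions is left open here.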



--
This message was sent by Atlassian JIRA
(v6.2#6252)