Posted to dev@datafu.apache.org by "Mathieu Bastian (JIRA)" <ji...@apache.org> on 2014/06/08 03:24:01 UTC

[jira] [Assigned] (DATAFU-51) Add DataFu MR project, a lightweight framework for implementing Java/Scala MapReduce jobs

     [ https://issues.apache.org/jira/browse/DATAFU-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Bastian reassigned DATAFU-51:
-------------------------------------

    Assignee: Mathieu Bastian

> Add DataFu MR project, a lightweight framework for implementing Java/Scala MapReduce jobs
> -----------------------------------------------------------------------------------------
>
>                 Key: DATAFU-51
>                 URL: https://issues.apache.org/jira/browse/DATAFU-51
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Mathieu Bastian
>            Assignee: Mathieu Bastian
>         Attachments: DATAFU-51.patch
>
>
> New lightweight framework for developing Java/Scala MapReduce jobs. Inspired by Matt's work on Hourglass and my experience developing Java jobs on Hadoop. It's a thin layer on top of the Hadoop API that mostly reduces boilerplate code and automates configuration.
> Features (see the README for details):
> * Built-in support for Avro input and output formats
> * Though we recommend using Avro, one can use any input/output format class
> * Mapper, reducer and intermediate key/value classes are inferred when possible
> * Avro schemas are inferred when using POJOs
> * Staged output to avoid deleting the existing file if the job fails
> * Estimates the number of reducers needed when none is provided
> * Supports `#LATEST` suffix in input paths to work with timestamped folders 
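
To make the last two features concrete, here is a minimal Java sketch of how a `#LATEST` suffix and a reducer estimate could be resolved. It is not taken from DATAFU-51.patch: the class and method names (`LatestSuffixSketch`, `resolveLatest`, `estimateReducers`) are hypothetical, and it assumes that timestamped folder names sort lexicographically (for example `yyyyMMdd`) and that the reducer count is derived from input size divided by a per-reducer byte target. The folder listing is passed in as a plain list so the sketch runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not the DataFu MR implementation.
public class LatestSuffixSketch {
    static final String LATEST = "#LATEST";

    // Replaces a trailing #LATEST with the greatest child folder name.
    // Assumes folder names sort lexicographically by time (e.g. yyyyMMdd).
    static String resolveLatest(String inputPath, List<String> childFolders) {
        if (!inputPath.endsWith(LATEST)) {
            return inputPath; // no suffix, nothing to resolve
        }
        String parent = inputPath.substring(0, inputPath.length() - LATEST.length());
        Optional<String> latest = childFolders.stream().max(Comparator.naturalOrder());
        if (!latest.isPresent()) {
            throw new IllegalArgumentException("No folders found under " + parent);
        }
        String separator = parent.endsWith("/") ? "" : "/";
        return parent + separator + latest.get();
    }

    // Estimates reducers as ceil(inputBytes / bytesPerReducer), with a minimum of 1.
    // The per-reducer byte target is an assumed tuning parameter.
    static int estimateReducers(long inputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        long count = inputBytes / bytesPerReducer
                + (inputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(count, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("20140531", "20140607", "20140601");
        System.out.println(resolveLatest("/data/events/#LATEST", folders));
        System.out.println(estimateReducers(10L * 1024 * 1024 * 1024, 1L << 30));
    }
}
```

In a real job the folder list would come from a file system listing of the parent directory, and the estimate would be applied only when the caller has not set a reducer count explicitly.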



--
This message was sent by Atlassian JIRA
(v6.2#6252)