Posted to dev@avro.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2014/01/15 12:56:20 UTC

[jira] [Updated] (AVRO-1439) MultipleInputs equivalent for Avro MR

     [ https://issues.apache.org/jira/browse/AVRO-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated AVRO-1439:
--------------------------

    Attachment: AVRO-1439.patch

Here is a functional patch for the {{mapred}} (old) APIs, with a reflect-based test case that illustrates a sample join operation.

I've not yet delved into the {{mapreduce}} (new) APIs, but I expect the implementation there to be nearly the same.

Any comments on the approach before I begin work on the {{mapreduce}} equivalent?

Here are some implementation points:
- Only works for Specific- and Reflect-based MR jobs that use the {{mapred.AvroInputFormat}} and {{mapred.AvroMapper}}/{{mapred.AvroReducer}} classes.
-- Only the schema and the mapper class can be configured per path.
-- The input format class cannot be chosen per path, unlike in the Apache Hadoop equivalent.
- Passing a schema when adding an input path is mandatory.
- Passing a mapper class when adding an input path is also mandatory.
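
To make the points above concrete, here is a rough sketch of how a per-path schema and mapper class could be recorded in the job configuration. It is not the code from AVRO-1439.patch: the class name {{AvroMultipleInputsSketch}}, the {{sketch.*}} property names, and the use of a plain {{Map}} in place of a Hadoop {{JobConf}} are all assumptions made so that the example runs with only the JDK. Paths and values are Base64-encoded so that commas and semicolons in schema JSON cannot clash with the separators.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the API introduced by AVRO-1439.patch.
final class AvroMultipleInputsSketch {
    // Assumed property names, invented for this sketch.
    static final String SCHEMAS_KEY = "sketch.avro.multipleinputs.schemas";
    static final String MAPPERS_KEY = "sketch.avro.multipleinputs.mappers";

    private AvroMultipleInputsSketch() {}

    // Registers an input path. Both the schema and the mapper class are mandatory.
    static void addInputPath(Map<String, String> conf, String path,
                             String schemaJson, String mapperClassName) {
        if (schemaJson == null || schemaJson.isEmpty()) {
            throw new IllegalArgumentException("A schema is mandatory for " + path);
        }
        if (mapperClassName == null || mapperClassName.isEmpty()) {
            throw new IllegalArgumentException("A mapper class is mandatory for " + path);
        }
        append(conf, SCHEMAS_KEY, path, schemaJson);
        append(conf, MAPPERS_KEY, path, mapperClassName);
    }

    // Returns the registered path -> schema JSON mapping, in insertion order.
    static Map<String, String> schemasByPath(Map<String, String> conf) {
        return decode(conf.get(SCHEMAS_KEY));
    }

    // Returns the registered path -> mapper class name mapping, in insertion order.
    static Map<String, String> mappersByPath(Map<String, String> conf) {
        return decode(conf.get(MAPPERS_KEY));
    }

    // Appends one "base64(path);base64(value)" entry to a comma-separated property.
    private static void append(Map<String, String> conf, String key,
                               String path, String value) {
        String entry = b64(path) + ";" + b64(value);
        conf.merge(key, entry, (existing, added) -> existing + "," + added);
    }

    private static Map<String, String> decode(String encoded) {
        Map<String, String> result = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return result;
        }
        for (String entry : encoded.split(",")) {
            String[] parts = entry.split(";", 2);
            result.put(unb64(parts[0]), unb64(parts[1]));
        }
        return result;
    }

    private static String b64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String unb64(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Paths, schemas, and mapper names below are made up for the demo.
        addInputPath(conf, "/data/users", "{\"type\":\"string\"}", "example.UserMapper");
        addInputPath(conf, "/data/orders", "{\"type\":\"long\"}", "example.OrderMapper");
        System.out.println(mappersByPath(conf));
    }
}
```

In a real job the map task would presumably look up the schema and mapper for the path of the split it is processing; that step is left out here because it depends on Hadoop classes.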

> MultipleInputs equivalent for Avro MR
> -------------------------------------
>
>                 Key: AVRO-1439
>                 URL: https://issues.apache.org/jira/browse/AVRO-1439
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: AVRO-1439.patch
>
>
> We have MultipleOutputs-like functionality for Avro today, but we lack a MultipleInputs equivalent, which would make pure-MR joins possible with Specific/Reflect Avro MR.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)