Posted to dev@avro.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2014/01/15 12:56:20 UTC
[jira] [Updated] (AVRO-1439) MultipleInputs equivalent for Avro MR
[ https://issues.apache.org/jira/browse/AVRO-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J updated AVRO-1439:
--------------------------
Attachment: AVRO-1439.patch
Attached is a working patch for the {{mapred}} (old) API, with a reflect-based test case that illustrates a sample join operation.
I have not yet looked into the {{mapreduce}} (new) API, but the implementation there should be nearly identical.
Any comments on the approach before I begin work on the {{mapreduce}} equivalent?
Here are some implementation points:
- Works only for Specific- and Reflect-based MR jobs that use {{mapred.AvroInputFormat}} and the {{mapred.AvroMapper}}/{{mapred.AvroReducer}} classes.
-- Only the schema and the mapper class can be configured per path.
-- The input format class cannot be varied per path, unlike in the Apache Hadoop equivalent.
- Passing a schema when adding an input path is mandatory.
- Passing a mapper class when adding an input path is also mandatory.
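To make the points above concrete, here is a minimal, self-contained sketch of the per-path bookkeeping they imply. It is not the code from the attached patch: {{PerPathInputs}}, {{Entry}}, {{UserMapper}} and {{OrderMapper}} are hypothetical names, the schema is held as plain JSON text rather than an Avro {{Schema}} object, and no Hadoop or Avro classes are used. It only models the rules that each path carries exactly one schema and one mapper class, and that both are required.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical placeholder mappers; a real job would extend mapred.AvroMapper. */
class UserMapper {}
class OrderMapper {}

/** Hypothetical per-path registry; an illustration only, not the patch's API. */
class PerPathInputs {

    /** The only two things configurable per path: a schema and a mapper class. */
    static final class Entry {
        final String schemaJson;      // assumption: schema kept as JSON text
        final Class<?> mapperClass;

        Entry(String schemaJson, Class<?> mapperClass) {
            this.schemaJson = schemaJson;
            this.mapperClass = mapperClass;
        }
    }

    // Insertion-ordered so the joined path list is deterministic.
    private final Map<String, Entry> byPath = new LinkedHashMap<>();

    /** Registers an input path; the mapper class and the schema are both mandatory. */
    void addInputPath(String path, Class<?> mapperClass, String schemaJson) {
        Objects.requireNonNull(path, "path is mandatory");
        Objects.requireNonNull(mapperClass, "mapper class is mandatory");
        Objects.requireNonNull(schemaJson, "schema is mandatory");
        byPath.put(path, new Entry(schemaJson, mapperClass));
    }

    /** Returns the schema/mapper pair for a path, or null if it was never added. */
    Entry lookup(String path) {
        return byPath.get(path);
    }

    /** Comma-separated list of all registered paths, in registration order. */
    String inputPaths() {
        return String.join(",", byPath.keySet());
    }
}
```

For a join, each side of the join would be registered under its own path with its own schema and mapper, for example {{inputs.addInputPath("/data/users", UserMapper.class, userSchemaJson)}} followed by a second call for the other dataset. There is deliberately no input format parameter, matching the limitation noted above.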
> MultipleInputs equivalent for Avro MR
> -------------------------------------
>
> Key: AVRO-1439
> URL: https://issues.apache.org/jira/browse/AVRO-1439
> Project: Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.8.0
> Reporter: Harsh J
> Assignee: Harsh J
> Priority: Minor
> Attachments: AVRO-1439.patch
>
>
> We have MultipleOutputs-like functionality for Avro today, but lack a MultipleInputs equivalent, which would make pure-MR joins possible with Specific/Reflect Avro MR.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)