Posted to dev@avro.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2010/07/20 18:43:50 UTC

[jira] Commented: (AVRO-581) java: add reducer that separates keys and values when map output is pairs

    [ https://issues.apache.org/jira/browse/AVRO-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890307#action_12890307 ] 

Tom White commented on AVRO-581:
--------------------------------

A few more comments:

* It would be nice to have a unit test for map-only jobs, since they exercise a different code path and their output doesn't need to be a pair. For a map-only job I get an NPE unless I call AvroJob.setOutputSchema(); calling AvroJob.setMapOutputSchema() alone was not sufficient.
* A test using Specific types would be good, to test that it works properly with pairs.
* Should configureAvroJob() only set the input format if it hasn't already been set? E.g. if I set AvroUtf8InputFormat before calling setInputSchema(), it is silently changed to AvroInputFormat.
* Generating a Pair schema could be made more elegant for users by exposing Pair#getPairSchema() as a static method.
* Not related to this patch, but it would be nice if generated specific classes had a copy constructor and/or a copyInto() method. Saving the state of an object is a common pattern in reducers, because the values iterator reuses its objects.
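
To illustrate the last point, here is a minimal, self-contained sketch of why a copy is needed when an iterator reuses its objects. It does not use Avro or Hadoop: ReusedValueDemo, Point and reusingValues() are hypothetical stand-ins for a generated specific class and a reducer's values iterator, and the copy constructor is only the kind of method being asked for, not something the generated classes currently provide.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReusedValueDemo {

    /** Hypothetical stand-in for a generated specific class (not an Avro type). */
    public static final class Point {
        public int x;

        public Point() {}

        /** Copy constructor: the kind of method suggested above. */
        public Point(Point other) {
            this.x = other.x;
        }
    }

    /** Hands out the SAME Point instance on every call to next(), as a values iterator may. */
    public static Iterable<Point> reusingValues(final int... xs) {
        final Point shared = new Point();
        return () -> new Iterator<Point>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < xs.length;
            }

            @Override
            public Point next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.x = xs[i++]; // overwrite the shared instance in place
                return shared;
            }
        };
    }

    /** Saves references only: every saved element ends up aliasing the last value. */
    public static List<Integer> saveWithoutCopy(Iterable<Point> values) {
        List<Point> saved = new ArrayList<>();
        for (Point p : values) {
            saved.add(p);
        }
        return xsOf(saved);
    }

    /** Saves a copy of each value, so the saved state survives reuse. */
    public static List<Integer> saveWithCopy(Iterable<Point> values) {
        List<Point> saved = new ArrayList<>();
        for (Point p : values) {
            saved.add(new Point(p));
        }
        return xsOf(saved);
    }

    private static List<Integer> xsOf(List<Point> points) {
        List<Integer> out = new ArrayList<>();
        for (Point p : points) {
            out.add(p.x);
        }
        return out;
    }
}
```

With the values 1, 2, 3, saveWithoutCopy() returns [3, 3, 3] while saveWithCopy() returns [1, 2, 3]. Without a copy constructor or copyInto(), a reducer has to copy each field by hand to get the second behaviour.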

> java: add reducer that separates keys and values when map output is pairs
> -------------------------------------------------------------------------
>
>                 Key: AVRO-581
>                 URL: https://issues.apache.org/jira/browse/AVRO-581
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>
>         Attachments: AVRO-581.patch, AVRO-581.patch, AVRO-581.patch
>
>
> We should add a Pair<K,V> class, implementing SpecificRecord, that combines instances of two schemas (specific or generic).  Pairs would be compared by key, ignoring value.  The template for its schema would be:
> {code}
> {"type": "record", "name": "org.apache.avro.mapred.Pair", "fields":[
>   {"name": "key", "type": <<insert key schema here>>},
>   {"name": "value", "order": "ignore", "type": <<insert value schema>>}
> ]}
> {code}
> When map outputs are instances of this class, a reducer may be used whose reduce method is something like:
> public abstract void reduce(K key, Iterable<V> values);
