You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2010/06/14 20:42:15 UTC

[jira] Commented: (AVRO-513) java mapreduce api should pass iterator of matching objects to reduce

    [ https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878688#action_12878688 ] 

Tom White commented on AVRO-513:
--------------------------------

Generally looks good.

* Does the change to AvroKeySerialization mean that a copy is made per reduce value? This could be improved to be a copy per reduce group, although it's more work. 
* If a subclass wants to override close() then they have to call super.close() or the reducer may not complete. This could be the source of very subtle bugs, so it might be better to mark close() as final, and have another close method for subclasses, closeReducer() perhaps? However, this gets in the way of users who may want to use the non-grouping reduce() method. I don't see a good answer here.
* If I call next() without calling hasNext(), I don't get a NoSuchElementException, which doesn't conform to the Iterator contract. The next() method should check to see if there is a next and throw NoSuchElementException if not.
* Rather than polling the queue, you could use the blocking take() method and interrupt the thread from close() to signal that there are no more values.
* Starting a thread from within a subclass constructor is unsafe (because the object may not be fully constructed, see http://www.ibm.com/developerworks/java/library/j-jtp0618.html#4). Better to call runner.start() just after constructing the runner.

> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>
>         Attachments: AVRO-513.patch
>
>
> The Java mapreduce API added in AVRO-493 requires reducers implementations to explicitly detect sequences of matching data.
> Rather the reduce method might better look something like:
>    void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.