Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2015/12/11 03:31:11 UTC

[jira] [Comment Edited] (SOLR-8337) Add ReduceOperation and wire it into the ReducerStream

    [ https://issues.apache.org/jira/browse/SOLR-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025241#comment-15025241 ] 

Joel Bernstein edited comment on SOLR-8337 at 12/11/15 2:30 AM:
----------------------------------------------------------------

Patch adds a single *reduce()* method for the ReduceOperation that returns a single Tuple, which is the final reduction.

The *operate(Tuple)* method will be called for each Tuple that is read by the *ReducerStream*.

The reduce() method will be called each time the group-by key changes. This gives the ReduceOperation a chance to finish the reduce algorithm and return a single Tuple. The ReduceOperation will also clear its internal memory after each call to reduce() to prepare for the next Tuple grouping.
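
To illustrate the call sequence described above, here is a minimal, self-contained sketch. It is not the code in the patch: a Tuple is modeled as a plain Map, and the names ReduceLifecycleSketch, SumOperation, reduceAll and tuple are hypothetical. It also assumes the input is already sorted by the group-by key and that reduce() is called once more at end of stream to flush the last group, which the comment above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; not the classes from the patch.
class ReduceLifecycleSketch {

    // Assumed shape of the contract: operate() per tuple, reduce() per group.
    interface ReduceOperation {
        void operate(Map<String, Object> tuple);
        Map<String, Object> reduce();
    }

    // Example operation: sums one numeric field within each group.
    static class SumOperation implements ReduceOperation {
        private final String keyField;
        private final String sumField;
        private Object currentKey;
        private double sum;

        SumOperation(String keyField, String sumField) {
            this.keyField = keyField;
            this.sumField = sumField;
        }

        @Override
        public void operate(Map<String, Object> tuple) {
            currentKey = tuple.get(keyField);
            sum += ((Number) tuple.get(sumField)).doubleValue();
        }

        @Override
        public Map<String, Object> reduce() {
            Map<String, Object> result = new LinkedHashMap<>();
            result.put(keyField, currentKey);
            result.put("sum(" + sumField + ")", sum);
            // Clear internal state so the next group starts from scratch.
            currentKey = null;
            sum = 0;
            return result;
        }
    }

    // Mimics the driving loop: operate() for every tuple, reduce() whenever the
    // group-by key changes, plus one final reduce() for the last group (assumed).
    static List<Map<String, Object>> reduceAll(List<Map<String, Object>> sortedTuples,
                                               String keyField,
                                               ReduceOperation op) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object previousKey = null;
        boolean sawTuple = false;
        for (Map<String, Object> t : sortedTuples) {
            Object key = t.get(keyField);
            if (sawTuple && !Objects.equals(key, previousKey)) {
                out.add(op.reduce());
            }
            op.operate(t);
            previousKey = key;
            sawTuple = true;
        }
        if (sawTuple) {
            out.add(op.reduce());
        }
        return out;
    }

    // Small helper to build a tuple-like map.
    static Map<String, Object> tuple(String id, double price) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("id", id);
        t.put("price", price);
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples =
            Arrays.asList(tuple("a", 1.0), tuple("a", 2.0), tuple("b", 5.0));
        // One output tuple per distinct "id" value.
        System.out.println(reduceAll(tuples, "id", new SumOperation("id", "price")));
    }
}
```

Because the operation resets its own state inside reduce(), the driving loop never needs to know what the operation accumulates.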


was (Author: joel.bernstein):
Patch adds a single *reduce()* method for ReduceOperation that returns a single Tuple, which is the final reduction.

The *operate(Tuple)* method will be called for each Tuple that is read by the *ReducerStream*.

The reduce() method will be called each time the group by key changes. This will give the ReduceOperation a chance to finish the reduce algorithm and return a single Tuple. The ReduceOperation will also clear its internal memory after each call to reduce() to prepare for the next Tuple grouping.

> Add ReduceOperation and wire it into the ReducerStream
> ------------------------------------------------------
>
>                 Key: SOLR-8337
>                 URL: https://issues.apache.org/jira/browse/SOLR-8337
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>         Attachments: SOLR-8337.patch, SOLR-8337.patch, SOLR-8337.patch, SOLR-8337.patch, SOLR-8337.patch
>
>
> The current ReducerStream groups all documents that share the same key(s) into a list and emits a single Tuple that contains this list. There is no way to tell the ReducerStream to do something more interesting with groups, for example summing a column within a group, or joining tuples. 
> This ticket adds a new type of operation called a ReduceOperation which is passed to the ReducerStream so that the reduce behavior can be specialized.
> The ReduceOperation has two methods:
> 1) operate(Tuple) : This is called once for each Tuple in a group. This method can be used to aggregate Tuples as they are added to a group.
> 2) reduce() : This is called when the group keys change. This method returns a single Tuple which is output by the ReducerStream. The ReduceOperation must also clear its internal structures when reduce() is called, to prepare for the next group.
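
As a rough illustration of the two-method contract in the description, the existing "collect the group into a list" behavior could itself be written as a ReduceOperation. This is only a sketch under assumptions: Tuples are modeled as plain Maps, and GroupOperationSketch, GroupOperation, the "group" field name and the tuple helper are hypothetical names, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the classes from the patch.
class GroupOperationSketch {

    // Assumed shape of the two-method contract from the description.
    interface ReduceOperation {
        void operate(Map<String, Object> tuple);
        Map<String, Object> reduce();
    }

    // Collects every tuple of the current group and emits them as one tuple.
    static class GroupOperation implements ReduceOperation {
        private List<Map<String, Object>> members = new ArrayList<>();

        @Override
        public void operate(Map<String, Object> tuple) {
            members.add(tuple);
        }

        @Override
        public Map<String, Object> reduce() {
            Map<String, Object> result = new LinkedHashMap<>();
            result.put("group", members);
            // Start a fresh list instead of clearing the old one, so the tuple
            // that was just emitted is not mutated by the next group.
            members = new ArrayList<>();
            return result;
        }
    }

    // Small helper to build a tuple-like map.
    static Map<String, Object> tuple(String id, int value) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("id", id);
        t.put("value", value);
        return t;
    }

    public static void main(String[] args) {
        GroupOperation op = new GroupOperation();
        op.operate(tuple("a", 1));
        op.operate(tuple("a", 2));
        System.out.println(op.reduce()); // group for key "a"
        op.operate(tuple("b", 3));
        System.out.println(op.reduce()); // group for key "b"
    }
}
```

Replacing the list rather than calling clear() on it is one way to satisfy the "clear internal structures" requirement without altering a Tuple that has already been output.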


