You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2014/11/21 12:05:33 UTC

[jira] [Commented] (FLINK-1267) Add crossGroup operator

    [ https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220775#comment-14220775 ] 

Stephan Ewen commented on FLINK-1267:
-------------------------------------

I think it may be useful, but is not really a urgent at this point.

The reason is that the groupReduce variant should be fine in virtually all cases. If you have so many elements that the groupReduce variant runs out of memory, you probably will not be able to compute the cross product of that group with itself in any reasonable time anyways...



> Add crossGroup operator
> -----------------------
>
>                 Key: FLINK-1267
>                 URL: https://issues.apache.org/jira/browse/FLINK-1267
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Local Runtime, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
>            Priority: Minor
>
> A common operator is to pair-wise compare or combine all elements of a group (there were two questions about this on the user mailing list, recently). Right now, this can be done in two ways:
> 1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs
> 2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.
> Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group fits into memory and is more cumbersome to implement for the user, but pairs can be arbitrarily built. The self-{{Join}} approach pushes most of the work into the system, but the execution strategy does not treat the self-join different from a regular join (both identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product.
> I propose to add a dedicated {{crossGroup()}} operator, that offers this functionality in a proper way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)