Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2015/02/08 00:40:35 UTC

[jira] [Updated] (FLINK-685) Add support for semi-joins

     [ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske updated FLINK-685:
--------------------------------
    Priority: Minor  (was: Major)

> Add support for semi-joins
> --------------------------
>
>                 Key: FLINK-685
>                 URL: https://issues.apache.org/jira/browse/FLINK-685
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: GitHub Import
>            Priority: Minor
>              Labels: github-import
>             Fix For: pre-apache
>
>
> A semi-join is basically a join filter. One input is "filtering" and the other one is "filtered".
> A tuple of the "filtered" input is emitted exactly once if the "filtering" input has one or more tuples with matching join keys. That means that the output of a semi-join has the same type as the "filtered" input, and the "filtering" input is completely discarded.
> In order to support a semi-join, we need to add an additional physical execution strategy which ensures that a tuple of the "filtered" input is emitted only once, even if the "filtering" input has more than one tuple with matching keys. Furthermore, a couple of optimizations compared to standard joins are possible, such as storing only the keys, and not the full tuples, of the "filtering" input in a hash table.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/685
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, runtime
> Milestone: Release 0.6 (unplanned)
> Created at: Mon Apr 14 12:05:29 CEST 2014
> State: open
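
The hash-based strategy described in the issue can be sketched in plain Java as follows. This is only an illustration of the semantics under simplifying assumptions (both inputs fit in memory and are given as lists); the class name SemiJoinSketch, the method semiJoin, and its parameters are hypothetical and are not part of the Flink API or of any proposed implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration of semi-join semantics; not Flink code.
final class SemiJoinSketch {

    // Returns every element of "filtered" whose key also occurs in "filtering".
    // Each "filtered" element is emitted at most once, no matter how many
    // "filtering" elements share its key.
    static <L, R, K> List<L> semiJoin(
            List<L> filtered,
            List<R> filtering,
            Function<L, K> filteredKey,
            Function<R, K> filteringKey) {

        // Build phase: store only the join keys of the "filtering" input.
        // Using a set collapses duplicate keys, so the full tuples are
        // never kept and duplicates cannot cause repeated output.
        Set<K> keys = new HashSet<>();
        for (R r : filtering) {
            keys.add(filteringKey.apply(r));
        }

        // Probe phase: emit a "filtered" element if its key was seen.
        List<L> result = new ArrayList<>();
        for (L l : filtered) {
            if (keys.contains(filteredKey.apply(l))) {
                result.add(l);
            }
        }
        return result;
    }
}
```

Because the build side is a set of keys rather than a multimap of tuples, the output type is the type of the "filtered" input and duplicate matches on the "filtering" side have no effect on the result. A real execution strategy would additionally have to handle partitioning and inputs that do not fit in memory, which this sketch ignores.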



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)