Posted to dev@flink.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2015/07/06 13:41:04 UTC

[jira] [Created] (FLINK-2320) Enable DataSet DataStream Joins

Matthias J. Sax created FLINK-2320:
--------------------------------------

             Summary: Enable DataSet DataStream Joins
                 Key: FLINK-2320
                 URL: https://issues.apache.org/jira/browse/FLINK-2320
             Project: Flink
          Issue Type: New Feature
            Reporter: Matthias J. Sax


Currently, DataSets and DataStreams cannot be joined with each other. This feature should include the following:

  - extend the Streaming API to allow one join input to be a DataSet
    * in a first step, the DataSet can be limited to a DataSource
    * later on, a full Flink program could compute the DataSet
      -> maybe a Flink program could be used to update the join DataSet periodically (if the base data changed), including "synchronized" switching from the old to the new DataSet; should the update be triggered by user, time, or base-data change?
  - in the first version, an inner equi-join should be sufficient
    * the DataSet is used as the build side of the hash join
    * extend the current hash join to consume a DataStream as probe input
  - for full programs computing the DataSet input, it might be helpful to extend the optimizer?
  - What about other joins? Which join algorithms do we need to support (full/left/right) outer joins for a Set-Stream join? What about theta joins?
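
To make the build/probe idea above concrete, here is a minimal sketch in plain Java. It does not use any Flink API: the class SetStreamJoinSketch and its methods loadBuildSide and probe are hypothetical names chosen for illustration only. It assumes the DataSet fits in memory, hashes it once as the build side, and then probes that table with one stream record at a time (inner equi-join). Replacing the whole table through an AtomicReference is one possible way to get the "synchronized" switch from the old to the new DataSet; it is an assumption of this sketch, not a proposed design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical illustration only; none of these names are Flink API.
final class SetStreamJoinSketch<K, B, P> {

    // Current build-side hash table; swapped as a whole when the DataSet is refreshed.
    private final AtomicReference<Map<K, List<B>>> buildTable =
            new AtomicReference<>(new HashMap<>());
    private final Function<B, K> buildKey;
    private final Function<P, K> probeKey;

    SetStreamJoinSketch(Function<B, K> buildKey, Function<P, K> probeKey) {
        this.buildKey = buildKey;
        this.probeKey = probeKey;
    }

    // Build phase: hash the bounded input, then publish the new table atomically.
    // Probes see either the old table or the new one, never a partially built one.
    void loadBuildSide(Iterable<B> dataSet) {
        Map<K, List<B>> table = new HashMap<>();
        for (B record : dataSet) {
            table.computeIfAbsent(buildKey.apply(record), k -> new ArrayList<>()).add(record);
        }
        buildTable.set(table);
    }

    // Probe phase for a single stream record. Inner equi-join semantics:
    // a record without a matching key produces no output.
    List<Map.Entry<B, P>> probe(P streamRecord) {
        List<Map.Entry<B, P>> out = new ArrayList<>();
        List<B> matches = buildTable.get().get(probeKey.apply(streamRecord));
        if (matches != null) {
            for (B match : matches) {
                out.add(Map.entry(match, streamRecord));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Records are "key:payload" strings; the join key is the integer prefix.
        Function<String, Integer> key = s -> Integer.parseInt(s.split(":")[0]);
        SetStreamJoinSketch<Integer, String, String> join =
                new SetStreamJoinSketch<Integer, String, String>(key, key);

        join.loadBuildSide(List.of("1:alice", "2:bob"));
        for (String event : List.of("1:click", "3:click")) {
            System.out.println(event + " -> " + join.probe(event));
        }

        // Simulated periodic refresh of the DataSet: later probes use the new table.
        join.loadBuildSide(List.of("3:carol"));
        System.out.println("3:click -> " + join.probe("3:click"));
    }
}
```

Outer joins on the stream side would only need a different branch for the "no match" case in probe; an outer join on the DataSet side is harder, because an unbounded probe input never reaches a point where unmatched build records can be emitted.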


