Posted to dev@flink.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2015/07/06 13:41:04 UTC
[jira] [Created] (FLINK-2320) Enable DataSet DataStream Joins
Matthias J. Sax created FLINK-2320:
--------------------------------------
Summary: Enable DataSet DataStream Joins
Key: FLINK-2320
URL: https://issues.apache.org/jira/browse/FLINK-2320
Project: Flink
Issue Type: New Feature
Reporter: Matthias J. Sax
Currently, DataSets and DataStreams cannot be joined with each other. This feature should include the following:
- extend the Streaming API to allow one join input to be a DataSet
  * in a first step, the DataSet can be limited to a DataSource
  * later on, a full Flink program could compute the DataSet
  -> maybe a Flink program could be used to update the Join-DataSet periodically (if the base data changed), including a "synchronized" switch from the old to the new DataSet; should updates be triggered by the user, by time, or by a change in the base data?
- in the first version, an inner equi-join should be sufficient
  * the DataSet is used as the build side of a hash join
  * extend the current hash join to consume the DataStream as the probe input
- for full programs that compute the DataSet input, would it be helpful to extend the optimizer?
- What about other joins? Which join algorithms would we need to support full, left, and right outer joins in a Set-Stream-Join? What about theta joins?
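The inner equi-join proposed above (DataSet as the build side, DataStream as the probe input), together with the "synchronized" switch from an old to a new DataSet, can be illustrated in a library-neutral way. The following is a minimal Python sketch, not Flink code: the class `SetStreamHashJoin` and its `load`/`probe` methods are hypothetical names, the build side is assumed to fit in memory, and the stream is modeled as a plain (possibly infinite) iterable.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


class SetStreamHashJoin:
    """Hypothetical inner equi-join: bounded build side, unbounded probe side.

    Not a Flink API; the DataSet is modeled as a finite iterable and the
    DataStream as a (possibly infinite) iterable.
    """

    def __init__(self, build_key: Callable[[Any], Hashable],
                 probe_key: Callable[[Any], Hashable]) -> None:
        self._build_key = build_key
        self._probe_key = probe_key
        self._table: Dict[Hashable, List[Any]] = {}
        self._lock = threading.Lock()

    def load(self, build_side: Iterable[Any]) -> None:
        """Build phase: hash the whole DataSet, then swap it in as one step."""
        new_table: Dict[Hashable, List[Any]] = defaultdict(list)
        for record in build_side:
            new_table[self._build_key(record)].append(record)
        # The new table is fully built before the switch, so a concurrent
        # probe sees either the old or the new DataSet, never a partial one.
        with self._lock:
            self._table = dict(new_table)

    def probe(self, stream: Iterable[Any]) -> Iterator[Tuple[Any, Any]]:
        """Probe phase: join each stream record against the current table."""
        for record in stream:
            with self._lock:
                table = self._table  # snapshot of the current build side
            for match in table.get(self._probe_key(record), ()):
                yield (match, record)
```

Each stream record is joined against whichever table is current when that record arrives, so a reload takes effect from the next stream record onward. Building the replacement table off to the side and swapping only a reference is one simple way to get the "synchronized" switch; when and by whom `load` is triggered (user, timer, base-data change) is left open, as in the issue.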
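For the outer-join question, much depends on which input is preserved. As a hypothetical sketch (plain Python again; `set_stream_outer_join` and its two flags are illustrative names, not a Flink API): preserving the stream side only needs a per-record check, whereas preserving the DataSet side requires remembering which build keys were matched, and the unmatched DataSet records can only be emitted once the stream ends, which never happens for an unbounded stream.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Optional, Set, Tuple


def set_stream_outer_join(
    build_side: Iterable[Any],
    stream: Iterable[Any],
    build_key: Callable[[Any], Hashable],
    probe_key: Callable[[Any], Hashable],
    keep_unmatched_stream: bool = True,   # outer join preserving the stream
    keep_unmatched_build: bool = False,   # outer join preserving the DataSet
) -> Iterator[Tuple[Optional[Any], Optional[Any]]]:
    """Hypothetical Set-Stream outer equi-join; not a Flink API."""
    # Build phase: hash the bounded DataSet.
    table: Dict[Hashable, List[Any]] = {}
    for rec in build_side:
        table.setdefault(build_key(rec), []).append(rec)

    matched_keys: Set[Hashable] = set()  # bounded by the number of build keys
    for rec in stream:
        key = probe_key(rec)
        matches = table.get(key)
        if matches:
            matched_keys.add(key)
            for m in matches:
                yield (m, rec)
        elif keep_unmatched_stream:
            # Stream-side outer join: decidable per record.
            yield (None, rec)

    # DataSet-side outer join: only reached once a *finite* stream is exhausted.
    if keep_unmatched_build:
        for key, recs in table.items():
            if key not in matched_keys:
                for m in recs:
                    yield (m, None)
```

Setting both flags gives a full outer join, and clearing both gives the inner join; only the DataSet-preserving part depends on the stream terminating.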
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)