Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/15 07:39:00 UTC

[jira] [Updated] (FLINK-8482) Implement and expose option to use min / max / left / right timestamp for joined streamrecords

     [ https://issues.apache.org/jira/browse/FLINK-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-8482:
----------------------------------
    Labels: pull-request-available  (was: )

> Implement and expose option to use min / max / left / right timestamp for joined streamrecords
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8482
>                 URL: https://issues.apache.org/jira/browse/FLINK-8482
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Florian Schmidt
>            Assignee: Florian Schmidt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> The idea: Expose the option of which timestamp to use for the result of a join. The options currently floating around are
>  * _left_: Use timestamp of the element in a join that came from the left stream
>  * _right_: Use timestamp of the element in a join that came from the right stream
>  * _max_: Use the max timestamp of both elements in a join
>  * _min_: Use the min timestamp of both elements in a join
> All options but _max_ require introducing delayed watermarks in the operator, which is something that we were hesitant to do until now. This should probably undergo discussion once more in order to see if / how we want to add this now. We could even think of exposing this in a more general way by adding a base operator that allows delayed watermarks.
> This will also be groundwork for supporting outer joins (FLINK-8483), for which we need watermark delays in any case to provide correctness.
> Also, the API for this needs some feedback in order to expose this in a powerful yet clear way. In my PoC at [1] I used the naming convention left / right to refer to specific streams, which is currently not something the API exposes to the user; we should probably use something more clever here.
> Example
> {code:java}
> keyedStreamOne
>    .intervalJoin(keyedStreamTwo)
>    .between(Time.milliseconds(0), Time.milliseconds(2))
>    .assignMinTimestamp() // alternative .assignMaxTimestamp() .assignLeftTimestamp() .assignRightTimestamp()
>    .process(new ProcessJoinFunction() { /* impl */ })
> {code}
>  
> Any feedback is highly appreciated!
> [1] https://github.com/florianschmidt1994/flink/tree/flink-8482-add-option-for-different-timestamp-strategies-to-interval-join-operator
> cc [~StephanEwen] [~kkl0u]
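
To make the four options and the watermark concern above more concrete, here is a minimal, self-contained sketch in plain Java. It does not use any Flink API: the names `JoinTimestampSketch`, `TimestampStrategy`, `resolve` and `isLate` are hypothetical and exist only for illustration, and the lateness rule (a record is late if its timestamp is not greater than the current watermark) is the usual event-time convention, assumed here rather than taken from the PoC.

```java
public class JoinTimestampSketch {

    /** Hypothetical strategy names mirroring the four options listed in the issue. */
    enum TimestampStrategy {
        LEFT, RIGHT, MIN, MAX;

        /** Picks the timestamp of the joined record from the two input timestamps. */
        long resolve(long leftTs, long rightTs) {
            switch (this) {
                case LEFT:
                    return leftTs;
                case RIGHT:
                    return rightTs;
                case MIN:
                    return Math.min(leftTs, rightTs);
                default: // MAX
                    return Math.max(leftTs, rightTs);
            }
        }
    }

    /** Assumed convention: a record is late if its timestamp is <= the current watermark. */
    static boolean isLate(long timestamp, long watermark) {
        return timestamp <= watermark;
    }

    public static void main(String[] args) {
        // The left element (ts = 10) was buffered earlier; the watermark has since
        // advanced to 11; a right element (ts = 12) now arrives and joins with it.
        long leftTs = 10L;
        long rightTs = 12L;
        long watermark = 11L;

        for (TimestampStrategy strategy : TimestampStrategy.values()) {
            long resultTs = strategy.resolve(leftTs, rightTs);
            System.out.println(strategy + " -> " + resultTs
                    + ", late=" + isLate(resultTs, watermark));
        }
    }
}
```

In this scenario the newly arrived element is not late itself, so _max_ always yields a timestamp ahead of the watermark, whereas _min_ (and _left_, since the left element is the older one here) yields 10, which is already behind the watermark of 11. If the arrival order were reversed, _right_ would be affected in the same way. This is one way to see why every option except _max_ would need the operator to hold back its output watermark.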



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)