You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Wang Yangjun (JIRA)" <ji...@apache.org> on 2015/12/05 02:28:10 UTC

[jira] [Comment Edited] (FLINK-3109) Join two streams with two different buffer time

    [ https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042530#comment-15042530 ] 

Wang Yangjun edited comment on FLINK-3109 at 12/5/15 1:28 AM:
--------------------------------------------------------------

After implement this feature, nothing changed with any old API. 
We just need add one operator under package org.apache.flink.streaming.api.operators, and do some modification in the class org.apache.flink.streaming.api.datastream.JoinedStreams

https://github.com/apache/flink/compare/master...wangyangjun:master



was (Author: yangjun.wang90@gmail.com):
After implement this feature, nothing changed with any old API. 
We just need add one operator under package org.apache.flink.streaming.api.operators, and do some modification in the class org.apache.flink.streaming.api.datastream.JoinedStreams

> Join two streams with two different buffer time
> -----------------------------------------------
>
>                 Key: FLINK-3109
>                 URL: https://issues.apache.org/jira/browse/FLINK-3109
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.1
>            Reporter: Wang Yangjun
>              Labels: easyfix, patch
>             Fix For: 0.10.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Current Flink streaming only supports join two streams on the same window. How to solve this problem?
> For example, there are two streams. One is advertisements showed to users. The tuple in which could be described as (id, showed timestamp). The other one is click stream -- (id, clicked timestamp). We want get a joined stream, which includes all the advertisement that is clicked by user in 20 minutes after showed.
> It is possible that after an advertisement is shown, some user click it immediately. It is possible that "click" message arrives server earlier than "show" message because of Internet delay. We assume that the maximum delay is one minute.
> Then the need is that we should alway keep a buffer(20 mins) of "show" stream and another buffer(1 min) of "click" stream.
> It would be grate that there is such an API like.
> showStream.join(clickStream)
>             .where(keySelector)
>             .buffer(Time.of(20, TimeUnit.MINUTES))
>             .equalTo(keySelector)
>             .buffer(Time.of(1, TimeUnit.MINUTES))
>             .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)