You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by fhueske <gi...@git.apache.org> on 2017/07/13 16:27:00 UTC

[GitHub] flink pull request #4324: [FLINK-6232] [table] Add processing time window in...

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/4324

    [FLINK-6232] [table] Add processing time window inner join to SQL.

    This is continuation and extension of PR #3715 and #4266 by @hongyuhong.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink table-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4324
    
----
commit 10b219678f13c0c21889f97f267dcf4c517045e5
Author: hongyuhong <ho...@huawei.com>
Date:   2017-07-06T03:24:04Z

    [FLINK-6232] [table] Add support for processing time inner windowed stream join.

commit 3d671a2d1867aea2f3d4eee30b2772045917d6d4
Author: Fabian Hueske <fh...@apache.org>
Date:   2017-07-12T22:49:30Z

    [FLINK-6232] [table] Add SQL documentation for time window join.
    
    - Add support for window join predicates in WHERE clause.
    - Refactoring of WindowJoinUtil.
    - Minor refactorings of join classes.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    Hi @hongyuhong, it would be great if you could have a look as well since this PR is mostly your work.  
    I'll appreciate any feedback.
    
    Thank you, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4324: [FLINK-6232] [table] Add processing time window in...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4324


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    The code looks good to me. Good refactoring!
    
    +1 to merge.
    
    BTW, I have a question. This time-windowed join is different with `DataStream.join(...).window(...)`.  The `DataStream.join.window` is joining two streams on a same window (such as 1 hour tumbling window).  Do we have plan to support it? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    Hi @hongyuhong, thanks for your feedback!
    I agree, passing good error messages to users is important. However, this is very difficult in the optimizer because the optimizer is exploring different paths. If we throw an exception in the optimizer, we would kill the exploration and miss a valid plan. That's why I'm very hesitant to throw exceptions in rules. However, if the optimizer does not find a valid execution plan, it quits with an error message that tells the user that no valid plan could be found but does not point to a particular problem (such as an invalid join condition). 
    
    A good solution for this is still needed. Maybe we can collect potential problems during optimization and provide these when the translation fails.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    Merging.
    
    I'll open a JIRA to discuss how to pass error messages to users.
    
    Thanks @hongyuhong and @wuchong!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by hongyuhong <gi...@git.apache.org>.
Github user hongyuhong commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    Hi @fhueske, sorry for the late feedback. The refactoring looks pretty great to me, i think it's good to be merged. The only thing i worry about is that there is no enough message for the user to figure out the exact error, do we have plan to improve this?
    
    Thanks very much.
    Yuhong


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4324: [FLINK-6232] [table] Add processing time window inner joi...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4324
  
    Thanks for the review @wuchong. 
    
    The `DataStream` join semantics can be implemented with this join by first window grouping both inputs with a UDAGG that collects all records in a list, and then joins the lists on the same time with `left.t BETWEEN right.t AND right.t` (we do not support a single `==` predicate on time so we need `>=` and `<=`, but this could be added, IMO). And then a UDF that crosses the lists. 
    
    I think this is how you would need to do it in SQL (you could use UNNEST to do the Cartesian product). The Table API could have a syntactic shortcut for that but the internal logical representation would again be the same. 
    
    Not sure if it adds much value. I think the windowed join semantics of this PR are nicer because they do not have cut-off points at the window boundaries (two records could be only 3 msecs apart but would not join because they are in different tumbling windows).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---