You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by rtudoran <gi...@git.apache.org> on 2017/07/05 15:58:01 UTC

[GitHub] flink pull request #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream ...

GitHub user rtudoran opened a pull request:

    https://github.com/apache/flink/pull/4263

    [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [x ] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [x ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [x ] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huawei-flink/flink FLINK-6075-OFRe4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4263
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    Thank your for the work , I will also look into this in the next days. BTW, can you please close #3714 @rtudoran ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @rtudoran I have exactly the same idea with you. Calcite doesn't push partition fields to LogicalSort. Actually, LogicalSort doesn't have a partition field. So I make a custom rule to transform LogicalWindow into a LogicalRank (I customize it) when the query pattern matched. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    Hi @rtudoran, I had a look at the PR and left a comment on [FLINK-6075](https://issues.apache.org/jira/browse/FLINK-6075).
    
    I think we do not need retraction for `OFFSET` and `FETCH` on `ORDER BY *time ASC` and can implement this by extending the existing `ProcessFunction` for sorting. We only need to buffer and retract rows if we support `OFFSET` and `FETCH` for arbitrary sort orders.
    
    Let me know what you think,
    Fabian



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    Hi @fhueske ,  I left this comment under the JIRA. To make sure you know our situation, I post it here:
    
    >I think the most useful feature is PR3 "ORDER BY <any-attr> OFFSET FETCH" (i.e. TopN). It is an important requirement. And there is something more complex such as TopN for each group (e.g. https://stackoverflow.com/questions/176964/select-top-10-records-for-each-category). We have an idea (prototype) to support it, and planning to make a whole design for the TopN (any-attribute), and hope that to meet soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream ...

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran closed the pull request at:

    https://github.com/apache/flink/pull/4263


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @wuchong sounds good to me. Please ping me when you have the PR if you want me to take a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @rtudoran  "SELECT x FROM stream ORDER BY *time FETCH 2" do not need retraction. Because it is order by ascending time. The query will only emit the first 2 rows and after that drop all rows. We need retraction if the query is order by time desc. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @fhueske ,@wuchong  thanks for the feedback. I will modify all these today - tomorrow and ping you when they are pushed.
    
    @wuchong - TopN for each group should not be hard to implement. In fact if you look at the logic of how things are implemented we are in a similar setup as for OVER windows. We have the input stream and we apply a keyBy and the processfunction (for order by / fetch/offset). Thus we would only need the Calcite syntax to make sure the partition fields are pushed in the LogicalSort object. Did you try to see if you have directly a group by if this is pushed in the sort object?
    Otherwise we can make a rule to combine a Group object with the sort object


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    Hi @rtudoran, you are right, I said we will need retraction for OFFSET and FETCH. However, it is only required for the general case and the special case of `ORDER BY *time ASC` without updates should be implemented without. 
    
    IMO, the semantics of a query are fixed and not really up for discussion. Just imagine a regular database would execute the same query on a table that holds the same data as the stream. This defines the result of the streaming query that our operators must produce.
    
    Moving forward, I would modify this PR to support `ORDER BY` with `OFFSET` and `FETCH` for append-only input by extending the existing operators with counters and open JIRAs for the other two cases (`ORDER BY *time DESC` and `ORDER BY *`).
    
    Best, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @fhueske
    I have implemented the support for offset and fetch for both rowtime and proctime. 
    This PR will close the JIRA issue. Please have a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @wuchong @fhueske 
    I closed that old PR (i forgot about it).
    Thanks for looking into this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @fhueske ,@wuchong I am closing this PR as i opened a new one for offset/fetch without retraction 
    Check #4380


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    Thanks @rtudoran. 
    
    Regarding PR it is up to you whether you'd like to update the current or open a new one. I think the new version will have less changes (LOCs and files) because we only need to add a few counters and conditions to `ProcTimeSortProcessFunction` and `RowTimeSortProcessFunction`. 
    
    Regarding the JIRA, I would change the title and description of FLINK-6075 describe the `ORDER BY *time ASC` case (i.e., what is already implemented and merged) and open new JIRAs for 
    - `ORDER BY *time ASC OFFSET FETCH`
    - `ORDER BY *time DESC OFFSET FETCH`
    - `ORDER BY * OFFSET FETCH`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @fhueske @wuchong
    Thanks for the feedback. Fine for me to eliminate the retraction for the *time ASC cases. I will modify the PR (to support *time ASC cases) over the next days and re-submit it. I am not sure if you prefer it to be a new commit here or if i should open a new PR?
    
    @fhueske
    Regarding the other cases you mentioned - to support time DESC ... do you want to modify the JIRA as well? can you modify it or do you want me to add the sub-jiras for the cases you mentioned?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4263: [FLINK-6075] - Support Limit/Top(Sort) for Stream SQL

Posted by rtudoran <gi...@git.apache.org>.
Github user rtudoran commented on the issue:

    https://github.com/apache/flink/pull/4263
  
    @fhueske You were the one  that argue that retraction support is needed for offset and fetch. When we were discussing this i was not into having the retraction. I think finally it depends on what behavior we want to give for the function. If we have a SELECT x FROM stream ORDER BY *time FETCH 2 when we emit the new latest data - do we still want to invalidate what we have emitted previously (in this case we would emit the retraction), otherwise if we consider that that was the valid input at that point in time - we do not need the retraction. I am fine either way. Let me know. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---