You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by wuchong <gi...@git.apache.org> on 2017/01/12 14:30:52 UTC

[GitHub] flink pull request #3107: [FLINK-5441] [table] Directly allow SQL queries on...

GitHub user wuchong opened a pull request:

    https://github.com/apache/flink/pull/3107

    [FLINK-5441] [table] Directly allow SQL queries on a Table

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [ ] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [ ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [ ] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed
    
    This PR allows calling SQL directly on a table :
    
    ```scala
    myTable.sql("SELECT a, b, c FROM _ WHERE d = 12")
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wuchong/flink sql-FLINK-5441

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3107
    
----
commit 528d7c2ad61b0247d296d083d3d4942621701cf8
Author: Jark Wu <wu...@alibaba-inc.com>
Date:   2017-01-12T14:26:52Z

    [FLINK-5441] [table] Directly allow SQL queries on a Table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fpompermaier <gi...@git.apache.org>.
Github user fpompermaier commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    It's for this reason that I think that Flink needs a "source catalog"
    somewhere..
    
    On 22 February 2017 at 09:35, twalthr <no...@github.com> wrote:
    
    > We don't have to intercept inline string arguments. We just have to make
    > sure that Table.toString() returns an unique identifier that the method
    > previously registered in it's table environment. Table.toString() should
    > then always return the same identifier.
    >
    > \u2014
    > You are receiving this because you are subscribed to this thread.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/flink/pull/3107#issuecomment-281603488>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/ADmeJYvF3DJZFWQ2oYcR_mUgC-2WFll8ks5re_NdgaJpZM4Lhz5Y>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Thanks @wuchong. Will merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    I still think this is weird. Currently all API from `Table` is somehow act on the table object, just like a normal OOP program. After introducing `sql` method to `Table`, it can do something completely unrelated to the current object.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Hmm, I think registering tables in `toString()` (which IMO should be free of side effects) is not very clean, but I have to admit that the API looks good. So, I would be OK doing this like that.
    
    Would we also take care to unregister the tables after the query was optimized?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    We don't have to intercept inline string arguments. We just have to make sure that `Table.toString()` returns an unique identifier that the method previously registered in it's table environment. `Table.toString()` should then always return the same identifier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @twalthr  I have updated this PR regarding to the idea you proposed that `env.sql(s"SELECT * FROM $table JOIN $otherTable")`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Thank you @twalthr , I like the idea. So Java users can also use `env.sql("SELECT * FROM " + table1 + " JOIN " + table2)` as a shortcut. What do you think @fhueske ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @KurtYoung I agree that this syntax can be misused like `table.sql("SELECT * FROM otherTable")`. But right now it is very inconvenient to always register a table first, especially if you want to define multiple queries subsequently.
    
    We could also think about `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`. Other suggestions are very welcome.
    
    Another idea that just came to my mind. We could also use the `toString` method of a Table. This method could implicitly register the Table under a unique name and return that. Then we could support `env.sql(s"SELECT * FROM $table JOIN $otherTable")`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @twalthr Yes,  I will update it in these days. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Do we have to take care of unregistering the tables? Right now `fromDataSet/fromDataStream` do also not unregister their unique names.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Hi @twalthr , the `env.sql(s"SELECT * FROM $table JOIN $otherTable")` is a nice way. But how to handle the table's table name and register it to env ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Agreed, for Scala the inlined variant looks really good. Like @wuchong, 
    I'm curious how we can intercept the inlined string arguments. Do you know how to do that @twalthr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @wuchong if `toString` is called for the first time. It calls `tEnv.registerTableInternal(this)` which uses an atomic counter and generates a unique identifier such as `UnnamedTable$232`. This identifier is returned and saved in the table. Every further call of toString return this identifier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    What do you think about my solution of `env.sql(s"SELECT * FROM $table JOIN $otherTable")`? It would be much more readable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Hi, I like @twalthr's suggestion to use `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`. 
    I'd also be fine with slf4j syntax as @wuchong proposed.
    
    What do you think @KurtYoung?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    I think the point is that we don't know if the table will ever be used again. If we unregister them after optimization, we can not have multiple `toDataStream()` calls for one table.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    An external catalog would not help in this case. It's actually quite the opposite. The issue is to run a SQL query on an ad-hoc table without the need to explicitly add it to a catalog.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3107: [FLINK-5441] [table] Directly allow SQL queries on...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3107


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Thanks for the quick update @wuchong. The code looks good to merge. Can you add some documentation to the SQL section?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    What about using slf4j syntax `env.sql("SELECT * FROM {} JOIN {}", table1, table2)` ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Yes, you are right. Forgot that we lazily translate the plans. :-/
    Let's keep the tables then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Hi @KurtYoung , thank your for reviewing. I'm not sure about that. Do you mean omitting the "FROM" syntax ?  But it will be hard to validate SQL syntax. Because the query can follow a WHERE clause or JOIN/UNION with other tables.  For example, it's wired and hard to validate the query like this:
    
    ```
    mytable.sql("SELECT * WHERE a > 13 UNION SELECT * FROM TABLE_B")
    ```
    
    I think allowing query on other tables is fine here.  What do you think ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @twalthr do you mean register table in his `toString` method ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    But what's the point of keeping tables that will never be used again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    @wuchong any plans to update this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    updated the documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on the issue:

    https://github.com/apache/flink/pull/3107
  
    Hi @fhueske, both solutions are fine with me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---