Posted to issues@flink.apache.org by vasia <gi...@git.apache.org> on 2016/04/11 15:22:56 UTC

[GitHub] flink pull request: [FLINK-3640] [docs] extend the Table API docs ...

GitHub user vasia opened a pull request:

    https://github.com/apache/flink/pull/1867

    [FLINK-3640] [docs] extend the Table API docs and add a section about SQL

    - Renames the "Table API" links to "Table API and SQL"
    - Adds a TOC
    - Adds a section about embedded SQL

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vasia/flink embedded-sql-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1867
    
----
commit 2b7aa41a7558b51f24177e81356731ef319f1398
Author: vasia <va...@apache.org>
Date:   2016-04-11T13:14:55Z

    [FLINK-3640] [docs] extend the Table API docs and add a section about embedded SQL mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and would like it, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---


Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59224646
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -408,3 +428,132 @@ Here, `literal` is a valid Java literal and `field reference` specifies a column
     column names follow Java identifier syntax.
     
     Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LONG` casted to `DATE` must be a milliseconds timestamp. A `STRING` casted to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", "`HH:mm:ss`", or a milliseconds timestamp. By default, all timestamps refer to the UTC timezone beginning from January 1, 1970, 00:00:00 in milliseconds.
    +
    +{% top %}
    +
    +SQL
    +----
    +The Table API also supports embedded SQL queries.
    +In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name.
    +A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan` method:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +// create a Table environment
    +TableEnvironment tableEnv = new TableEnvironment();
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset();
    +// read a DataSet from an external source
    +DataSet<Tuple2<Integer, Long>> ds = env.readTextFile(...);
    +// register the DataSet under the name "MyTable"
    +tableEnv.registerDataSet("MyTable", ds);
    +// retrieve "MyTable" into a new Table
    +Table t = tableEnv.scan("MyTable");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +// create a Table environment
    +val tEnv = new TableEnvironment
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset()
    +// read a DataSet from an external source
    +val ds = env.readTextFile(...)
    +// register the DataSet under the name "MyTable"
    +tEnv.registerDataSet("MyTable", ds)
    +// retrieve "MyTable" into a new Table
    +val t = tEnv.scan("MyTable")
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +*Note: Table names are not allowed to follow the `^_DataSetTable_[0-9]+` pattern, as this is reserved for internal use only.*
    +
    +When registering a `DataSet`, one can also give names to the `Table` columns. For example, if "MyTable" has three columns, `user`, `product`, and `order`, we can give them names upon registering the `DataSet` as shown below:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// register the DataSet under the name "MyTable" with columns user, product, and order
    +tableEnv.registerDataSet("MyTable", ds, "user, product, order");
    --- End diff --
    
    Java examples use `tableEnv`, Scala examples use `tEnv`. Is that on purpose?
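As an aside on the quoted diff: the cast semantics it describes (a `STRING` cast to `DATE` parsed as `"yyyy-MM-dd"` and friends, with all timestamps as UTC milliseconds since January 1, 1970) can be sketched in plain Java. This is an illustrative stand-in for the documented behavior, not Flink's actual cast implementation:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DateCastSketch {
    // Parse a "yyyy-MM-dd" string into epoch milliseconds, interpreted in UTC,
    // mirroring the STRING -> DATE cast semantics described in the quoted docs.
    static long toMillis(String date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(date).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a valid DATE string: " + date, e);
        }
    }

    public static void main(String[] args) {
        // January 1, 1970 is the epoch origin, so it maps to 0 milliseconds.
        System.out.println(toMillis("1970-01-01")); // 0
        System.out.println(toMillis("1970-01-02")); // 86400000
    }
}
```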




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59223134
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    +Flink's Table API is a a SQL-like expression language embedded in Java and Scala.
    +Instead of manipulating a `DataSet` or `DataStream`, you can create and work with the `Table` relational abstraction.
    +Tables have a schema and allow running relational operations on them, including selection, aggregation, and joins.
    +A `Table` can be created from a `DataSet` or a `DataStream` and then queried either using the Table API Operators or using SQL queries.
    +Once a `Table` is converted back to a `DataSet` or `DataStream`, the defined relational plan is optimized using [Apache Calcite](https://calcite.apache.org/)
    +and transformed into a `DataSet` or `DataStream` execution plan.
     
    -Flink provides an API that allows specifying operations using SQL-like expressions. Instead of
    -manipulating a `DataSet` you can work with a `Table` on which relational operations can
    -be performed.
    +* This will be replaced by the TOC
    +{:toc}
     
    -The following dependency must be added to your project in order to use the Table API:
    +Using the Table API and SQL
    +----------------------------
    +
    +The Table API and SQL are part of the *libraries* Maven project.
    --- End diff --
    
    *libraries* Maven project -> *flink-libraries* Maven module




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59222070
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    +Flink's Table API is a a SQL-like expression language embedded in Java and Scala.
    --- End diff --
    
    double "a"




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59222041
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    --- End diff --
    
    I would keep this hint.




Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1867




Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59225146
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -408,3 +428,132 @@ Here, `literal` is a valid Java literal and `field reference` specifies a column
     column names follow Java identifier syntax.
     
     Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LONG` casted to `DATE` must be a milliseconds timestamp. A `STRING` casted to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", "`HH:mm:ss`", or a milliseconds timestamp. By default, all timestamps refer to the UTC timezone beginning from January 1, 1970, 00:00:00 in milliseconds.
    +
    +{% top %}
    +
    +SQL
    +----
    +The Table API also supports embedded SQL queries.
    +In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name.
    +A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan` method:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +// create a Table environment
    +TableEnvironment tableEnv = new TableEnvironment();
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset();
    +// read a DataSet from an external source
    +DataSet<Tuple2<Integer, Long>> ds = env.readTextFile(...);
    +// register the DataSet under the name "MyTable"
    +tableEnv.registerDataSet("MyTable", ds);
    +// retrieve "MyTable" into a new Table
    +Table t = tableEnv.scan("MyTable");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +// create a Table environment
    +val tEnv = new TableEnvironment
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset()
    +// read a DataSet from an external source
    +val ds = env.readTextFile(...)
    +// register the DataSet under the name "MyTable"
    +tEnv.registerDataSet("MyTable", ds)
    +// retrieve "MyTable" into a new Table
    +val t = tEnv.scan("MyTable")
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +*Note: Table names are not allowed to follow the `^_DataSetTable_[0-9]+` pattern, as this is reserved for internal use only.*
    +
    +When registering a `DataSet`, one can also give names to the `Table` columns. For example, if "MyTable" has three columns, `user`, `product`, and `order`, we can give them names upon registering the `DataSet` as shown below:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// register the DataSet under the name "MyTable" with columns user, product, and order
    +tableEnv.registerDataSet("MyTable", ds, "user, product, order");
    --- End diff --
    
    Not really. I'll change both to `tableEnv`.
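The reserved-name note in the quoted diff (`^_DataSetTable_[0-9]+` is reserved for internally generated tables) amounts to a simple regex guard that a registration method could apply up front. A minimal sketch, not Flink's actual validation code:

```java
import java.util.regex.Pattern;

public class TableNameCheck {
    // Names matching ^_DataSetTable_[0-9]+ are reserved for internal use,
    // per the note in the quoted docs.
    static final Pattern RESERVED = Pattern.compile("^_DataSetTable_[0-9]+");

    static boolean isReserved(String name) {
        return RESERVED.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isReserved("_DataSetTable_0")); // true
        System.out.println(isReserved("MyTable"));         // false
    }
}
```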




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59224953
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -408,3 +428,132 @@ Here, `literal` is a valid Java literal and `field reference` specifies a column
     column names follow Java identifier syntax.
     
     Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LONG` casted to `DATE` must be a milliseconds timestamp. A `STRING` casted to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", "`HH:mm:ss`", or a milliseconds timestamp. By default, all timestamps refer to the UTC timezone beginning from January 1, 1970, 00:00:00 in milliseconds.
    +
    +{% top %}
    +
    +SQL
    +----
    +The Table API also supports embedded SQL queries.
    +In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name.
    +A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan` method:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +// create a Table environment
    +TableEnvironment tableEnv = new TableEnvironment();
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset();
    +// read a DataSet from an external source
    +DataSet<Tuple2<Integer, Long>> ds = env.readTextFile(...);
    +// register the DataSet under the name "MyTable"
    +tableEnv.registerDataSet("MyTable", ds);
    +// retrieve "MyTable" into a new Table
    +Table t = tableEnv.scan("MyTable");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +// create a Table environment
    +val tEnv = new TableEnvironment
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset()
    +// read a DataSet from an external source
    +val ds = env.readTextFile(...)
    +// register the DataSet under the name "MyTable"
    +tEnv.registerDataSet("MyTable", ds)
    +// retrieve "MyTable" into a new Table
    +val t = tEnv.scan("MyTable")
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +*Note: Table names are not allowed to follow the `^_DataSetTable_[0-9]+` pattern, as this is reserved for internal use only.*
    +
    +When registering a `DataSet`, one can also give names to the `Table` columns. For example, if "MyTable" has three columns, `user`, `product`, and `order`, we can give them names upon registering the `DataSet` as shown below:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// register the DataSet under the name "MyTable" with columns user, product, and order
    +tableEnv.registerDataSet("MyTable", ds, "user, product, order");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +// register the DataSet under the name "MyTable" with columns user, product, and order
    +tEnv.registerDataSet("MyTable", ds, 'user, 'product, 'order)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +A `Table` can be registered in a similar way:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// read a DataSet from an external source
    +DataSet<Tuple2<Integer, Long>> ds = env.readTextFile(...);
    +// create a Table from the DataSet with columns user, product, and order
    +Table t = tableEnv.fromDataSet(ds).as("user, product, order");
    +// register the Table under the name "MyTable"
    +tableEnv.registerTable("MyTable", t);
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +// read a DataSet from an external source and
    +// create a Table from the DataSet with columns user, product, and order
    +val t = env.readTextFile(...).as('user, 'product, 'order)
    +// register the Table under the name "MyTable"
    +tEnv.registerTable("MyTable", t)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +After registering a `Table` or `DataSet`, one can use them in SQL queries. A SQL query is executed using the `sql` method of the `TableEnvironment`.
    --- End diff --
    
    A SQL query is executed -> A SQL query is defined
    
    Execution happens later when the program is executed (ExecutionEnvironment.execute/print/collect). 
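The define-versus-execute distinction here is the usual lazy-evaluation pattern: building the query object does no work; work happens only when something later triggers it. A language-level sketch of that idea (a plain `Supplier` standing in for a defined-but-not-yet-executed query; not Flink code):

```java
import java.util.function.Supplier;

public class LazyDefinition {
    static int evaluations = 0;

    // "Defining" a query builds a description of the work; nothing runs yet.
    // Execution is deferred until the caller asks for a result, analogous to
    // ExecutionEnvironment.execute()/print()/collect() triggering a Flink plan.
    static Supplier<Integer> defineQuery() {
        return () -> { evaluations++; return 42; };
    }

    public static void main(String[] args) {
        Supplier<Integer> query = defineQuery();
        System.out.println("after define: " + evaluations);  // 0 -- no work yet
        int result = query.get();                            // "execute"
        System.out.println("result: " + result);             // 42
    }
}
```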




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59224299
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -408,3 +428,132 @@ Here, `literal` is a valid Java literal and `field reference` specifies a column
     column names follow Java identifier syntax.
     
     Only the types `LONG` and `STRING` can be casted to `DATE` and vice versa. A `LONG` casted to `DATE` must be a milliseconds timestamp. A `STRING` casted to `DATE` must have the format "`yyyy-MM-dd HH:mm:ss.SSS`", "`yyyy-MM-dd`", "`HH:mm:ss`", or a milliseconds timestamp. By default, all timestamps refer to the UTC timezone beginning from January 1, 1970, 00:00:00 in milliseconds.
    +
    +{% top %}
    +
    +SQL
    +----
    +The Table API also supports embedded SQL queries.
    +In order to use a `Table` or `DataSet` in a SQL query, it has to be registered in the `TableEnvironment`, using a unique name.
    +A registered `Table` can be retrieved back from the `TableEnvironment` using the `scan` method:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +// create a Table environment
    +TableEnvironment tableEnv = new TableEnvironment();
    +// reset the translation context: this will erase existing registered Tables
    +TranslationContext.reset();
    +// read a DataSet from an external source
    +DataSet<Tuple2<Integer, Long>> ds = env.readTextFile(...);
    --- End diff --
    
    `readTextFile()` results in a `DataSet<String>`. We need to add a `map` step or use `readCsvFile()`.
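The fix suggested here boils down to a parsing function that a `map` step applies to each line (hypothetically wired as `env.readTextFile(path).map(LineParser::parse)`), or switching the example to `readCsvFile()`. A minimal sketch of the per-line parsing, with a plain class standing in for Flink's `Tuple2<Integer, Long>`:

```java
public class LineParser {
    // Minimal stand-in for Flink's Tuple2<Integer, Long>.
    static final class Pair {
        final int f0;
        final long f1;
        Pair(int f0, long f1) { this.f0 = f0; this.f1 = f1; }
    }

    // The function a map step would apply to each text line to produce
    // the (Integer, Long) records the docs example expects.
    static Pair parse(String line) {
        String[] fields = line.split(",");
        return new Pair(Integer.parseInt(fields[0].trim()),
                        Long.parseLong(fields[1].trim()));
    }

    public static void main(String[] args) {
        Pair p = parse("7, 42");
        System.out.println(p.f0 + " " + p.f1); // 7 42
    }
}
```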




Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/1867#issuecomment-208408620
  
    Thanks for the comments! I've made the changes and will merge.




Posted by vasia <gi...@git.apache.org>.
Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59222466
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    --- End diff --
    
    I removed this because there is already the "Beta" tag, but I can add it back.




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59222873
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    +Flink's Table API is a a SQL-like expression language embedded in Java and Scala.
    +Instead of manipulating a `DataSet` or `DataStream`, you can create and work with the `Table` relational abstraction.
    +Tables have a schema and allow running relational operations on them, including selection, aggregation, and joins.
    +A `Table` can be created from a `DataSet` or a `DataStream` and then queried either using the Table API Operators or using SQL queries.
    --- End diff --
    
    Table API Operators -> Table API operators




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1867#issuecomment-208402682
  
    Thanks for updating the Table API documentation!
    I added a few comments. Looks good otherwise.




Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1867#discussion_r59222738
  
    --- Diff: docs/apis/batch/libs/table.md ---
    @@ -30,13 +30,21 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**The Table API: an experimental feature**
    +Flink's Table API is a a SQL-like expression language embedded in Java and Scala.
    +Instead of manipulating a `DataSet` or `DataStream`, you can create and work with the `Table` relational abstraction.
    --- End diff --
    
    "the `Table` relational abstraction" -> "a relational `Table` abstraction"

