You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by fhueske <gi...@git.apache.org> on 2017/07/19 09:56:37 UTC

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/4365

    [FLINK-6747] [docs] Add documentation for dynamic tables.

    This PR adds documentation about dynamic tables to the Table API / SQL docs.
    
    - [X] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [X] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [X] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink tableStreamDocs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4365
    
----
commit 56c607b164a488493d69e5cb9eb9fab4b54175cf
Author: Fabian Hueske <fh...@apache.org>
Date:   2017-07-17T17:11:04Z

    [FLINK-6747] [docs] Add documentation for dynamic tables.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4365: [FLINK-6747] [docs] Add documentation for dynamic tables.

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4365
  
    Great, thanks @sunjincheng121 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128250087
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    +
    +Whether a query produces an append-only table or an updated table has some implications:
    +- Queries that produce update changes usually have to maintain more state (see the following section).
    +- The conversion of an append-only table into a stream is different from the conversion of an updated table (see the [Table to Stream Conversion](#table-to-stream-conversion) section). 
    +
    +#### Query Restrictions
    +
    +Many, but not all, semantically valid queries can be evaluated as continuous queries on streams. Some queries are too expensive to compute, either due to the size of state that they need to maintain or because computing updates is too expensive.
    +
    +- **State Size:** Continuous queries are evaluated on unbounded streams and are often supposed to run for weeks or months. Hence, the total amount of data that a continuous query processes can be very large. Queries that have to update previously emitted results need to maintain all emitted rows in order to be able to update them. For instance, the first example query needs to store the URL count for each user to be able to increase the count and sent out a new result when the input table receives a new row. If only registered users are tracked, the number of counts to maintain might not be too high. However, if non-registered users get a unique user name assigned, the number of counts to maintain would grow over time and might eventually cause the query to fail.
    +
    +{% highlight sql %}
    +SELECT user, COUNT(url)
    +FROM clicks
    +GROUP BY user;
    +{% endhighlight %}
    +
    +- **Computing Updates:** Some queries require to recompute and update a large fraction of the emitted result rows even if only a single input record is added or updated. Clearly, such queries are not well suited to be executed as continuous queries. An example is the following query which computes for each user a `RANK` based on the time of the last click. As soon as the `clicks` table receives a new row, the `lastAction` of the user is updated and a new rank must be computed. However since two rows cannot have the same rank, all lower ranked rows need to be updated as well.
    +
    +{% highlight sql %}
    +SELECT user, RANK() OVER (ORDER BY lastLogin) 
    +FROM (
    +  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
    +);
    +{% endhighlight %}
    +
    +The [QueryConfig](#query-configuration) section discusses parameters to control the execution of continuous queries. Some parameters can be used to trade the size of maintained state for result accuracy.
    +
    +### Table to Stream Conversion
    +
    +A dynamic table can be continuously modified by `INSERT`, `UPDATE`, and `DELETE` changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without `UPDATE` and `DELETE` modifications, or anything in between.
    +
    +When converting a dynamic table into a stream or writing it to an external system, these changes need to be encoded. Flink's Table API and SQL support three ways to encode the changes of a dynamic table:
    +
    +* **Append-only stream:** A dynamic table that is only modified by `INSERT` changes can be  converted into a stream by emitting the inserted rows. 
    +
    +* **Retract stream:** A retract stream is a stream with two types of messages, *add messages* and *retract messages*. A dynamic table is converted into an retract stream by encoding an `INSERT` change as add message, a `DELETE` change as retract message, and an `UPDATE` change as a retract message for the updated row and an add message for the updating row. The following figure visualizes the conversion of a dynamic table into a retract stream.
    --- End diff --
    
    `and an `UPDATE` change as a retract message for the updated row and an add message for the updating row.`
    To:
    `and an `UPDATE` change as both a retract and an add message for the updating row.
    `
    What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128513830
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    +
    +Whether a query produces an append-only table or an updated table has some implications:
    +- Queries that produce update changes usually have to maintain more state (see the following section).
    +- The conversion of an append-only table into a stream is different from the conversion of an updated table (see the [Table to Stream Conversion](#table-to-stream-conversion) section). 
    +
    +#### Query Restrictions
    +
    +Many, but not all, semantically valid queries can be evaluated as continuous queries on streams. Some queries are too expensive to compute, either due to the size of state that they need to maintain or because computing updates is too expensive.
    +
    +- **State Size:** Continuous queries are evaluated on unbounded streams and are often supposed to run for weeks or months. Hence, the total amount of data that a continuous query processes can be very large. Queries that have to update previously emitted results need to maintain all emitted rows in order to be able to update them. For instance, the first example query needs to store the URL count for each user to be able to increase the count and sent out a new result when the input table receives a new row. If only registered users are tracked, the number of counts to maintain might not be too high. However, if non-registered users get a unique user name assigned, the number of counts to maintain would grow over time and might eventually cause the query to fail.
    +
    +{% highlight sql %}
    +SELECT user, COUNT(url)
    +FROM clicks
    +GROUP BY user;
    +{% endhighlight %}
    +
    +- **Computing Updates:** Some queries require to recompute and update a large fraction of the emitted result rows even if only a single input record is added or updated. Clearly, such queries are not well suited to be executed as continuous queries. An example is the following query which computes for each user a `RANK` based on the time of the last click. As soon as the `clicks` table receives a new row, the `lastAction` of the user is updated and a new rank must be computed. However since two rows cannot have the same rank, all lower ranked rows need to be updated as well.
    +
    +{% highlight sql %}
    +SELECT user, RANK() OVER (ORDER BY lastLogin) 
    +FROM (
    +  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
    +);
    +{% endhighlight %}
    +
    +The [QueryConfig](#query-configuration) section discusses parameters to control the execution of continuous queries. Some parameters can be used to trade the size of maintained state for result accuracy.
    +
    +### Table to Stream Conversion
    +
    +A dynamic table can be continuously modified by `INSERT`, `UPDATE`, and `DELETE` changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without `UPDATE` and `DELETE` modifications, or anything in between.
    +
    +When converting a dynamic table into a stream or writing it to an external system, these changes need to be encoded. Flink's Table API and SQL support three ways to encode the changes of a dynamic table:
    +
    +* **Append-only stream:** A dynamic table that is only modified by `INSERT` changes can be  converted into a stream by emitting the inserted rows. 
    +
    +* **Retract stream:** A retract stream is a stream with two types of messages, *add messages* and *retract messages*. A dynamic table is converted into an retract stream by encoding an `INSERT` change as add message, a `DELETE` change as retract message, and an `UPDATE` change as a retract message for the updated row and an add message for the updating row. The following figure visualizes the conversion of a dynamic table into a retract stream.
    --- End diff --
    
    OK, I'll add `previous` and `new` to the sentence. Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128349657
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    --- End diff --
    
    I would like to keep the sentence short, but I will try to rephrase it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4365: [FLINK-6747] [docs] Add documentation for dynamic tables.

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/4365
  
    Hi @fhueske No problem, that's my duty. :)
    Merging...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128349315
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    --- End diff --
    
    The figure shows the execution of the query at three points in times, i.e, when the clicks table has 6, 7, and 8 records. The transition from 6 to 7 and 7 to 8 are single row updates. So what you are asking for should already be visualized by the figure or am I understand something not correctly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128405026
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    +
    +Whether a query produces an append-only table or an updated table has some implications:
    +- Queries that produce update changes usually have to maintain more state (see the following section).
    +- The conversion of an append-only table into a stream is different from the conversion of an updated table (see the [Table to Stream Conversion](#table-to-stream-conversion) section). 
    +
    +#### Query Restrictions
    +
    +Many, but not all, semantically valid queries can be evaluated as continuous queries on streams. Some queries are too expensive to compute, either due to the size of state that they need to maintain or because computing updates is too expensive.
    +
    +- **State Size:** Continuous queries are evaluated on unbounded streams and are often supposed to run for weeks or months. Hence, the total amount of data that a continuous query processes can be very large. Queries that have to update previously emitted results need to maintain all emitted rows in order to be able to update them. For instance, the first example query needs to store the URL count for each user to be able to increase the count and sent out a new result when the input table receives a new row. If only registered users are tracked, the number of counts to maintain might not be too high. However, if non-registered users get a unique user name assigned, the number of counts to maintain would grow over time and might eventually cause the query to fail.
    +
    +{% highlight sql %}
    +SELECT user, COUNT(url)
    +FROM clicks
    +GROUP BY user;
    +{% endhighlight %}
    +
    +- **Computing Updates:** Some queries require to recompute and update a large fraction of the emitted result rows even if only a single input record is added or updated. Clearly, such queries are not well suited to be executed as continuous queries. An example is the following query which computes for each user a `RANK` based on the time of the last click. As soon as the `clicks` table receives a new row, the `lastAction` of the user is updated and a new rank must be computed. However since two rows cannot have the same rank, all lower ranked rows need to be updated as well.
    +
    +{% highlight sql %}
    +SELECT user, RANK() OVER (ORDER BY lastLogin) 
    +FROM (
    +  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
    +);
    +{% endhighlight %}
    +
    +The [QueryConfig](#query-configuration) section discusses parameters to control the execution of continuous queries. Some parameters can be used to trade the size of maintained state for result accuracy.
    +
    +### Table to Stream Conversion
    +
    +A dynamic table can be continuously modified by `INSERT`, `UPDATE`, and `DELETE` changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without `UPDATE` and `DELETE` modifications, or anything in between.
    +
    +When converting a dynamic table into a stream or writing it to an external system, these changes need to be encoded. Flink's Table API and SQL support three ways to encode the changes of a dynamic table:
    +
    +* **Append-only stream:** A dynamic table that is only modified by `INSERT` changes can be  converted into a stream by emitting the inserted rows. 
    +
    +* **Retract stream:** A retract stream is a stream with two types of messages, *add messages* and *retract messages*. A dynamic table is converted into an retract stream by encoding an `INSERT` change as add message, a `DELETE` change as retract message, and an `UPDATE` change as a retract message for the updated row and an add message for the updating row. The following figure visualizes the conversion of a dynamic table into a retract stream.
    --- End diff --
    
    It meant that the "updated row" of ("retract message for the updated row" and "an add message for the updating row.")  is not the same type. So suggest using different phrase, something like you said "retracted for previous row" and " add  for the new" is more clearly to user. I am not 100% sure, but I suggest to rephrase it. -:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4365


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4365: [FLINK-6747] [docs] Add documentation for dynamic tables.

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/4365
  
    I'll merge this weekend...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128348285
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    --- End diff --
    
    The intention is to explain that the table (which is append-only and continuously growing) is not kept as state (materialized). Further downstream operators might need to materialize it partially but the append-only conversion does not materialize.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4365: [FLINK-6747] [docs] Add documentation for dynamic tables.

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/4365
  
    Hi @fhueske Thanks for the update. The PR. looks pretty good to me. 
    +1 to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128656745
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    --- End diff --
    
    Ah, make sense. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128406043
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    --- End diff --
    
    Ah, got your meaning.
    If we want to emphasize whether a table needs materialize, then I suggest that we should correspond before and after. i.e., we need add Note information for the following table (e.g.: retract table) which need materialize. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128512155
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    --- End diff --
    
    Not sure if that's necessary. At this point I am only talking about the table that is logically constructed from the stream.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128245611
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    --- End diff --
    
    This figure show the behavior of non-Windowed Query.  In common way, continuous query is Continuous query and non-windowed are perfect counterparts. But I think we can not assume that the data is a batch of updates. It should be update one by one in stream, especially processing-time mode. So I think the figure should be improved.(Update the result one by one). What do you think?
     
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128406716
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    --- End diff --
    
    Yes, I know your meaning. 
    But I think` Continuous Queries` need to be accompanied by the flow, that is, from the first element of the stream, it will query and output the results. The figure in the document need to show the behavior of the first element to the user, just like the way of "undo-redo-mode.png" showed us.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128248216
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    --- End diff --
    
    Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect, i.e., ... What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128350244
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    +
    +Whether a query produces an append-only table or an updated table has some implications:
    +- Queries that produce update changes usually have to maintain more state (see the following section).
    +- The conversion of an append-only table into a stream is different from the conversion of an updated table (see the [Table to Stream Conversion](#table-to-stream-conversion) section). 
    +
    +#### Query Restrictions
    +
    +Many, but not all, semantically valid queries can be evaluated as continuous queries on streams. Some queries are too expensive to compute, either due to the size of state that they need to maintain or because computing updates is too expensive.
    +
    +- **State Size:** Continuous queries are evaluated on unbounded streams and are often supposed to run for weeks or months. Hence, the total amount of data that a continuous query processes can be very large. Queries that have to update previously emitted results need to maintain all emitted rows in order to be able to update them. For instance, the first example query needs to store the URL count for each user to be able to increase the count and sent out a new result when the input table receives a new row. If only registered users are tracked, the number of counts to maintain might not be too high. However, if non-registered users get a unique user name assigned, the number of counts to maintain would grow over time and might eventually cause the query to fail.
    +
    +{% highlight sql %}
    +SELECT user, COUNT(url)
    +FROM clicks
    +GROUP BY user;
    +{% endhighlight %}
    +
    +- **Computing Updates:** Some queries require to recompute and update a large fraction of the emitted result rows even if only a single input record is added or updated. Clearly, such queries are not well suited to be executed as continuous queries. An example is the following query which computes for each user a `RANK` based on the time of the last click. As soon as the `clicks` table receives a new row, the `lastAction` of the user is updated and a new rank must be computed. However since two rows cannot have the same rank, all lower ranked rows need to be updated as well.
    +
    +{% highlight sql %}
    +SELECT user, RANK() OVER (ORDER BY lastLogin) 
    +FROM (
    +  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
    +);
    +{% endhighlight %}
    +
    +The [QueryConfig](#query-configuration) section discusses parameters to control the execution of continuous queries. Some parameters can be used to trade the size of maintained state for result accuracy.
    +
    +### Table to Stream Conversion
    +
    +A dynamic table can be continuously modified by `INSERT`, `UPDATE`, and `DELETE` changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without `UPDATE` and `DELETE` modifications, or anything in between.
    +
    +When converting a dynamic table into a stream or writing it to an external system, these changes need to be encoded. Flink's Table API and SQL support three ways to encode the changes of a dynamic table:
    +
    +* **Append-only stream:** A dynamic table that is only modified by `INSERT` changes can be  converted into a stream by emitting the inserted rows. 
    +
    +* **Retract stream:** A retract stream is a stream with two types of messages, *add messages* and *retract messages*. A dynamic table is converted into an retract stream by encoding an `INSERT` change as add message, a `DELETE` change as retract message, and an `UPDATE` change as a retract message for the updated row and an add message for the updating row. The following figure visualizes the conversion of a dynamic table into a retract stream.
    --- End diff --
    
    I think that would miss the information that the previous row (i.e., the row that is updated) is retracted and the new row (i.e., the updating row) is added.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128566720
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    --- End diff --
    
    OK, I'll update the figure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128242397
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    --- End diff --
    
    Does this note want tell user the table is logic table. ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4365: [FLINK-6747] [docs] Add documentation for dynamic ...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4365#discussion_r128406783
  
    --- Diff: docs/dev/table/streaming.md ---
    @@ -22,21 +22,166 @@ specific language governing permissions and limitations
     under the License.
     -->
     
    -**TO BE DONE:** Intro
    +Flink's [Table API](tableApi.html) and [SQL support](sql.html) are unified APIs for batch and stream processing. This means that Table API and SQL queries have the same semantics regardless whether their input is bounded batch input or unbounded stream input. Because the relational algebra and SQL were originally designed for batch processing, relational queries on unbounded streaming input are not as well understood as relational queries on bounded batch input. 
    +
    +On this page, we explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data. 
     
     * This will be replaced by the TOC
     {:toc}
     
    -Dynamic Table
    --------------
    +Relational Queries on Data Streams
    +----------------------------------
    +
    +SQL and the relational algebra have not been designed with streaming data in mind. As a consequence, there are few conceptual gaps between relational algebra (and SQL) and stream processing.
    +
    +<table class="table table-bordered">
    +	<tr>
    +		<th>Relational Algebra / SQL</th>
    +		<th>Stream Processing</th>
    +	</tr>
    +	<tr>
    +		<td>Relations (or tables) are bounded (multi-)sets of tuples.</td>
    +		<td>A stream is an infinite sequences of tuples.</td>
    +	</tr>
    +	<tr>
    +		<td>A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data.</td>
    +		<td>A streaming query cannot access all data when is started and has to "wait" for data to be streamed in.</td>
    +	</tr>
    +	<tr>
    +		<td>A batch query terminates after it produced a fixed sized result.</td>
    +		<td>A streaming query continuously updates its result based on the received records and never completes.</td>
    +	</tr>
    +</table>
    +
    +Despite these differences, processing streams with relational queries and SQL is not impossible. Advanced relational database systems offer a feature called *Materialized Views*. A materialized view is defined as a SQL query, just like a regular virtual view. In contrast to a virtual view, a materialized view caches the result of the query such that the query does not need to be evaluated when the view is accessed. A common challenge for caching is to prevent a cache from serving outdated results. A materialized view becomes outdated when the base tables of its definition query are modified. *Eager View Maintenance* is a technique to update materialized views and updates a materialized view as soon as its base tables are updated. 
    +
    +The connection between eager view maintenance and SQL queries on streams becomes obvious if we consider the following:
    +
    +- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and `DELETE` DML statements, often called *changelog stream*.
    +- A materialized view is defined as a SQL query. In order to update the view, the query is continuously processes the changelog streams of the view's base relations.
    +- The materialized view is the result of the streaming SQL query.
    +
    +With these points in mind, we introduce Flink's concept of *Dynamic Tables* in the next section.
    +
    +Dynamic Tables &amp; Continuous Queries
    +---------------------------------------
    +
    +*Dynamic tables* are the core concept of Flink's Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic table are changing over time. They can be queried like static batch tables. Querying a dynamic table yields a *Continuous Query*. A continuous query never terminates and produces a dynamic table as result. The query continuously updates its (dynamic) result table to reflect the changes on its input (dynamic) table. Essentially, a continuous query on a dynamic table is very similar to the definition query of a materialized view. 
    +
    +It is important to note that the result of a continuous query is always semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables.
    +
    +The following figure visualizes the relationship of streams, dynamic tables, and  continuous queries: 
    +
    +<center>
    +<img alt="Dynamic tables" src="{{ site.baseurl }}/fig/table-streaming/stream-query-stream.png" width="80%">
    +</center>
    +
    +1. A stream is converted into a dynamic table.
    +1. A continuous query is evaluated on the dynamic table yielding a new dynamic table.
    +1. The resulting dynamic table is converted back into a stream.
    +
    +**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are not necessarily (fully) materialized during query execution.
    +
    +In the following, we will explain the concepts of dynamic tables and continuous queries with a stream of click events that have the following schema:
    +
    +```
    +[ 
    +  user:  VARCHAR,   // the name of the user
    +  cTime: TIMESTAMP, // the time when the URL was accessed
    +  url:   VARCHAR    // the URL that was accessed by the user
    +]
    +```
    +
    +### Defining a Table on a Stream
    +
    +In order to process a stream with a relational query, it has to be converted into a `Table`. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table. Essentially, we are building a table from an `INSERT`-only changelog stream.
    +
    +The following figure visualizes how the stream of click event (left-hand side) is converted into a table (right-hand side). The resulting table is continuously growing as more records of the click stream are inserted.
    +
    +<center>
    +<img alt="Append mode" src="{{ site.baseurl }}/fig/table-streaming/append-mode.png" width="60%">
    +</center>
    +
    +**Note:** A table which is defined on a stream is internally not materialized. 
    +
    +### Continuous Queries
    +
    +A continuous query is evaluated on a dynamic table and produces a new dynamic table as result. In contrast to a batch query, a continuous query never terminates and updates its result table according to the updates on its input tables. At any point in time, the result of a continuous query is semantically equivalent to the result of the same query being executed in batch mode on a snapshot of the input tables. 
    +
    +In the following we show two example queries on a `clicks` table that is defined on the stream of click events.
    +
    +The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.
    +
    +<center>
    +<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
    +</center>
    +
    +The input table `clicks` is shown on the left-hand side. Initially, the table consists of six rows. Evaluating the query (shown in the middle) on these six records yields a result table which is shown on the right-hand side at the top. When the `clicks` table is updated by appending an additional row (originating from the stream of click events), the query updates the current result table and increases the appropriate count. The updated result table is show on the right-hand side in the middle (the updated row is highlighted). Finally, another row is added and the result is shown on the right bottom of the figure.
    +
    +The second query is similar to the first one but groups the `clicks` table in addition to the `user` attribute also on an [hourly tumbling window](./sql.html#group-windows) before it counts the number of URLs. Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.
    +
    +<center>
    +<img alt="Continuous Group-Window Query" src="{{ site.baseurl }}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
    +</center>
    +
    +The input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The clicks table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two results rows from this input (one for each `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. As more records arrive over time, the result table is appropriately updated.
    +
    +**Note:** Time-based computations such as windows are based on special [Time Attributes](#time-attributes), which are discussed below.
    +
    +#### Update and Append Queries
    +
    +Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one important aspect. The first query must update previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes. In contrast, the second query only appends to the result table, i.e., the changelog stream of the result table consists only of `INSERT` changes.
    --- End diff --
    
    Sounds good.:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4365: [FLINK-6747] [docs] Add documentation for dynamic tables.

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4365
  
    Thanks for the feedback @sunjincheng121.
    I updated the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---