Posted to issues@flink.apache.org by twalthr <gi...@git.apache.org> on 2018/01/08 10:49:58 UTC

[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/5257

    [FLINK-8381] [table] Document more flexible schema definition

    ## What is the purpose of the change
    
    Documentation for schema definition modes.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-8381

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5257
    
----
commit 6ef689a0509fb1040600212b72d6a0a1ef66a3b9
Author: twalthr <tw...@...>
Date:   2018-01-08T10:46:45Z

    [FLINK-8381] [table] Document more flexible schema definition

----


---

[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5257#discussion_r160274966
  
    --- Diff: docs/dev/table/common.md ---
    @@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = tableEnv.toDataSet[(String, Int)](table)
     
     ### Mapping of Data Types to Table Schema
     
    -Flink's DataStream and DataSet APIs support very diverse types, such as Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic types. In the following we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    +Flink's DataStream and DataSet APIs support very diverse types. Composite types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case classes, and Flink's Row type allow for nested data structures with multiple fields that can be accessed in table expressions. Other types are treated as atomic types. In the following, we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    --- End diff --
    
    The description of the two modes is good, but the following sections for the different types were not updated.
    I think we could describe *Position-based Mapping* and *Name-based Mapping* first and move the concrete code examples to the individual type sections. For example, for `Tuples` we would show position-based and name-based mappings in the same code example. This would also highlight the difference.
    We should also double-check the text descriptions for the different types.
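
    For illustration, a combined Tuple example along these lines might look as
    follows. This is a sketch rather than the PR's code: the stream contents
    and the variable names `byPosition` and `byName` are made up, and it uses
    the Scala expression syntax shown in the diffs below.

        import org.apache.flink.streaming.api.scala._
        import org.apache.flink.table.api.{Table, TableEnvironment}
        import org.apache.flink.table.api.scala._

        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val tableEnv = TableEnvironment.getTableEnvironment(env)

        val stream: DataStream[(Long, Int)] = env.fromElements((1L, 42), (2L, 17))

        // position-based mapping: rename the fields while keeping their order
        val byPosition: Table = tableEnv.fromDataStream(stream, 'myLong, 'myInt)

        // name-based mapping: reference the default tuple field names,
        // reorder them, and rename them with "as"
        val byName: Table = tableEnv.fromDataStream(stream, '_2 as 'myInt, '_1 as 'myLong)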
      


---

[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5257#discussion_r160275240
  
    --- Diff: docs/dev/table/common.md ---
    @@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = tableEnv.toDataSet[(String, Int)](table)
     
     ### Mapping of Data Types to Table Schema
     
    -Flink's DataStream and DataSet APIs support very diverse types, such as Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic types. In the following we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    +Flink's DataStream and DataSet APIs support very diverse types. Composite types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case classes, and Flink's Row type allow for nested data structures with multiple fields that can be accessed in table expressions. Other types are treated as atomic types. In the following, we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    +
    +The mapping of a data type to a table schema can happen in two ways: **based on the field positions** or **based on the field names**.
    +
    +**Position-based Mapping**
    +
    +Position-based mapping can be used to give fields a more meaningful name while keeping the field order. This mapping is available for composite data types *with a defined field order* as well as atomic types. Composite data types such as tuples, rows, and case classes have such a field order. However, fields of a POJO must be mapped based on the field names (see next section).
    +
    +When defining a position-based mapping, the specified names must not exist in the input data type; otherwise, the API assumes that the mapping should be based on the field names. If no field names are specified, the default field names and field order of the composite type are used, or `f0` for atomic types.
    +
    +<div class="codetabs" markdown="1">
    --- End diff --
    
    Move and split code examples to the discussion of the individual types.
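
    For the atomic-type case mentioned in the diff text, a minimal sketch
    (same `env`/`tableEnv` setup as the sketch in the first comment above;
    the values and the name `'myLong` are made up):

        // an atomic type contributes a single field, named "f0" by default
        val longs: DataStream[Long] = env.fromElements(1L, 2L, 3L)

        // position-based mapping: rename the single field
        val renamed: Table = tableEnv.fromDataStream(longs, 'myLong)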
      


---

[GitHub] flink issue #5257: [FLINK-8381] [table] Document more flexible schema defini...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/5257
  
    Thanks for your feedback @fhueske. I hope I addressed all of your comments. I will merge this now...


---

[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5257#discussion_r160275286
  
    --- Diff: docs/dev/table/common.md ---
    @@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = tableEnv.toDataSet[(String, Int)](table)
     
     ### Mapping of Data Types to Table Schema
     
    -Flink's DataStream and DataSet APIs support very diverse types, such as Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic types. In the following we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    +Flink's DataStream and DataSet APIs support very diverse types. Composite types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case classes, and Flink's Row type allow for nested data structures with multiple fields that can be accessed in table expressions. Other types are treated as atomic types. In the following, we describe how the Table API converts these types into an internal row representation and show examples of converting a `DataStream` into a `Table`.
    +
    +The mapping of a data type to a table schema can happen in two ways: **based on the field positions** or **based on the field names**.
    +
    +**Position-based Mapping**
    +
    +Position-based mapping can be used to give fields a more meaningful name while keeping the field order. This mapping is available for composite data types *with a defined field order* as well as atomic types. Composite data types such as tuples, rows, and case classes have such a field order. However, fields of a POJO must be mapped based on the field names (see next section).
    +
    +When defining a position-based mapping, the specified names must not exist in the input data type; otherwise, the API assumes that the mapping should be based on the field names. If no field names are specified, the default field names and field order of the composite type are used, or `f0` for atomic types.
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
    +StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    +
    +DataStream<Tuple2<Long, Integer>> stream = ...
    +// convert DataStream into Table with default field names "f0" and "f1"
    +Table table1 = tableEnv.fromDataStream(stream);
    +// convert DataStream into Table with field names "myLong" and "myInt"
    +Table table2 = tableEnv.fromDataStream(stream, "myLong, myInt");
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +// get a TableEnvironment
    +val tableEnv = TableEnvironment.getTableEnvironment(env)
    +
    +val stream: DataStream[(Long, Int)] = ...
    +// convert DataStream into Table with default field names "_1" and "_2"
    +val table1: Table = tableEnv.fromDataStream(stream)
    +// convert DataStream into Table with field names "myLong" and "myInt"
    +val table2: Table = tableEnv.fromDataStream(stream, 'myLong, 'myInt)
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +**Name-based Mapping**
    +
    +Name-based mapping can be used for any data type, including POJOs. It is the most flexible way of defining a table schema mapping. All fields in the mapping are referenced by name and can optionally be renamed using an alias (`as`). Fields can be reordered and projected out.
    +
    +If no field names are specified, the default field names and field order of the composite type are used, or `f0` for atomic types.
    +
    +<div class="codetabs" markdown="1">
    --- End diff --
    
    Move and split code examples to the discussion of the individual types.
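
    The name-based code block is truncated in the diff above. For reference, a
    minimal POJO sketch of what name-based mapping covers (the `Person` class
    and its values are hypothetical; same `env`/`tableEnv` setup as the earlier
    sketch):

        import scala.beans.BeanProperty

        // hypothetical POJO: @BeanProperty generates Java-style getters and
        // setters, so Flink treats the class as a POJO; POJO fields have no
        // defined order, so only name-based mapping applies
        class Person(@BeanProperty var name: String, @BeanProperty var age: Int) {
          def this() = this(null, 0)
        }

        val persons: DataStream[Person] = env.fromElements(new Person("Alice", 30))

        // reference fields by name; rename with "as", reorder, or project out
        val table: Table = tableEnv.fromDataStream(persons, 'age as 'myAge, 'name as 'myName)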


---

[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5257


---