Posted to commits@flink.apache.org by ja...@apache.org on 2020/06/10 02:52:15 UTC

[flink] 03/03: [FLINK-18133][docs][avro] Add documentation for the new Avro format

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 821e786ce1e94e0074affcd50ecd4b87a6bd744b
Author: Jark Wu <ja...@apache.org>
AuthorDate: Mon Jun 8 17:38:14 2020 +0800

    [FLINK-18133][docs][avro] Add documentation for the new Avro format
    
    This closes #12523
---
 docs/dev/table/connectors/formats/avro.md     | 203 +++++++++++++++++++++++++
 docs/dev/table/connectors/formats/avro.zh.md  | 206 ++++++++++++++++++++++++++
 docs/dev/table/connectors/formats/index.md    |  72 +++++++++
 docs/dev/table/connectors/formats/index.zh.md |  72 +++++++++
 4 files changed, 553 insertions(+)

diff --git a/docs/dev/table/connectors/formats/avro.md b/docs/dev/table/connectors/formats/avro.md
new file mode 100644
index 0000000..8870235
--- /dev/null
+++ b/docs/dev/table/connectors/formats/avro.md
@@ -0,0 +1,203 @@
+---
+title: "Avro Format"
+nav-title: Avro
+nav-parent_id: sql-formats
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<span class="label label-info">Format: Serialization Schema</span>
+<span class="label label-info">Format: Deserialization Schema</span>
+
+* This will be replaced by the TOC
+{:toc}
+
+The [Apache Avro](https://avro.apache.org/) format allows reading and writing Avro data based on an Avro schema. Currently, the Avro schema is derived from the table schema.
+
+Dependencies
+------------
+
+To set up the Avro format, the following table provides the required dependency information, both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
+
+| Maven dependency   | SQL Client JAR         |
+| :----------------- | :----------------------|
+| `flink-avro`       | [Pre-bundled Hadoop](https://flink.apache.org/downloads.html#additional-components) |
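+
+For Maven-based projects, the dependency can be declared as follows (a minimal sketch; the coordinates assume the standard `org.apache.flink` group ID, and the version placeholder assumes the docs' `site.version` variable):
+
+{% highlight xml %}
+<dependency>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>flink-avro</artifactId>
+  <version>{{ site.version }}</version>
+</dependency>
+{% endhighlight %}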
+
+How to create a table with Avro format
+----------------
+
+Here is an example of creating a table using the Kafka connector and the Avro format.
+
+<div class="codetabs" markdown="1">
+<div data-lang="SQL" markdown="1">
+{% highlight sql %}
+CREATE TABLE user_behavior (
+  user_id BIGINT,
+  item_id BIGINT,
+  category_id BIGINT,
+  behavior STRING,
+  ts TIMESTAMP(3)
+) WITH (
+ 'connector' = 'kafka',
+ 'topic' = 'user_behavior',
+ 'properties.bootstrap.servers' = 'localhost:9092',
+ 'properties.group.id' = 'testGroup',
+ 'format' = 'avro'
+)
+{% endhighlight %}
+</div>
+</div>
+
+Format Options
+----------------
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style="width: 25%">Option</th>
+        <th class="text-center" style="width: 8%">Required</th>
+        <th class="text-center" style="width: 7%">Default</th>
+        <th class="text-center" style="width: 10%">Type</th>
+        <th class="text-center" style="width: 50%">Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>format</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Specify what format to use; here it should be 'avro'.</td>
+    </tr>
+    </tbody>
+</table>
+
+Data Type Mapping
+----------------
+
+Currently, the Avro schema is always derived from the table schema. Explicitly defining an Avro schema is not supported yet.
+So the following table lists the type mapping from Flink types to Avro types.
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left">Flink Data Type</th>
+        <th class="text-center">Avro type</th>
+        <th class="text-center">Avro logical type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>CHAR / VARCHAR / STRING</td>
+      <td>string</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BOOLEAN</td>
+      <td>boolean</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BINARY / VARBINARY</td>
+      <td>bytes</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DECIMAL</td>
+      <td>fixed</td>
+      <td>decimal</td>
+    </tr>
+    <tr>
+      <td>TINYINT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>SMALLINT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>INT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BIGINT</td>
+      <td>long</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>FLOAT</td>
+      <td>float</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DOUBLE</td>
+      <td>double</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DATE</td>
+      <td>int</td>
+      <td>date</td>
+    </tr>
+    <tr>
+      <td>TIME</td>
+      <td>int</td>
+      <td>time-millis</td>
+    </tr>
+    <tr>
+      <td>TIMESTAMP</td>
+      <td>long</td>
+      <td>timestamp-millis</td>
+    </tr>
+    <tr>
+      <td>ARRAY</td>
+      <td>array</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>MAP<br>
+      (key must be string/char/varchar type)</td>
+      <td>map</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>MULTISET<br>
+      (element must be string/char/varchar type)</td>
+      <td>map</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>ROW</td>
+      <td>record</td>
+      <td></td>
+    </tr>
+    </tbody>
+</table>
+
+In addition to the types listed above, Flink supports reading/writing nullable types. Flink maps a nullable type to the Avro `union(something, null)`, where `something` is the Avro type converted from the Flink type.
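+
+For example, a nullable column is mapped to a union with `null` (a sketch; the derived Avro types shown in the comments illustrate the mapping, and connector options are omitted):
+
+{% highlight sql %}
+CREATE TABLE example (
+  id BIGINT NOT NULL,  -- maps to Avro type "long"
+  behavior STRING      -- nullable, maps to Avro union ["string", "null"]
+) WITH (
+  'connector' = '...',
+  'format' = 'avro'
+)
+{% endhighlight %}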
+
+You can refer to the [Avro specification](https://avro.apache.org/docs/current/spec.html) for more information about Avro types.
+
+
+
+
diff --git a/docs/dev/table/connectors/formats/avro.zh.md b/docs/dev/table/connectors/formats/avro.zh.md
new file mode 100644
index 0000000..ed74042
--- /dev/null
+++ b/docs/dev/table/connectors/formats/avro.zh.md
@@ -0,0 +1,206 @@
+---
+title: "Avro Format"
+nav-title: Avro
+nav-parent_id: sql-formats
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<span class="label label-info">Format: Serialization Schema</span>
+<span class="label label-info">Format: Deserialization Schema</span>
+
+* This will be replaced by the TOC
+{:toc}
+
+The [Apache Avro](https://avro.apache.org/) format allows reading and writing Avro data based on an Avro schema. Currently, the Avro schema is derived from the table schema.
+
+Dependencies
+------------
+
+To set up the Avro format, the following table provides the required dependency information, both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
+
+| Maven dependency   | SQL Client JAR         |
+| :----------------- | :----------------------|
+| `flink-avro`       | [Pre-bundled Hadoop](https://flink.apache.org/downloads.html#additional-components) |
+
+How to create a table with Avro format
+----------------
+
+Here is an example of creating a table using the Kafka connector and the Avro format.
+
+<div class="codetabs" markdown="1">
+<div data-lang="SQL" markdown="1">
+{% highlight sql %}
+CREATE TABLE user_behavior (
+  user_id BIGINT,
+  item_id BIGINT,
+  category_id BIGINT,
+  behavior STRING,
+  ts TIMESTAMP(3)
+) WITH (
+ 'connector' = 'kafka',
+ 'topic' = 'user_behavior',
+ 'properties.bootstrap.servers' = 'localhost:9092',
+ 'properties.group.id' = 'testGroup',
+ 'format' = 'avro'
+)
+{% endhighlight %}
+</div>
+</div>
+
+Format Options
+----------------
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style="width: 25%">Option</th>
+        <th class="text-center" style="width: 8%">Required</th>
+        <th class="text-center" style="width: 7%">Default</th>
+        <th class="text-center" style="width: 10%">Type</th>
+        <th class="text-center" style="width: 50%">Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>format</h5></td>
+      <td>required</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Specify what format to use; here it should be 'avro'.</td>
+    </tr>
+    </tbody>
+</table>
+
+Data Type Mapping
+----------------
+
+Currently, the Avro schema is always derived from the table schema. Explicitly defining an Avro schema is not supported yet. So the following lists only the conversion from Flink types to Avro types.
+
+### Conversion from Flink type to Avro type
+
+The following table lists the conversion from supported Flink types to Avro types.
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left">Flink Data Type</th>
+        <th class="text-center">Avro type</th>
+        <th class="text-center">Avro logical type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>CHAR / VARCHAR / STRING</td>
+      <td>string</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BOOLEAN</td>
+      <td>boolean</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BINARY / VARBINARY</td>
+      <td>bytes</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DECIMAL</td>
+      <td>fixed</td>
+      <td>decimal</td>
+    </tr>
+    <tr>
+      <td>TINYINT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>SMALLINT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>INT</td>
+      <td>int</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>BIGINT</td>
+      <td>long</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>FLOAT</td>
+      <td>float</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DOUBLE</td>
+      <td>double</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>DATE</td>
+      <td>int</td>
+      <td>date</td>
+    </tr>
+    <tr>
+      <td>TIME</td>
+      <td>int</td>
+      <td>time-millis</td>
+    </tr>
+    <tr>
+      <td>TIMESTAMP</td>
+      <td>long</td>
+      <td>timestamp-millis</td>
+    </tr>
+    <tr>
+      <td>ARRAY</td>
+      <td>array</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>MAP<br>
+      (key must be string/char/varchar type)</td>
+      <td>map</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>MULTISET<br>
+      (element must be string/char/varchar type)</td>
+      <td>map</td>
+      <td></td>
+    </tr>
+    <tr>
+      <td>ROW</td>
+      <td>record</td>
+      <td></td>
+    </tr>
+    </tbody>
+</table>
+
+In addition to the types listed above, Flink supports reading/writing nullable types. Flink maps a nullable type to the Avro `union(something, null)`, where `something` is the Avro type converted from the Flink type.
+
+You can refer to the [Avro specification](https://avro.apache.org/docs/current/spec.html) for more information about Avro types.
+
+
+
+
diff --git a/docs/dev/table/connectors/formats/index.md b/docs/dev/table/connectors/formats/index.md
new file mode 100644
index 0000000..6f45d74
--- /dev/null
+++ b/docs/dev/table/connectors/formats/index.md
@@ -0,0 +1,72 @@
+---
+title: "Formats"
+nav-id: sql-formats
+nav-parent_id: sql-connectors
+nav-pos: 1
+nav-show_overview: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Flink provides a set of table formats that can be used with table connectors. A table format is a storage format that defines how to map binary data onto table columns.
+
+Flink supports the following formats:
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left">Formats</th>
+        <th class="text-left">Supported Connectors</th>
+      </tr>
+    </thead>
+    <tbody>
+        <tr>
+          <td>CSV</td>
+          <td>Apache Kafka,
+          <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>JSON</td>
+         <td>Apache Kafka,
+          <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a>,
+          Elasticsearch</td>
+       </tr>
+        <tr>
+          <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/avro.html">Apache Avro</a></td>
+          <td>Apache Kafka,
+           <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>Debezium JSON</td>
+         <td>Apache Kafka</td>
+        </tr>
+        <tr>
+         <td>Canal JSON</td>
+         <td>Apache Kafka</td>
+        </tr>
+        <tr>
+         <td>Apache Parquet</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>Apache ORC</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+    </tbody>
+</table>
\ No newline at end of file
diff --git a/docs/dev/table/connectors/formats/index.zh.md b/docs/dev/table/connectors/formats/index.zh.md
new file mode 100644
index 0000000..9272f21
--- /dev/null
+++ b/docs/dev/table/connectors/formats/index.zh.md
@@ -0,0 +1,72 @@
+---
+title: "Formats"
+nav-id: sql-formats
+nav-parent_id: sql-connectors
+nav-pos: 1
+nav-show_overview: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Flink provides a set of table formats that can be used with table connectors. A table format is a storage format that defines how to map binary data onto table columns.
+
+Flink supports the following formats:
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left">Formats</th>
+        <th class="text-left">Supported Connectors</th>
+      </tr>
+    </thead>
+    <tbody>
+        <tr>
+          <td>CSV</td>
+          <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
+          <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>JSON</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
+          <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a>,
+          <a href="{{ site.baseurl }}/dev/table/connectors/elasticsearch.html">Elasticsearch</a></td>
+       </tr>
+        <tr>
+          <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/avro.html">Apache Avro</a></td>
+          <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
+           <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>Debezium JSON</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a></td>
+        </tr>
+        <tr>
+         <td>Canal JSON</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a></td>
+        </tr>
+        <tr>
+         <td>Apache Parquet</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+        <tr>
+         <td>Apache ORC</td>
+         <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
+        </tr>
+    </tbody>
+</table>
\ No newline at end of file