You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by ja...@apache.org on 2020/06/11 08:05:40 UTC
[flink] branch release-1.11 updated: [FLINK-18132][docs] Add
documentation for the new CSV format
This is an automated email from the ASF dual-hosted git repository.
jark pushed a commit to branch release-1.11
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.11 by this push:
new 671649e [FLINK-18132][docs] Add documentation for the new CSV format
671649e is described below
commit 671649ef3c666441e297047dd236468590debf96
Author: Danny Chan <yu...@gmail.com>
AuthorDate: Thu Jun 11 16:04:38 2020 +0800
[FLINK-18132][docs] Add documentation for the new CSV format
This closes #12571
---
docs/dev/table/connectors/formats/csv.md | 254 ++++++++++++++++++++++++++
docs/dev/table/connectors/formats/csv.zh.md | 254 ++++++++++++++++++++++++++
docs/dev/table/connectors/formats/index.md | 2 +-
docs/dev/table/connectors/formats/index.zh.md | 2 +-
4 files changed, 510 insertions(+), 2 deletions(-)
diff --git a/docs/dev/table/connectors/formats/csv.md b/docs/dev/table/connectors/formats/csv.md
new file mode 100644
index 0000000..6154fbd
--- /dev/null
+++ b/docs/dev/table/connectors/formats/csv.md
@@ -0,0 +1,254 @@
+---
+title: "CSV Format"
+nav-title: CSV
+nav-parent_id: sql-formats
+nav-pos: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<span class="label label-info">Format: Serialization Schema</span>
+<span class="label label-info">Format: Deserialization Schema</span>
+
+* This will be replaced by the TOC
+{:toc}
+
+The [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) format allows to read and write CSV data based on an CSV schema. Currently, the CSV schema is derived from table schema.
+
+Dependencies
+------------
+
+In order to setup the CSV format, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+| Maven dependency | SQL Client JAR |
+| :----------------- | :----------------------|
+| `flink-csv` | Built-in |
+
+How to create a table with CSV format
+----------------
+
+Here is an example to create a table using Kafka connector and CSV format.
+
+<div class="codetabs" markdown="1">
+<div data-lang="SQL" markdown="1">
+{% highlight sql %}
+CREATE TABLE user_behavior (
+ user_id BIGINT,
+ item_id BIGINT,
+ category_id BIGINT,
+ behavior STRING,
+ ts TIMESTAMP(3)
+) WITH (
+ 'connector' = 'kafka',
+ 'topic' = 'user_behavior',
+ 'properties.bootstrap.servers' = 'localhost:9092',
+ 'properties.group.id' = 'testGroup',
+ 'format' = 'csv',
+ 'csv.ignore-parse-errors' = 'true',
+ 'csv.allow-comments' = 'true'
+)
+{% endhighlight %}
+</div>
+</div>
+
+Format Options
+----------------
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 25%">Option</th>
+ <th class="text-center" style="width: 8%">Required</th>
+ <th class="text-center" style="width: 7%">Default</th>
+ <th class="text-center" style="width: 10%">Type</th>
+ <th class="text-center" style="width: 50%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>format</h5></td>
+ <td>required</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Specify what format to use, here should be 'csv'.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.field-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>,</code></td>
+ <td>String</td>
+ <td>Field delimiter character (',' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.line-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>\n</code></td>
+ <td>String</td>
+ <td>Line delimiter ('\n' by default, otherwise
+ '\r' or '\r\n' are allowed), unicode is supported if
+ the delimiter is an invisible special character,
+ e.g. U&'\\000D' is the unicode representation of carriage return '\r'
+ e.g. U&'\\000A' is the unicode representation of line feed '\n'.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.disable-quote-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to disabled quote character for enclosing field values (false by default)
+ if true, quote-character can not be set.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.quote-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>"</code></td>
+ <td>String</td>
+ <td>Quote character for enclosing field values ('"' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.allow-comments</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to ignore comment lines that start with '#'
+ (disabled by default);
+ if enabled, make sure to also ignore parse errors to allow empty rows.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.ignore-parse-errors</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to skip fields and rows with parse errors instead of failing;
+ fields are set to null in case of errors.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.array-element-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>;</code></td>
+ <td>String</td>
+ <td>Array element delimiter string for separating
+ array and row element values (';' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.escape-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Escape character for escaping values (disabled by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.null-literal</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Null literal string that is interpreted as a
+ null value (disabled by default).</td>
+ </tr>
+ </tbody>
+</table>
+
+Data Type Mapping
+----------------
+
+Currently, the CSV schema is always derived from table schema. Explicitly defining an CSV schema is not supported yet.
+
+Flink CSV format uses [jackson databind API](https://github.com/FasterXML/jackson-databind) to parse and generate CSV string.
+
+The following table lists the type mapping from Flink type to CSV type.
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left">Flink Data Type</th>
+ <th class="text-center">CSV Data Type</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>CHAR / VARCHAR / STRING</td>
+ <td>string</td>
+ </tr>
+ <tr>
+ <td>BOOLEAN</td>
+ <td>boolean</td>
+ </tr>
+ <tr>
+ <td>BINARY / VARBINARY</td>
+ <td>string with encoding: base64</td>
+ </tr>
+ <tr>
+ <td>DECIMAL</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>TINYINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>SMALLINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>INT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>BIGINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>FLOAT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>DOUBLE</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>DATE</td>
+ <td>string with format: date</td>
+ </tr>
+ <tr>
+ <td>TIME</td>
+ <td>string with format: time</td>
+ </tr>
+ <tr>
+ <td>TIMESTAMP</td>
+ <td>string with format: date-time</td>
+ </tr>
+ <tr>
+ <td>INTERVAL</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>ARRAY</td>
+ <td>array</td>
+ </tr>
+ <tr>
+ <td>ROW</td>
+ <td>object</td>
+ </tr>
+ </tbody>
+</table>
+
+
+
+
+
diff --git a/docs/dev/table/connectors/formats/csv.zh.md b/docs/dev/table/connectors/formats/csv.zh.md
new file mode 100644
index 0000000..6154fbd
--- /dev/null
+++ b/docs/dev/table/connectors/formats/csv.zh.md
@@ -0,0 +1,254 @@
+---
+title: "CSV Format"
+nav-title: CSV
+nav-parent_id: sql-formats
+nav-pos: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<span class="label label-info">Format: Serialization Schema</span>
+<span class="label label-info">Format: Deserialization Schema</span>
+
+* This will be replaced by the TOC
+{:toc}
+
+The [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) format allows to read and write CSV data based on an CSV schema. Currently, the CSV schema is derived from table schema.
+
+Dependencies
+------------
+
+In order to setup the CSV format, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+| Maven dependency | SQL Client JAR |
+| :----------------- | :----------------------|
+| `flink-csv` | Built-in |
+
+How to create a table with CSV format
+----------------
+
+Here is an example to create a table using Kafka connector and CSV format.
+
+<div class="codetabs" markdown="1">
+<div data-lang="SQL" markdown="1">
+{% highlight sql %}
+CREATE TABLE user_behavior (
+ user_id BIGINT,
+ item_id BIGINT,
+ category_id BIGINT,
+ behavior STRING,
+ ts TIMESTAMP(3)
+) WITH (
+ 'connector' = 'kafka',
+ 'topic' = 'user_behavior',
+ 'properties.bootstrap.servers' = 'localhost:9092',
+ 'properties.group.id' = 'testGroup',
+ 'format' = 'csv',
+ 'csv.ignore-parse-errors' = 'true',
+ 'csv.allow-comments' = 'true'
+)
+{% endhighlight %}
+</div>
+</div>
+
+Format Options
+----------------
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 25%">Option</th>
+ <th class="text-center" style="width: 8%">Required</th>
+ <th class="text-center" style="width: 7%">Default</th>
+ <th class="text-center" style="width: 10%">Type</th>
+ <th class="text-center" style="width: 50%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>format</h5></td>
+ <td>required</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Specify what format to use, here should be 'csv'.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.field-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>,</code></td>
+ <td>String</td>
+ <td>Field delimiter character (',' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.line-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>\n</code></td>
+ <td>String</td>
+ <td>Line delimiter ('\n' by default, otherwise
+ '\r' or '\r\n' are allowed), unicode is supported if
+ the delimiter is an invisible special character,
+ e.g. U&'\\000D' is the unicode representation of carriage return '\r'
+ e.g. U&'\\000A' is the unicode representation of line feed '\n'.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.disable-quote-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to disabled quote character for enclosing field values (false by default)
+ if true, quote-character can not be set.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.quote-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>"</code></td>
+ <td>String</td>
+ <td>Quote character for enclosing field values ('"' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.allow-comments</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to ignore comment lines that start with '#'
+ (disabled by default);
+ if enabled, make sure to also ignore parse errors to allow empty rows.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.ignore-parse-errors</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Flag to skip fields and rows with parse errors instead of failing;
+ fields are set to null in case of errors.</td>
+ </tr>
+ <tr>
+ <td><h5>csv.array-element-delimiter</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;"><code>;</code></td>
+ <td>String</td>
+ <td>Array element delimiter string for separating
+ array and row element values (';' by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.escape-character</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Escape character for escaping values (disabled by default).</td>
+ </tr>
+ <tr>
+ <td><h5>csv.null-literal</h5></td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>Null literal string that is interpreted as a
+ null value (disabled by default).</td>
+ </tr>
+ </tbody>
+</table>
+
+Data Type Mapping
+----------------
+
+Currently, the CSV schema is always derived from table schema. Explicitly defining an CSV schema is not supported yet.
+
+Flink CSV format uses [jackson databind API](https://github.com/FasterXML/jackson-databind) to parse and generate CSV string.
+
+The following table lists the type mapping from Flink type to CSV type.
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left">Flink Data Type</th>
+ <th class="text-center">CSV Data Type</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>CHAR / VARCHAR / STRING</td>
+ <td>string</td>
+ </tr>
+ <tr>
+ <td>BOOLEAN</td>
+ <td>boolean</td>
+ </tr>
+ <tr>
+ <td>BINARY / VARBINARY</td>
+ <td>string with encoding: base64</td>
+ </tr>
+ <tr>
+ <td>DECIMAL</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>TINYINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>SMALLINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>INT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>BIGINT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>FLOAT</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>DOUBLE</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>DATE</td>
+ <td>string with format: date</td>
+ </tr>
+ <tr>
+ <td>TIME</td>
+ <td>string with format: time</td>
+ </tr>
+ <tr>
+ <td>TIMESTAMP</td>
+ <td>string with format: date-time</td>
+ </tr>
+ <tr>
+ <td>INTERVAL</td>
+ <td>number</td>
+ </tr>
+ <tr>
+ <td>ARRAY</td>
+ <td>array</td>
+ </tr>
+ <tr>
+ <td>ROW</td>
+ <td>object</td>
+ </tr>
+ </tbody>
+</table>
+
+
+
+
+
diff --git a/docs/dev/table/connectors/formats/index.md b/docs/dev/table/connectors/formats/index.md
index e0e03b5..ec39af0 100644
--- a/docs/dev/table/connectors/formats/index.md
+++ b/docs/dev/table/connectors/formats/index.md
@@ -37,7 +37,7 @@ Flink supports the following formats:
</thead>
<tbody>
<tr>
- <td>CSV</td>
+ <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/csv.html">CSV</a></td>
<td>Apache Kafka,
<a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
</tr>
diff --git a/docs/dev/table/connectors/formats/index.zh.md b/docs/dev/table/connectors/formats/index.zh.md
index e0e03b5..ec39af0 100644
--- a/docs/dev/table/connectors/formats/index.zh.md
+++ b/docs/dev/table/connectors/formats/index.zh.md
@@ -37,7 +37,7 @@ Flink supports the following formats:
</thead>
<tbody>
<tr>
- <td>CSV</td>
+ <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/csv.html">CSV</a></td>
<td>Apache Kafka,
<a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
</tr>