Posted to commits@flink.apache.org by ar...@apache.org on 2021/12/06 18:51:14 UTC

[flink] 02/03: [FLINK-24859][doc][formats] document text file reading

This is an automated email from the ASF dual-hosted git repository.

arvid pushed a commit to branch release-1.14
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 47d496ab85940df618fd5c192680d1d9aa720964
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Mon Nov 15 12:02:38 2021 +0100

    [FLINK-24859][doc][formats] document text file reading
---
 .../connectors/datastream/formats/text_files.md    | 68 ++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/docs/content/docs/connectors/datastream/formats/text_files.md b/docs/content/docs/connectors/datastream/formats/text_files.md
new file mode 100644
index 0000000..b79c9c6
--- /dev/null
+++ b/docs/content/docs/connectors/datastream/formats/text_files.md
@@ -0,0 +1,68 @@
+---
+title:  "Text files"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/text_files.html
+- /apis/streaming/connectors/formats/text_files.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+# Text files format
+
+Flink supports reading text lines from a file using `TextLineInputFormat`. This format uses Java's built-in `InputStreamReader` to decode the byte stream using one of the supported charset encodings.
+To use the format you need to add the Flink Connector Files dependency to your project:
+
+```xml
+{{< artifact flink-connector-files >}}
+```
+
+This format is compatible with the new Source that can be used in both batch and streaming modes.
+Thus, you can use this format in two ways:
+- Bounded read for batch mode
+- Continuous read for streaming mode: the source monitors a directory and reads new files as they appear
+
+**Bounded read example**:
+
+In this example, we create a DataStream containing the lines of a text file as Strings.
+There is no need for a watermark strategy as records do not contain event timestamps.
+
+```java
+final FileSource<String> source =
+  FileSource.forRecordStreamFormat(new TextLineInputFormat(), /* Flink Path */)
+  .build();
+final DataStream<String> stream =
+  env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
+```
+
+**Continuous read example**:
+In this example, we create a DataStream containing the lines of text files as Strings. The stream grows indefinitely
+as new files are added to the directory. The directory is checked for new files every second.
+There is no need for a watermark strategy as records do not contain event timestamps.
+
+```java
+final FileSource<String> source =
+  FileSource.forRecordStreamFormat(new TextLineInputFormat(), /* Flink Path */)
+  .monitorContinuously(Duration.ofSeconds(1L))
+  .build();
+final DataStream<String> stream =
+  env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
+```
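
The page added by this patch states that the format decodes the byte stream with Java's `InputStreamReader` and a charset. The sketch below illustrates that mechanism using only the JDK. It is not Flink's implementation: the class `CharsetLineDecoding` and the helper `readLines` are hypothetical names introduced here for illustration, and the sample input is made up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CharsetLineDecoding {

    // Hypothetical helper (not a Flink API): decodes the stream with the given
    // charset and returns its content split into lines, without line terminators.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // InputStreamReader turns bytes into characters; BufferedReader splits on line ends.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: two lines containing non-ASCII characters, encoded as ISO-8859-1.
        byte[] latin1 = "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the same charset that was used for encoding recovers the text.
        List<String> decoded =
            readLines(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(decoded);
    }
}
```

Decoding with a charset other than the one used to write the file would generally produce replacement characters or wrong letters, which is why the charset has to match the encoding of the input files.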