Posted to commits@flink.apache.org by ja...@apache.org on 2019/07/03 02:08:14 UTC

[flink] 04/05: [FLINK-12943][docs-zh] Translate "HDFS Connector" page into Chinese

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 345ac8868b705b7c9be9bb70e6fb54d1d15baa9b
Author: aloys <lo...@gmail.com>
AuthorDate: Wed Jun 26 01:22:22 2019 +0800

    [FLINK-12943][docs-zh] Translate "HDFS Connector" page into Chinese
    
    This closes #8897
---
 docs/dev/connectors/filesystem_sink.zh.md | 80 ++++++++++++-------------------
 1 file changed, 31 insertions(+), 49 deletions(-)

diff --git a/docs/dev/connectors/filesystem_sink.zh.md b/docs/dev/connectors/filesystem_sink.zh.md
index f9a828d..54b0c64 100644
--- a/docs/dev/connectors/filesystem_sink.zh.md
+++ b/docs/dev/connectors/filesystem_sink.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "HDFS Connector"
+title: "HDFS Connector"
 nav-title: Rolling File Sink
 nav-parent_id: connectors
 nav-pos: 5
@@ -23,9 +23,8 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This connector provides a Sink that writes partitioned files to any filesystem supported by
-[Hadoop FileSystem](http://hadoop.apache.org). To use this connector, add the
-following dependency to your project:
+This connector provides a Sink that writes partitioned files to any filesystem supported by
+[Hadoop FileSystem](http://hadoop.apache.org). Before using it, add the following dependency to your project:
 
 {% highlight xml %}
 <dependency>
@@ -35,16 +34,11 @@ following dependency to your project:
 </dependency>
 {% endhighlight %}
 
-Note that the streaming connectors are currently not part of the binary
-distribution. See
-[here]({{site.baseurl}}/dev/projectsetup/dependencies.html)
-for information about how to package the program with the libraries for
-cluster execution.
+Note that the streaming connectors are currently not part of the binary distribution. See [here]({{site.baseurl}}/zh/dev/projectsetup/dependencies.html) for information on adding the dependency, packaging the program with the required libraries, and running it on a cluster.
 
-#### Bucketing File Sink
+#### Bucketing File Sink
 
-The bucketing behaviour as well as the writing can be configured but we will get to that later.
-This is how you can create a bucketing sink which by default, sinks to rolling files that are split by time:
+The bucketing behaviour and the writing are covered later on; first, here is how to create a bucketing sink which, by default, writes data to rolling files split by time:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -65,40 +59,30 @@ input.addSink(new BucketingSink[String]("/base/path"))
 </div>
 </div>
 
-The only required parameter is the base path where the buckets will be
-stored. The sink can be further configured by specifying a custom bucketer, writer and batch size.
-
-By default the bucketing sink will split by the current system time when elements arrive and will
-use the datetime pattern `"yyyy-MM-dd--HH"` to name the buckets. This pattern is passed to
-`DateTimeFormatter` with the current system time and JVM's default timezone to form a bucket path.
-Users can also specify a timezone for the bucketer to format bucket path. A new bucket will be created
-whenever a new date is encountered. For example, if you have a pattern that contains minutes as the
-finest granularity you will get a new bucket every minute. Each bucket is itself a directory that
-contains several part files: each parallel instance of the sink will create its own part file and
-when part files get too big the sink will also create a new part file next to the others. When a
-bucket becomes inactive, the open part file will be flushed and closed. A bucket is regarded as
-inactive when it hasn't been written to recently. By default, the sink checks for inactive buckets
-every minute, and closes any buckets which haven't been written to for over a minute. This
-behaviour can be configured with `setInactiveBucketCheckInterval()` and
-`setInactiveBucketThreshold()` on a `BucketingSink`.
-
-You can also specify a custom bucketer by using `setBucketer()` on a `BucketingSink`. If desired,
-the bucketer can use a property of the element or tuple to determine the bucket directory.
-
-The default writer is `StringWriter`. This will call `toString()` on the incoming elements
-and write them to part files, separated by newline. To specify a custom writer use `setWriter()`
-on a `BucketingSink`. If you want to write Hadoop SequenceFiles you can use the provided
-`SequenceFileWriter` which can also be configured to use compression.
-
-There are two configuration options that specify when a part file should be closed
-and a new one started:
+The only required parameter at construction time is the base path where the bucket files are stored. The bucketing sink can be further configured by specifying a custom bucketer, writer and batch size.
+
+By default, when elements arrive, the bucketing sink splits them by the current system time and
+uses the datetime pattern `"yyyy-MM-dd--HH"` to name the buckets. This pattern is passed to
+`DateTimeFormatter` together with the current system time and the JVM's default timezone to form the
+bucket path; users can also specify a custom timezone for formatting the bucket path. A new bucket is
+created whenever a new date is encountered; for example, if the pattern contains minutes as the finest
+granularity, a new bucket is created every minute. Each bucket is itself a directory that contains
+several part files: each parallel instance of the sink creates its own part file, and when a part file
+gets too big the sink creates a new one next to the others. When a bucket becomes inactive, the open
+part file is flushed and closed. A bucket is regarded as inactive when it has not been written to
+recently. By default, the sink checks for inactive buckets every minute and closes any bucket that has
+not been written to for over a minute. This behaviour can be configured with
+`setInactiveBucketCheckInterval()` and `setInactiveBucketThreshold()` on a `BucketingSink`.
+
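+As a minimal sketch of these options (the base path, the `"UTC"` timezone and the interval values
+below are illustrative choices, not defaults), a `BucketingSink` could be configured roughly like this:
+
+{% highlight java %}
+import java.time.ZoneId;
+
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
+import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
+
+DataStream<String> input = ...;
+
+BucketingSink<String> sink = new BucketingSink<String>("/base/path");
+
+// Bucket by minute and format the bucket path in an explicit timezone
+// instead of the JVM default.
+sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm", ZoneId.of("UTC")));
+
+// Check for inactive buckets every 30 seconds and close buckets that have
+// not been written to for 2 minutes (both values are in milliseconds).
+sink.setInactiveBucketCheckInterval(30 * 1000L);
+sink.setInactiveBucketThreshold(2 * 60 * 1000L);
+
+input.addSink(sink);
+{% endhighlight %}
+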
+A custom bucketer can be specified by calling `setBucketer()` on a `BucketingSink`. If desired, the bucketer can use a property of the element or tuple to determine the bucket directory.
+
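+For illustration, a bucketer that derives the bucket directory from a field of the element could look
+roughly like the sketch below. The `Event` type and its `getCountry()` accessor are hypothetical, and
+the exact `Bucketer` signature is assumed here rather than quoted from the API docs:
+
+{% highlight java %}
+import org.apache.flink.streaming.connectors.fs.Clock;
+import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
+import org.apache.hadoop.fs.Path;
+
+// Routes every element into a sub-directory named after its (hypothetical) country field.
+public class CountryBucketer implements Bucketer<Event> {
+
+    @Override
+    public Path getBucketPath(Clock clock, Path basePath, Event element) {
+        return new Path(basePath, element.getCountry());
+    }
+}
+{% endhighlight %}
+
+The bucketer would then be registered with `sink.setBucketer(new CountryBucketer())`.
+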
+The default writer is `StringWriter`. It calls `toString()` on the incoming elements and writes them
+to part files, separated by newlines. A custom writer can be specified with `setWriter()` on a
+`BucketingSink`. To write Hadoop SequenceFiles, the provided `SequenceFileWriter` can be used, and it
+can also be configured to use compression.
+
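+As a sketch of a non-default writer, the snippet below writes Hadoop SequenceFiles with compression
+enabled, assuming `SequenceFileWriter`'s constructor that takes a codec name and a compression type.
+The `Tuple2<IntWritable, Text>` input stream and the `"DefaultCodec"` codec name are assumptions made
+for the example:
+
+{% highlight java %}
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;
+import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.SequenceFile;
+import org.apache.hadoop.io.Text;
+
+DataStream<Tuple2<IntWritable, Text>> input = ...;
+
+BucketingSink<Tuple2<IntWritable, Text>> sink =
+    new BucketingSink<Tuple2<IntWritable, Text>>("/base/path");
+
+// Write Hadoop SequenceFiles, compressed block by block with the named codec.
+sink.setWriter(new SequenceFileWriter<IntWritable, Text>("DefaultCodec", SequenceFile.CompressionType.BLOCK));
+
+input.addSink(sink);
+{% endhighlight %}
+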
+Two configuration options determine when a part file is closed and a new one is started:
  
-* By setting a batch size (The default part file size is 384 MB)
-* By setting a batch roll over time interval (The default roll over interval is `Long.MAX_VALUE`)
+* By setting a batch size (the default part file size is 384 MB)
+* By setting a batch rollover interval in milliseconds (the default rollover interval is `Long.MAX_VALUE`)
 
-A new part file is started when either of these two conditions is satisfied.
+A new part file is started when either of these two conditions is satisfied.
 
-Example:
+Example:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -133,17 +117,15 @@ input.addSink(sink)
 </div>
 </div>
 
-This will create a sink that writes to bucket files that follow this schema:
+The code above creates a sink that writes bucket files following this schema:
 
 {% highlight plain %}
 /base/path/{date-time}/part-{parallel-task}-{count}
 {% endhighlight %}
 
-Where `date-time` is the string that we get from the date/time format, `parallel-task` is the index
-of the parallel sink instance and `count` is the running number of part files that were created
-because of the batch size or batch roll over interval.
+Here `date-time` is the string obtained from the date/time format, `parallel-task` is the index of the
+parallel sink instance, and `count` is the running number of part files created because of the batch
+size or the batch rollover interval.
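+
+As a purely illustrative instance of this schema, the eighth part file written by the third parallel
+sink instance (index 2) into an hourly bucket might end up at a path like:
+
+{% highlight plain %}
+/base/path/2019-06-26--01/part-2-7
+{% endhighlight %}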
 
-For in-depth information, please refer to the JavaDoc for
-[BucketingSink](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html).
+For more information, please refer to the JavaDoc for [BucketingSink](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html).
 
 {% top %}