Posted to commits@flink.apache.org by ja...@apache.org on 2019/07/03 02:08:10 UTC

[flink] branch master updated (686bc84 -> 29a0e8e)

This is an automated email from the ASF dual-hosted git repository.

jark pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git.


    from 686bc84  [hotfix][python] Remove the redundant 'python setup.py install' from tox.ini
     new 6bf2071  [FLINK-12945][docs-zh] Translate "RabbitMQ Connector" page into Chinese
     new bda485f  [FLINK-12938][docs-zh] Translate "Streaming Connectors" page into Chinese
     new 5caed71  [FLINK-12946][docs-zh] Translate "Apache NiFi Connector" page into Chinese
     new 345ac88  [FLINK-12943][docs-zh] Translate "HDFS Connector" page into Chinese
     new 29a0e8e  [FLINK-12944][docs-zh] Translate "Streaming File Sink" page into Chinese

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/dev/connectors/filesystem_sink.zh.md | 80 +++++++++++---------------
 docs/dev/connectors/index.zh.md           | 49 +++++++---------
 docs/dev/connectors/nifi.zh.md            | 43 ++++++--------
 docs/dev/connectors/rabbitmq.md           | 10 ++--
 docs/dev/connectors/rabbitmq.zh.md        | 73 +++++++++---------------
 docs/dev/connectors/streamfile_sink.zh.md | 93 ++++++++++++-------------------
 6 files changed, 135 insertions(+), 213 deletions(-)


[flink] 05/05: [FLINK-12944][docs-zh] Translate "Streaming File Sink" page into Chinese

Posted by ja...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 29a0e8e5e7d69b0450ac3865c25df0e5d75d758b
Author: aloys <lo...@gmail.com>
AuthorDate: Fri Jun 28 00:39:56 2019 +0800

    [FLINK-12944][docs-zh] Translate "Streaming File Sink" page into Chinese
    
    This closes #8918
---
 docs/dev/connectors/streamfile_sink.zh.md | 93 ++++++++++++-------------------
 1 file changed, 35 insertions(+), 58 deletions(-)

diff --git a/docs/dev/connectors/streamfile_sink.zh.md b/docs/dev/connectors/streamfile_sink.zh.md
index 40b104b..79586ad 100644
--- a/docs/dev/connectors/streamfile_sink.zh.md
+++ b/docs/dev/connectors/streamfile_sink.zh.md
@@ -23,30 +23,22 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This connector provides a Sink that writes partitioned files to filesystems
-supported by the [Flink `FileSystem` abstraction]({{ site.baseurl}}/ops/filesystems/index.html).
+这个连接器提供了一个 Sink 来将分区文件写入到支持 [Flink `FileSystem`]({{ site.baseurl}}/zh/ops/filesystems/index.html) 接口的文件系统中。
 
-Since in streaming the input is potentially infinite, the streaming file sink writes data
-into buckets. The bucketing behaviour is configurable but a useful default is time-based
-bucketing where we start writing a new bucket every hour and thus get
-individual files that each contain a part of the infinite output stream.
+由于在流处理中输入可能是无限的,所以流处理的文件 sink 会将数据写入到桶中。如何分桶是可以配置的,一种有效的默认
+策略是基于时间的分桶,这种策略每个小时写入一个新的桶,这些桶各包含了无限输出流的一部分数据。
 
-Within a bucket, we further split the output into smaller part files based on a
-rolling policy. This is useful to prevent individual bucket files from getting
-too big. This is also configurable but the default policy rolls files based on
-file size and a timeout, *i.e* if no new data was written to a part file.
+在一个桶内部,会进一步将输出基于滚动策略切分成更小的文件。这有助于防止桶文件变得过大。滚动策略也是可以配置的,默认
+策略会根据文件大小和超时时间来滚动文件,超时时间是指没有新数据写入部分文件(part file)的时间。
 
-The `StreamingFileSink` supports both row-wise encoding formats and
-bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
+`StreamingFileSink` 支持行编码格式和批量编码格式,比如 [Apache Parquet](http://parquet.apache.org)。
 
-#### Using Row-encoded Output Formats
+#### 使用行编码输出格式
 
-The only required configuration are the base path where we want to output our
-data and an
-[Encoder]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/api/common/serialization/Encoder.html)
-that is used for serializing records to the `OutputStream` for each file.
+只需要配置一个输出路径和一个 [Encoder]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/api/common/serialization/Encoder.html)。
+`Encoder` 负责将每条记录序列化到对应文件的 `OutputStream` 中。
 
-Basic usage thus looks like this:
+基本用法如下:
 
 
 <div class="codetabs" markdown="1">
@@ -84,55 +76,40 @@ input.addSink(sink)
 </div>
 </div>
 
-This will create a streaming sink that creates hourly buckets and uses a
-default rolling policy. The default bucket assigner is
+上面的代码创建了一个按小时分桶、按默认策略滚动的 sink。默认分桶器是
 [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html)
-and the default rolling policy is
-[DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html).
-You can specify a custom
+,默认滚动策略是
+[DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html)。
+可以在 sink builder 上指定自定义的
 [BucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html)
-and
-[RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html)
-on the sink builder. Please check out the JavaDoc for
-[StreamingFileSink]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html)
-for more configuration options and more documentation about the workings and
-interactions of bucket assigners and rolling policies.
+和
+[RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html)。
+更多配置选项,以及分桶器和滚动策略的工作机制和相互作用,请参考:
+[StreamingFileSink]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html)。
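
A minimal Java sketch of customizing the rolling policy on the sink builder, as the paragraph above describes. The path and thresholds are illustrative, and the builder methods reflect the Flink 1.8-era API; check the linked `StreamingFileSink` JavaDoc for the exact signatures in your version.

{% highlight java %}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);                       // pending part files are finalized on checkpoints
DataStream<String> input = env.fromElements("a", "b"); // stands in for a real stream

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/base/path"), new SimpleStringEncoder<String>("UTF-8"))
    .withRollingPolicy(
        DefaultRollingPolicy.create()
            .withRolloverInterval(15 * 60 * 1000L)     // roll at the latest every 15 minutes
            .withInactivityInterval(5 * 60 * 1000L)    // ... or after 5 minutes without new data
            .withMaxPartSize(128 * 1024 * 1024L)       // ... or once a part file reaches 128 MB
            .build())
    .build();

input.addSink(sink);
{% endhighlight %}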
 
-#### Using Bulk-encoded Output Formats
+#### 使用批量编码输出格式
 
-In the above example we used an `Encoder` that can encode or serialize each
-record individually. The streaming file sink also supports bulk-encoded output
-formats such as [Apache Parquet](http://parquet.apache.org). To use these,
-instead of `StreamingFileSink.forRowFormat()` you would use
-`StreamingFileSink.forBulkFormat()` and specify a `BulkWriter.Factory`.
+上面的示例使用 `Encoder` 分别序列化每一个记录。除此之外,流式文件 sink 还支持批量编码的输出格式,比如 [Apache Parquet](http://parquet.apache.org)。
+使用这种编码格式需要用 `StreamingFileSink.forBulkFormat()` 来代替 `StreamingFileSink.forRowFormat()` ,然后指定一个 `BulkWriter.Factory`。
 
 [ParquetAvroWriters]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.html)
-has static methods for creating a `BulkWriter.Factory` for various types.
+中包含了为各种类型创建 `BulkWriter.Factory` 的静态方法。
 
 <div class="alert alert-info">
-    <b>IMPORTANT:</b> Bulk-encoding formats can only be combined with the
-    `OnCheckpointRollingPolicy`, which rolls the in-progress part file on
-    every checkpoint.
+    <b>重要:</b> 批量编码格式只能和 `OnCheckpointRollingPolicy` 结合使用,该策略会在每次 checkpoint 时滚动正在写入(in-progress)的部分文件。
 </div>
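
A companion sketch for the bulk-encoded path described above, assuming a hypothetical Avro-reflect POJO named `LogRecord` and the `flink-parquet` dependency on the classpath:

{% highlight java %}
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// hypothetical record type, written via Avro reflection
public class LogRecord {
    public String user;
    public long timestamp;
}

DataStream<LogRecord> records = ...; // some upstream stream of LogRecord

StreamingFileSink<LogRecord> parquetSink = StreamingFileSink
    .forBulkFormat(new Path("/base/path"), ParquetAvroWriters.forReflectRecord(LogRecord.class))
    .build(); // bulk formats roll the in-progress part file on every checkpoint

records.addSink(parquetSink);
{% endhighlight %}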
 
-#### Important Considerations for S3
-
-<span class="label label-danger">Important Note 1</span>: For S3, the `StreamingFileSink`
-supports only the [Hadoop-based](https://hadoop.apache.org/) FileSystem implementation, not
-the implementation based on [Presto](https://prestodb.io/). In case your job uses the
-`StreamingFileSink` to write to S3 but you want to use the Presto-based one for checkpointing,
-it is advised to use explicitly *"s3a://"* (for Hadoop) as the scheme for the target path of
-the sink and *"s3p://"* for checkpointing (for Presto). Using *"s3://"* for both the sink
-and checkpointing may lead to unpredictable behavior, as both implementations "listen" to that scheme.
-
-<span class="label label-danger">Important Note 2</span>: To guarantee exactly-once semantics while
-being efficient, the `StreamingFileSink` uses the [Multi-part Upload](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)
-feature of S3 (MPU from now on). This feature allows to upload files in independent chunks (thus the "multi-part")
-which can be combined into the original file when all the parts of the MPU are successfully uploaded.
-For inactive MPUs, S3 supports a bucket lifecycle rule that the user can use to abort multipart uploads
-that don't complete within a specified number of days after being initiated. This implies that if you set this rule
-aggressively and take a savepoint with some part-files being not fully uploaded, their associated MPUs may time-out
-before the job is restarted. This will result in your job not being able to restore from that savepoint as the
-pending part-files are no longer there and Flink will fail with an exception as it tries to fetch them and fails.
+#### 关于 S3 的重要注意事项
+
+<span class="label label-danger">重要提示 1</span>: 对于 S3,`StreamingFileSink` 只支持基于 [Hadoop](https://hadoop.apache.org/)
+的文件系统实现,不支持基于 [Presto](https://prestodb.io/) 的实现。如果想使用 `StreamingFileSink` 向 S3 写入数据,同时又想把
+checkpoint 放在基于 Presto 的文件系统上,建议明确使用 *"s3a://"*(Hadoop)作为 sink 的目标路径方案,并且为 checkpoint 路径明确使用 *"s3p://"*(Presto)。
+如果 sink 和 checkpoint 都使用 *"s3://"* 方案的话,可能会导致不可预知的行为,因为两种实现都会“监听”这个方案。
+
+<span class="label label-danger">重要提示 2</span>: 为了保证精确一次的语义并兼顾效率,`StreamingFileSink` 使用了 S3 的 [Multi-part Upload](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)
+(后续简称 MPU)特性。该特性支持把文件拆分成独立的块(因此称为 "multi-part")分别上传,当 MPU 的所有分块都
+成功上传之后,再把它们合并成原始文件。针对不活跃的 MPU,S3 支持设置桶的生命周期规则,用户可以借助该规则中止发起后在指定天数内仍未完成的 MPU。
+这意味着,如果这个规则设置得比较激进,并且在某些部分文件(part file)尚未完全上传时触发了 savepoint,那么相关的 MPU 可能会在作业重启之前超时。
+此时这些待完成的部分文件已经不存在了,作业将无法从该 savepoint 恢复:Flink 在尝试获取它们时会失败并抛出异常。
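
A short sketch of the scheme split recommended in Note 1, with hypothetical bucket names and assuming both the Hadoop-based and Presto-based S3 filesystems are available to the job:

{% highlight java %}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// checkpoints go through the Presto-based S3 filesystem ("s3p://")
env.setStateBackend(new FsStateBackend("s3p://my-checkpoint-bucket/checkpoints"));

// the StreamingFileSink writes through the Hadoop-based S3 filesystem ("s3a://")
StreamingFileSink<String> s3Sink = StreamingFileSink
    .forRowFormat(new Path("s3a://my-output-bucket/data"), new SimpleStringEncoder<String>("UTF-8"))
    .build();
{% endhighlight %}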
 
 {% top %}


[flink] 03/05: [FLINK-12946][docs-zh] Translate "Apache NiFi Connector" page into Chinese

Posted by ja...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 5caed717570e14e0d45c49eccf409c3170034af1
Author: aloys <lo...@gmail.com>
AuthorDate: Sun Jun 23 00:54:09 2019 +0800

    [FLINK-12946][docs-zh] Translate "Apache NiFi Connector" page into Chinese
    
    This closes #8838
---
 docs/dev/connectors/nifi.zh.md | 43 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/docs/dev/connectors/nifi.zh.md b/docs/dev/connectors/nifi.zh.md
index 97fd831..114092f 100644
--- a/docs/dev/connectors/nifi.zh.md
+++ b/docs/dev/connectors/nifi.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "Apache NiFi Connector"
+title: "Apache NiFi 连接器"
 nav-title: NiFi
 nav-parent_id: connectors
 nav-pos: 7
@@ -23,9 +23,8 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This connector provides a Source and Sink that can read from and write to
-[Apache NiFi](https://nifi.apache.org/). To use this connector, add the
-following dependency to your project:
+这个连接器提供了可以从 [Apache NiFi](https://nifi.apache.org/) 读取数据和向其写入数据的 Source 和 Sink。
+使用这个连接器,需要在工程中添加下面的依赖:
 
 {% highlight xml %}
 <dependency>
@@ -35,30 +34,23 @@ following dependency to your project:
 </dependency>
 {% endhighlight %}
 
-Note that the streaming connectors are currently not part of the binary
-distribution. See
-[here]({{site.baseurl}}/dev/projectsetup/dependencies.html)
-for information about how to package the program with the libraries for
-cluster execution.
+注意这些连接器目前还没有包含在二进制发行版中。添加依赖、打包配置以及集群运行的相关信息请参考 [这里]({{site.baseurl}}/zh/dev/projectsetup/dependencies.html)。
 
-#### Installing Apache NiFi
+#### 安装 Apache NiFi
 
-Instructions for setting up a Apache NiFi cluster can be found
-[here](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#how-to-install-and-start-nifi).
+安装 Apache NiFi 集群请参考 [这里](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#how-to-install-and-start-nifi)。
 
 #### Apache NiFi Source
 
-The connector provides a Source for reading data from Apache NiFi to Apache Flink.
+该连接器提供了一个 Source 可以用来从 Apache NiFi 读取数据到 Apache Flink。
 
-The class `NiFiSource(…)` provides 2 constructors for reading data from NiFi.
+`NiFiSource(…)` 类有两个构造方法。
 
-- `NiFiSource(SiteToSiteConfig config)` - Constructs a `NiFiSource(…)` given the client's SiteToSiteConfig and a
-     default wait time of 1000 ms.
+- `NiFiSource(SiteToSiteConfig config)` - 构造一个 `NiFiSource(…)`,需要指定参数 SiteToSiteConfig,采用默认的等待时间 1000 ms。
 
-- `NiFiSource(SiteToSiteConfig config, long waitTimeMs)` - Constructs a `NiFiSource(…)` given the client's
-     SiteToSiteConfig and the specified wait time (in milliseconds).
+- `NiFiSource(SiteToSiteConfig config, long waitTimeMs)` - 构造一个 `NiFiSource(…)`,需要指定参数 SiteToSiteConfig 和等待时间(单位为毫秒)。
 
-Example:
+示例:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -89,18 +81,17 @@ val nifiSource = new NiFiSource(clientConfig)
 </div>
 </div>
 
-Here data is read from the Apache NiFi Output Port called "Data for Flink" which is part of Apache NiFi
-Site-to-site protocol configuration.
+数据从名为 "Data for Flink" 的 Apache NiFi Output Port 中读取,该 Output Port 是 Apache NiFi Site-to-site 协议配置的一部分。
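
A minimal Java sketch of the source setup described above, using the "Data for Flink" output port from the text and an illustrative NiFi URL:

{% highlight java %}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.nifi.NiFiDataPacket;
import org.apache.flink.streaming.connectors.nifi.NiFiSource;
import org.apache.nifi.remote.client.SiteToSiteClient;
import org.apache.nifi.remote.client.SiteToSiteClientConfig;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Site-to-Site client configuration; the URL and batch count are placeholders
SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder()
    .url("http://localhost:8080/nifi")
    .portName("Data for Flink")
    .requestBatchCount(5)
    .buildConfig();

// uses the default wait time of 1000 ms described above
DataStream<NiFiDataPacket> packets = env.addSource(new NiFiSource(clientConfig));
{% endhighlight %}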
 
 #### Apache NiFi Sink
 
-The connector provides a Sink for writing data from Apache Flink to Apache NiFi.
+该连接器提供了一个 Sink 可以用来把 Apache Flink 的数据写入到 Apache NiFi。
 
-The class `NiFiSink(…)` provides a constructor for instantiating a `NiFiSink`.
+`NiFiSink(…)` 类只有一个构造方法。
 
-- `NiFiSink(SiteToSiteClientConfig, NiFiDataPacketBuilder<T>)` constructs a `NiFiSink(…)` given the client's `SiteToSiteConfig` and a `NiFiDataPacketBuilder` that converts data from Flink to `NiFiDataPacket` to be ingested by NiFi.
+- `NiFiSink(SiteToSiteClientConfig, NiFiDataPacketBuilder<T>)` 构造一个 `NiFiSink(…)`,需要指定 `SiteToSiteConfig` 和 `NiFiDataPacketBuilder` 参数,`NiFiDataPacketBuilder` 可以将 Flink 数据转化成可以被 NiFi 识别的 `NiFiDataPacket`。
 
-Example:
+示例:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -135,6 +126,6 @@ streamExecEnv.addSink(nifiSink)
 </div>
 </div>      
 
-More information about [Apache NiFi](https://nifi.apache.org) Site-to-Site Protocol can be found [here](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site)
+更多关于 [Apache NiFi](https://nifi.apache.org) Site-to-Site Protocol 的信息请参考 [这里](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site)。
 
 {% top %}


[flink] 04/05: [FLINK-12943][docs-zh] Translate "HDFS Connector" page into Chinese

Posted by ja...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 345ac8868b705b7c9be9bb70e6fb54d1d15baa9b
Author: aloys <lo...@gmail.com>
AuthorDate: Wed Jun 26 01:22:22 2019 +0800

    [FLINK-12943][docs-zh] Translate "HDFS Connector" page into Chinese
    
    This closes #8897
---
 docs/dev/connectors/filesystem_sink.zh.md | 80 ++++++++++++-------------------
 1 file changed, 31 insertions(+), 49 deletions(-)

diff --git a/docs/dev/connectors/filesystem_sink.zh.md b/docs/dev/connectors/filesystem_sink.zh.md
index f9a828d..54b0c64 100644
--- a/docs/dev/connectors/filesystem_sink.zh.md
+++ b/docs/dev/connectors/filesystem_sink.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "HDFS Connector"
+title: "HDFS 连接器"
 nav-title: Rolling File Sink
 nav-parent_id: connectors
 nav-pos: 5
@@ -23,9 +23,8 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This connector provides a Sink that writes partitioned files to any filesystem supported by
-[Hadoop FileSystem](http://hadoop.apache.org). To use this connector, add the
-following dependency to your project:
+这个连接器可以向所有 [Hadoop FileSystem](http://hadoop.apache.org) 支持的文件系统写入分区文件。
+使用前,需要在工程里添加下面的依赖:
 
 {% highlight xml %}
 <dependency>
@@ -35,16 +34,11 @@ following dependency to your project:
 </dependency>
 {% endhighlight %}
 
-Note that the streaming connectors are currently not part of the binary
-distribution. See
-[here]({{site.baseurl}}/dev/projectsetup/dependencies.html)
-for information about how to package the program with the libraries for
-cluster execution.
+注意连接器目前还不是二进制发行版的一部分,添加依赖、打包配置以及集群运行信息请参考 [这里]({{site.baseurl}}/zh/dev/projectsetup/dependencies.html)。
 
-#### Bucketing File Sink
+#### 分桶文件 Sink
 
-The bucketing behaviour as well as the writing can be configured but we will get to that later.
-This is how you can create a bucketing sink which by default, sinks to rolling files that are split by time:
+关于分桶的配置我们后面会有讲述,这里先创建一个分桶 sink,默认情况下这个 sink 会将数据写入到按照时间切分的滚动文件中:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -65,40 +59,30 @@ input.addSink(new BucketingSink[String]("/base/path"))
 </div>
 </div>
 
-The only required parameter is the base path where the buckets will be
-stored. The sink can be further configured by specifying a custom bucketer, writer and batch size.
-
-By default the bucketing sink will split by the current system time when elements arrive and will
-use the datetime pattern `"yyyy-MM-dd--HH"` to name the buckets. This pattern is passed to
-`DateTimeFormatter` with the current system time and JVM's default timezone to form a bucket path.
-Users can also specify a timezone for the bucketer to format bucket path. A new bucket will be created
-whenever a new date is encountered. For example, if you have a pattern that contains minutes as the
-finest granularity you will get a new bucket every minute. Each bucket is itself a directory that
-contains several part files: each parallel instance of the sink will create its own part file and
-when part files get too big the sink will also create a new part file next to the others. When a
-bucket becomes inactive, the open part file will be flushed and closed. A bucket is regarded as
-inactive when it hasn't been written to recently. By default, the sink checks for inactive buckets
-every minute, and closes any buckets which haven't been written to for over a minute. This
-behaviour can be configured with `setInactiveBucketCheckInterval()` and
-`setInactiveBucketThreshold()` on a `BucketingSink`.
-
-You can also specify a custom bucketer by using `setBucketer()` on a `BucketingSink`. If desired,
-the bucketer can use a property of the element or tuple to determine the bucket directory.
-
-The default writer is `StringWriter`. This will call `toString()` on the incoming elements
-and write them to part files, separated by newline. To specify a custom writer use `setWriter()`
-on a `BucketingSink`. If you want to write Hadoop SequenceFiles you can use the provided
-`SequenceFileWriter` which can also be configured to use compression.
-
-There are two configuration options that specify when a part file should be closed
-and a new one started:
+初始化时只需要一个参数,即分桶文件存储的基础路径。分桶 sink 还可以通过指定自定义的 bucketer、writer 和 batch size 进一步配置。
+
+默认情况下,当数据到来时,分桶 sink 会按照系统时间对数据进行切分,并以 `"yyyy-MM-dd--HH"` 的时间格式给每个桶命名。然后 
+`DateTimeFormatter` 按照这个时间格式将当前系统时间以 JVM 默认时区转换成分桶的路径。用户可以自定义时区来生成
+分桶的路径。每遇到一个新的日期都会产生一个新的桶。例如,如果时间的格式以分钟为粒度,那么每分钟都会产生一个桶。每个桶都是一个目录,
+目录下包含了几个部分文件(part files):每个 sink 的并发实例都会创建一个属于自己的部分文件,当这些文件太大的时候,sink 会产生新的部分文件。
+当一个桶不再活跃时,打开的部分文件会刷盘并且关闭。如果一个桶最近一段时间都没有写入,那么这个桶被认为是不活跃的。sink 默认会每分钟
+检查不活跃的桶、关闭那些超过一分钟没有写入的桶。这些行为可以通过 `BucketingSink` 的 `setInactiveBucketCheckInterval()` 
+和 `setInactiveBucketThreshold()` 进行设置。
+
+可以调用 `BucketingSink` 的 `setBucketer()` 方法指定自定义的 bucketer。如果需要,bucketer 可以使用元素或者元组的某个属性来决定桶的目录。
+
+默认的 writer 是 `StringWriter`。它会对到来的元素调用 `toString()` 方法,并把得到的字符串以换行符分隔写入部分文件。可以通过
+`BucketingSink` 的 `setWriter()` 指定自定义的 writer。如果想写入 Hadoop SequenceFiles,可以使用自带的 `SequenceFileWriter`,
+它还可以配置是否开启压缩。
+
+关闭部分文件和打开新部分文件的时机可以通过两个配置来确定:
  
-* By setting a batch size (The default part file size is 384 MB)
-* By setting a batch roll over time interval (The default roll over interval is `Long.MAX_VALUE`)
+* 设置 batch size,即部分文件的大小(默认是 384 MB)
+* 设置 batch 滚动时间间隔,单位是毫秒(默认滚动周期是 `Long.MAX_VALUE`)
 
-A new part file is started when either of these two conditions is satisfied.
+当上述两个条件中的任意一个被满足,都会生成一个新的部分文件。
 
-Example:
+示例:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -133,17 +117,15 @@ input.addSink(sink)
 </div>
 </div>
 
-This will create a sink that writes to bucket files that follow this schema:
+上述代码会创建一个 sink,这个 sink 按下面的模式写入桶文件:
 
 {% highlight plain %}
 /base/path/{date-time}/part-{parallel-task}-{count}
 {% endhighlight %}
 
-Where `date-time` is the string that we get from the date/time format, `parallel-task` is the index
-of the parallel sink instance and `count` is the running number of part files that were created
-because of the batch size or batch roll over interval.
+`date-time` 是我们从日期/时间格式获得的字符串,`parallel-task` 是 sink 并发实例的索引,`count` 是因文件大小或者滚动周期而产生的
+文件的编号。
 
-For in-depth information, please refer to the JavaDoc for
-[BucketingSink](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html).
+更多信息,请参考 [BucketingSink](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html)。
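
A Java sketch that pulls together the configuration knobs described above; the path, datetime pattern, and thresholds are illustrative, and the setter names follow the `BucketingSink` JavaDoc referenced just above:

{% highlight java %}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

DataStream<String> input = ...; // some upstream stream of String

BucketingSink<String> sink = new BucketingSink<>("/base/path");
sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HHmm")); // minute-granularity buckets
sink.setWriter(new StringWriter<>());                         // the default writer, shown explicitly
sink.setBatchSize(1024 * 1024 * 400L);                        // roll part files at 400 MB
sink.setBatchRolloverInterval(20 * 60 * 1000L);               // ... or after 20 minutes
sink.setInactiveBucketCheckInterval(60 * 1000L);              // look for inactive buckets every minute
sink.setInactiveBucketThreshold(60 * 1000L);                  // close buckets idle for one minute

input.addSink(sink);
{% endhighlight %}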
 
 {% top %}


[flink] 01/05: [FLINK-12945][docs-zh] Translate "RabbitMQ Connector" page into Chinese

Posted by ja...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 6bf207166a5ca09c259d854989ccff18687745bc
Author: aloys <lo...@gmail.com>
AuthorDate: Sun Jun 23 20:19:26 2019 +0800

    [FLINK-12945][docs-zh] Translate "RabbitMQ Connector" page into Chinese
    
    This closes #8843
---
 docs/dev/connectors/rabbitmq.md    | 10 +++---
 docs/dev/connectors/rabbitmq.zh.md | 73 ++++++++++++++------------------------
 2 files changed, 32 insertions(+), 51 deletions(-)

diff --git a/docs/dev/connectors/rabbitmq.md b/docs/dev/connectors/rabbitmq.md
index 838db2a..a3a56a8 100644
--- a/docs/dev/connectors/rabbitmq.md
+++ b/docs/dev/connectors/rabbitmq.md
@@ -23,7 +23,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# License of the RabbitMQ Connector
+## License of the RabbitMQ Connector
 
 Flink's RabbitMQ connector defines a Maven dependency on the
 "RabbitMQ AMQP Java Client", is triple-licensed under the Mozilla Public License 1.1 ("MPL"), the GNU General Public License version 2 ("GPL") and the Apache License version 2 ("ASL").
@@ -35,7 +35,7 @@ Users that create and publish derivative work based on Flink's
 RabbitMQ connector (thereby re-distributing the "RabbitMQ AMQP Java Client")
 must be aware that this may be subject to conditions declared in the Mozilla Public License 1.1 ("MPL"), the GNU General Public License version 2 ("GPL") and the Apache License version 2 ("ASL").
 
-# RabbitMQ Connector
+## RabbitMQ Connector
 
 This connector provides access to data streams from [RabbitMQ](http://www.rabbitmq.com/). To use this connector, add the following dependency to your project:
 
@@ -49,10 +49,10 @@ This connector provides access to data streams from [RabbitMQ](http://www.rabbit
 
 Note that the streaming connectors are currently not part of the binary distribution. See linking with them for cluster execution [here]({{site.baseurl}}/dev/projectsetup/dependencies.html).
 
-#### Installing RabbitMQ
+### Installing RabbitMQ
 Follow the instructions from the [RabbitMQ download page](http://www.rabbitmq.com/download.html). After the installation the server automatically starts, and the application connecting to RabbitMQ can be launched.
 
-#### RabbitMQ Source
+### RabbitMQ Source
 
 This connector provides a `RMQSource` class to consume messages from a RabbitMQ
 queue. This source provides three different levels of guarantees, depending
@@ -131,7 +131,7 @@ val stream = env
 </div>
 </div>
 
-#### RabbitMQ Sink
+### RabbitMQ Sink
 This connector provides a `RMQSink` class for sending messages to a RabbitMQ
 queue. Below is a code example for setting up a RabbitMQ sink.
 
diff --git a/docs/dev/connectors/rabbitmq.zh.md b/docs/dev/connectors/rabbitmq.zh.md
index 838db2a..e213d3f 100644
--- a/docs/dev/connectors/rabbitmq.zh.md
+++ b/docs/dev/connectors/rabbitmq.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "RabbitMQ Connector"
+title: "RabbitMQ 连接器"
 nav-title: RabbitMQ
 nav-parent_id: connectors
 nav-pos: 6
@@ -23,21 +23,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# License of the RabbitMQ Connector
+## RabbitMQ 连接器的许可证
 
-Flink's RabbitMQ connector defines a Maven dependency on the
-"RabbitMQ AMQP Java Client", is triple-licensed under the Mozilla Public License 1.1 ("MPL"), the GNU General Public License version 2 ("GPL") and the Apache License version 2 ("ASL").
+Flink 的 RabbitMQ 连接器定义了对 "RabbitMQ AMQP Java Client" 的 Maven 依赖,后者在 Mozilla Public License 1.1 ("MPL")、GNU General Public License version 2 ("GPL") 和 Apache License version 2 ("ASL") 三种许可证下发行。
 
-Flink itself neither reuses source code from the "RabbitMQ AMQP Java Client"
-nor packages binaries from the "RabbitMQ AMQP Java Client".
+Flink 自身既没有复用 "RabbitMQ AMQP Java Client" 的源代码,也没有打包 "RabbitMQ AMQP Java Client" 的二进制文件。
 
-Users that create and publish derivative work based on Flink's
-RabbitMQ connector (thereby re-distributing the "RabbitMQ AMQP Java Client")
-must be aware that this may be subject to conditions declared in the Mozilla Public License 1.1 ("MPL"), the GNU General Public License version 2 ("GPL") and the Apache License version 2 ("ASL").
+如果用户基于 Flink 的 RabbitMQ 连接器创建并发布衍生作品(进而重新分发了 "RabbitMQ AMQP Java Client"),那么一定要注意,这可能会受到 Mozilla Public License 1.1 ("MPL")、GNU General Public License version 2 ("GPL") 和 Apache License version 2 ("ASL") 中声明条款的约束。
 
-# RabbitMQ Connector
+## RabbitMQ 连接器
 
-This connector provides access to data streams from [RabbitMQ](http://www.rabbitmq.com/). To use this connector, add the following dependency to your project:
+这个连接器可以访问 [RabbitMQ](http://www.rabbitmq.com/) 的数据流。使用这个连接器,需要在工程里添加下面的依赖:
 
 {% highlight xml %}
 <dependency>
@@ -47,44 +43,30 @@ This connector provides access to data streams from [RabbitMQ](http://www.rabbit
 </dependency>
 {% endhighlight %}
 
-Note that the streaming connectors are currently not part of the binary distribution. See linking with them for cluster execution [here]({{site.baseurl}}/dev/projectsetup/dependencies.html).
+注意这些流式连接器目前没有包含在二进制发行版中。关于如何将程序和这些连接器一起打包以便在集群上执行,请参考 [这里]({{site.baseurl}}/zh/dev/projectsetup/dependencies.html)。
 
-#### Installing RabbitMQ
-Follow the instructions from the [RabbitMQ download page](http://www.rabbitmq.com/download.html). After the installation the server automatically starts, and the application connecting to RabbitMQ can be launched.
+### 安装 RabbitMQ
+安装 RabbitMQ 请参考 [RabbitMQ 下载页面](http://www.rabbitmq.com/download.html)。安装完成之后,服务器会自动启动,连接 RabbitMQ 的应用程序即可启动。
 
-#### RabbitMQ Source
+### RabbitMQ Source
 
-This connector provides a `RMQSource` class to consume messages from a RabbitMQ
-queue. This source provides three different levels of guarantees, depending
-on how it is configured with Flink:
+`RMQSource` 负责从 RabbitMQ 中消费数据,可以配置三种不同级别的保证:
 
-1. **Exactly-once**: In order to achieve exactly-once guarantees with the
-RabbitMQ source, the following is required -
- - *Enable checkpointing*: With checkpointing enabled, messages are only
- acknowledged (hence, removed from the RabbitMQ queue) when checkpoints
- are completed.
- - *Use correlation ids*: Correlation ids are a RabbitMQ application feature.
- You have to set it in the message properties when injecting messages into RabbitMQ.
- The correlation id is used by the source to deduplicate any messages that
- have been reprocessed when restoring from a checkpoint.
- - *Non-parallel source*: The source must be non-parallel (parallelism set
- to 1) in order to achieve exactly-once. This limitation is mainly due to
- RabbitMQ's approach to dispatching messages from a single queue to multiple
- consumers.
+1. **精确一次**: 保证精确一次需要以下条件 -
+ - *开启 checkpointing*: 开启 checkpointing 之后,消息在 checkpoints 
+ 完成之后才会被确认(然后从 RabbitMQ 队列中删除)。
+ - *使用关联标识(Correlation ids)*: 关联标识是 RabbitMQ 的一个特性,消息写入 RabbitMQ 时在消息属性中设置。
+ 从 checkpoint 恢复时有些消息可能会被重复处理,source 可以利用关联标识对消息进行去重。
+ - *非并发 source*: 为了保证精确一次的数据投递,source 必须是非并发的(并行度设置为 1)。
+ 这主要是由于 RabbitMQ 分发数据时是从单队列向多个消费者投递消息的。
 
+2. **至少一次**: 在 checkpointing 开启的条件下,如果没有使用关联标识或者 source 是并发的,
+那么 source 就只能提供至少一次的保证。
 
-2. **At-least-once**: When checkpointing is enabled, but correlation ids
-are not used or the source is parallel, the source only provides at-least-once
-guarantees.
+3. **无任何保证**: 如果没有开启 checkpointing,source 就不能提供任何的数据投递保证。
+使用这种设置时,source 一旦接收到并处理消息,消息就会被自动确认。
 
-3. **No guarantee**: If checkpointing isn't enabled, the source does not
-have any strong delivery guarantees. Under this setting, instead of
-collaborating with Flink's checkpointing, messages will be automatically
-acknowledged once the source receives and processes them.
-
-Below is a code example for setting up an exactly-once RabbitMQ source.
-Inline comments explain which parts of the configuration can be ignored
-for more relaxed guarantees.
+下面是一个保证 exactly-once 的 RabbitMQ source 示例。代码中的注释说明了在采用更宽松的保证级别时,哪些配置可以省略。
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -131,9 +113,8 @@ val stream = env
 </div>
 </div>
 
-#### RabbitMQ Sink
-This connector provides a `RMQSink` class for sending messages to a RabbitMQ
-queue. Below is a code example for setting up a RabbitMQ sink.
+### RabbitMQ Sink
+该连接器提供了一个 `RMQSink` 类,用来向 RabbitMQ 队列发送数据。下面是设置 RabbitMQ sink 的代码示例:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -170,6 +151,6 @@ stream.addSink(new RMQSink[String](
 </div>
 </div>
 
-More about RabbitMQ can be found [here](http://www.rabbitmq.com/).
+更多关于 RabbitMQ 的信息请参考 [这里](http://www.rabbitmq.com/)。
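
A Java sketch of the exactly-once `RMQSource` configuration summarized in the guarantee list earlier on this page; host, credentials, and queue name are placeholders, and the inline comments mark what to relax for weaker guarantees:

{% highlight java %}
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000); // required for exactly-once (and at-least-once)

RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
    .setHost("localhost")
    .setPort(5672)
    .setVirtualHost("/")
    .setUserName("guest")
    .setPassword("guest")
    .build();

DataStream<String> stream = env
    .addSource(new RMQSource<String>(
        connectionConfig,
        "queueName",
        true,                      // usesCorrelationId; set to false for at-least-once
        new SimpleStringSchema())) // deserialization schema for the message body
    .setParallelism(1);            // non-parallel source is required for exactly-once
{% endhighlight %}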
 
 {% top %}


[flink] 02/05: [FLINK-12938][docs-zh] Translate "Streaming Connectors" page into Chinese

Posted by ja...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit bda485fc077453ff0a555c3f25e702bfd0a1f339
Author: aloys <lo...@gmail.com>
AuthorDate: Thu Jun 27 16:17:40 2019 +0800

    [FLINK-12938][docs-zh] Translate "Streaming Connectors" page into Chinese
    
    This closes #8837
---
 docs/dev/connectors/index.zh.md | 49 +++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/docs/dev/connectors/index.zh.md b/docs/dev/connectors/index.zh.md
index b5405d4..aabcdbd 100644
--- a/docs/dev/connectors/index.zh.md
+++ b/docs/dev/connectors/index.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "Streaming Connectors"
+title: "流式连接器"
 nav-id: connectors
 nav-title: Connectors
 nav-parent_id: streaming
@@ -28,16 +28,15 @@ under the License.
 * toc
 {:toc}
 
-## Predefined Sources and Sinks
+## 预定义的 Source 和 Sink
 
-A few basic data sources and sinks are built into Flink and are always available.
-The [predefined data sources]({{ site.baseurl }}/dev/datastream_api.html#data-sources) include reading from files, directories, and sockets, and
-ingesting data from collections and iterators.
-The [predefined data sinks]({{ site.baseurl }}/dev/datastream_api.html#data-sinks) support writing to files, to stdout and stderr, and to sockets.
+一些比较基本的 Source 和 Sink 已经内置在 Flink 里,并且始终可用。
+[预定义 data sources]({{ site.baseurl }}/zh/dev/datastream_api.html#data-sources) 支持从文件、目录、socket,以及 collections 和 iterators 中读取数据。
+[预定义 data sinks]({{ site.baseurl }}/zh/dev/datastream_api.html#data-sinks) 支持把数据写入文件、标准输出(stdout)、标准错误输出(stderr)和 socket。
 
-## Bundled Connectors
+## 附带的连接器
 
-Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:
+连接器可以和多种多样的第三方系统进行交互。目前支持以下系统:
 
  * [Apache Kafka](kafka.html) (source/sink)
  * [Apache Cassandra](cassandra.html) (sink)
@@ -48,15 +47,13 @@ Connectors provide code for interfacing with various third-party systems. Curren
  * [Apache NiFi](nifi.html) (source/sink)
  * [Twitter Streaming API](twitter.html) (source)
 
-Keep in mind that to use one of these connectors in an application, additional third party
-components are usually required, e.g. servers for the data stores or message queues.
-Note also that while the streaming connectors listed in this section are part of the
-Flink project and are included in source releases, they are not included in the binary distributions. 
-Further instructions can be found in the corresponding subsections.
+请记住,在使用一种连接器时,通常需要额外的第三方组件,比如:数据存储服务器或者消息队列。
+要注意这些列举的连接器是 Flink 工程的一部分,包含在发布的源码中,但是不包含在二进制发行版中。
+更多说明可以参考对应的子部分。
 
-## Connectors in Apache Bahir
+## Apache Bahir 中的连接器
 
-Additional streaming connectors for Flink are being released through [Apache Bahir](https://bahir.apache.org/), including:
+Flink 还有一些额外的连接器通过 [Apache Bahir](https://bahir.apache.org/) 发布,包括:
 
  * [Apache ActiveMQ](https://bahir.apache.org/docs/flink/current/flink-streaming-activemq/) (source/sink)
  * [Apache Flume](https://bahir.apache.org/docs/flink/current/flink-streaming-flume/) (sink)
@@ -64,23 +61,17 @@ Additional streaming connectors for Flink are being released through [Apache Bah
  * [Akka](https://bahir.apache.org/docs/flink/current/flink-streaming-akka/) (sink)
  * [Netty](https://bahir.apache.org/docs/flink/current/flink-streaming-netty/) (source)
 
-## Other Ways to Connect to Flink
+## 连接 Flink 的其他方法
 
-### Data Enrichment via Async I/O
+### 通过异步 I/O 丰富数据
 
-Using a connector isn't the only way to get data in and out of Flink.
-One common pattern is to query an external database or web service in a `Map` or `FlatMap`
-in order to enrich the primary datastream.
-Flink offers an API for [Asynchronous I/O]({{ site.baseurl }}/dev/stream/operators/asyncio.html)
-to make it easier to do this kind of enrichment efficiently and robustly.
+使用连接器并不是将数据输入和输出 Flink 的唯一方式。
+一种常见的模式是:在 `Map` 或者 `FlatMap` 中查询外部数据库或者 Web 服务,以此来丰富主数据流(primary datastream)。
+Flink 提供了 [异步 I/O]({{ site.baseurl }}/zh/dev/stream/operators/asyncio.html) API 来让这种数据丰富操作更加简单、高效和稳定。
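
A condensed Java sketch of the async enrichment pattern mentioned above; the lookup is faked with a `CompletableFuture`, and the timeout and capacity values are illustrative:

{% highlight java %}
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// hypothetical enrichment: asynchronously look up a value for each key
class AsyncLookup extends RichAsyncFunction<String, String> {
    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
            .supplyAsync(() -> "value-for-" + key) // stands in for a real async client call
            .thenAccept(value -> resultFuture.complete(Collections.singleton(key + "=" + value)));
    }
}

DataStream<String> keys = ...; // the primary datastream
DataStream<String> enriched =
    AsyncDataStream.unorderedWait(keys, new AsyncLookup(), 1000, TimeUnit.MILLISECONDS, 100);
{% endhighlight %}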
 
-### Queryable State
+### 可查询状态
 
-When a Flink application pushes a lot of data to an external data store, this
-can become an I/O bottleneck.
-If the data involved has many fewer reads than writes, a better approach can be
-for an external application to pull from Flink the data it needs.
-The [Queryable State]({{ site.baseurl }}/dev/stream/state/queryable_state.html) interface
-enables this by allowing the state being managed by Flink to be queried on demand.
+当 Flink 应用程序向外部存储推送大量数据时,可能会形成 I/O 瓶颈。在这种场景下,如果对数据的读操作远少于写操作,那么让外部应用从 Flink 拉取所需的数据会是一种更好的方式。
+[可查询状态]({{ site.baseurl }}/zh/dev/stream/state/queryable_state.html) 接口可以实现这个功能,它允许按需查询由 Flink 托管的状态。
 
 {% top %}