Posted to commits@flink.apache.org by ga...@apache.org on 2022/01/04 10:28:09 UTC

[flink] branch master updated: [FLINK-16152][doc-zh] Translate dev/datastream/operators/overview.md into Chinese

This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/master by this push:
     new 5cf0c0f  [FLINK-16152][doc-zh] Translate dev/datastream/operators/overview.md into Chinese
5cf0c0f is described below

commit 5cf0c0f3b8114ca5ce73a409b2024ff4ce1ed3a4
Author: hapihu <w....@qq.com>
AuthorDate: Thu Aug 19 02:47:34 2021 +0800

    [FLINK-16152][doc-zh] Translate dev/datastream/operators/overview.md into Chinese
    
    This closes #16889.
---
 .../docs/dev/datastream/operators/overview.md      | 106 ++++++++++-----------
 1 file changed, 53 insertions(+), 53 deletions(-)

diff --git a/docs/content.zh/docs/dev/datastream/operators/overview.md b/docs/content.zh/docs/dev/datastream/operators/overview.md
index 5b80633..a3b5c8e 100644
--- a/docs/content.zh/docs/dev/datastream/operators/overview.md
+++ b/docs/content.zh/docs/dev/datastream/operators/overview.md
@@ -28,14 +28,14 @@ under the License.
 
 Operators transform one or more DataStreams into a new DataStream. Programs can combine multiple transformation operators into sophisticated dataflow topologies.
 
-This section describes the basic transformation APIs in the Flink DataStream API, how data is partitioned after a transformation, and the chaining strategy for operators.
+This section describes the basic transformation APIs in the Flink DataStream API, how data is partitioned after a transformation, and the chaining strategy for operators.
 
 ## DataStream Transformations
 
 ### Map
 #### DataStream &rarr; DataStream
 
-Takes one element and produces one element. A map function that doubles the values of the input stream:
+Takes one element and produces one element. Below is a map function that doubles the values of the input stream:
 
 {{< tabs mapfunc >}}
 {{< tab "Java">}}
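
A minimal sketch of such a doubling map, assuming a `DataStream<Integer>` named `dataStream` and the usual Flink imports:

```java
// Double every element of the input stream.
dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 2 * value;
    }
});
```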
@@ -65,7 +65,7 @@ data_stream.map(lambda x: 2 * x, output_type=Types.INT())
 ### FlatMap
 #### DataStream &rarr; DataStream
 
-Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words:
+Takes one element and produces zero, one, or more elements. Below is a flatmap function that splits sentences into words:
 
 {{< tabs flatmapfunc >}}
 {{< tab "Java">}}
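
A minimal sketch of such a word-splitting flatmap, assuming a `DataStream<String>` of sentences named `dataStream`:

```java
// Emit every whitespace-separated word of each sentence as its own element.
dataStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        for (String word : value.split(" ")) {
            out.collect(word);
        }
    }
});
```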
@@ -97,7 +97,7 @@ data_stream.flat_map(lambda x: x.split(' '), output_type=Types.STRING())
 ### Filter
 #### DataStream &rarr; DataStream
 
-Evaluates a boolean function for each element and retains those for which the function returns true. A filter that filters out zero values: 
+Evaluates a boolean function for each element and retains those for which the function returns true. Below is a filter that filters out zero values:
 
 {{< tabs filterfunc >}}
 {{< tab "Java">}}
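
A minimal sketch of such a zero-dropping filter, assuming a `DataStream<Integer>` named `dataStream`:

```java
// Keep only the non-zero elements.
dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 0;
    }
});
```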
@@ -126,7 +126,7 @@ data_stream.filter(lambda x: x != 0)
 ### KeyBy
 #### DataStream &rarr; KeyedStream
 
-Logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Internally, _keyBy()_ is implemented with hash partitioning. There are different ways to [specify keys]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-datastream).
+Logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Internally, _keyBy()_ is implemented with hash partitioning. There are different ways to [specify keys]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-datastream).
 
 {{< tabs keybyfunc >}}
 {{< tab "Java">}}
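
A sketch of two common ways to specify a key; `getSomeKey()` is a hypothetical POJO getter, and the second line assumes a Tuple stream:

```java
// Key by a POJO field via a KeySelector lambda (getSomeKey() is a stand-in getter).
dataStream.keyBy(value -> value.getSomeKey());
// Key by the first field of a Tuple.
dataStream.keyBy(value -> value.f0);
```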
@@ -150,18 +150,18 @@ data_stream.key_by(lambda x: x[1], key_type=Types.STRING()) // Key by the result
 {{< /tabs>}}
 
 {{< hint warning >}}
-A type **cannot be a key if**:
+A type **cannot be a key** if:
 
-1. it is a POJO type but does not override the `hashCode()` method and relies on the `Object.hashCode()` implementation.
-2. it is an array of any type.
+1. it is a POJO type but does not override the `hashCode()` method and relies on the `Object.hashCode()` implementation.
+2. it is an array of any type.
 {{< /hint >}}
 
 ### Reduce
 #### KeyedStream &rarr; DataStream
 
-A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
+A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
 
-A reduce function that creates a stream of partial sums:
+Below is a reduce function that creates a stream of partial sums:
 
 {{< tabs globalreduce >}}
 {{< tab "Java">}}
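
A minimal sketch of a partial-sum reduce, assuming a keyed stream of `Integer` values named `keyedStream`:

```java
// Emit the running sum for each key.
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2) throws Exception {
        return value1 + value2;
    }
});
```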
@@ -191,8 +191,7 @@ data_stream.key_by(lambda x: x[1]).reduce(lambda a, b: (a[0] + b[0], b[1]))
 ### Window
 #### KeyedStream &rarr; WindowedStream
 
-Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
-See [windows]({{< ref "docs/dev/datastream/operators/windows" >}}) for a complete description of windows.
+Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds). See [windows]({{< ref "docs/dev/datastream/operators/windows" >}}) for a complete description of windows.
 
 {{< tabs window >}}
 {{< tab "Java">}}
@@ -210,17 +209,17 @@ dataStream
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
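
A sketch of a keyed window definition, assuming a Tuple stream named `dataStream` and a 5-second tumbling event-time window:

```java
// Group each key's data into 5-second tumbling event-time windows.
dataStream
    .keyBy(value -> value.f0)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)));
```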
 
 ### WindowAll
 #### DataStream &rarr; AllWindowedStream
 
-Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds). See [windows]({{< ref "docs/dev/datastream/operators/windows" >}}) for a complete description of windows.
+Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds). See [windows]({{< ref "docs/dev/datastream/operators/windows" >}}) for a complete description of windows.
 
 {{< hint warning >}}
-This is in many cases a non-parallel transformation. All records will be gathered in one task for the windowAll operator.
+This is in many cases a non-parallel transformation. All records will be gathered in one task for the windowAll operator.
 {{< /hint >}}
 
 {{< tabs windowAll >}}
@@ -237,7 +236,7 @@ dataStream
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
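
A sketch of the non-keyed counterpart, again assuming a 5-second tumbling event-time window:

```java
// All stream events are gathered into one task and windowed together.
dataStream
    .windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
```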
 
@@ -245,10 +244,10 @@ This feature is not yet supported in Python
 #### WindowedStream &rarr; DataStream
 #### AllWindowedStream &rarr; DataStream
 
-Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window.
+Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window.
 
 {{< hint info >}}
-If you are using a windowAll transformation, you need to use an `AllWindowFunction` instead.
+If you are using a windowAll transformation, you need to use an `AllWindowFunction` instead.
 {{< /hint >}}
 
 {{< tabs windowapply >}}
@@ -267,7 +266,7 @@ windowedStream.apply(new WindowFunction<Tuple2<String,Integer>, Integer, Tuple,
     }
 });
 
-// applying an AllWindowFunction on non-keyed window stream
+// applying an AllWindowFunction on a non-keyed window stream
 allWindowedStream.apply (new AllWindowFunction<Tuple2<String,Integer>, Integer, Window>() {
     public void apply (Window window,
             Iterable<Tuple2<String, Integer>> values,
@@ -285,19 +284,19 @@ allWindowedStream.apply (new AllWindowFunction<Tuple2<String,Integer>, Integer,
 ```scala
 windowedStream.apply { WindowFunction }
 
-// applying an AllWindowFunction on non-keyed window stream
+// applying an AllWindowFunction on a non-keyed window stream
 allWindowedStream.apply { AllWindowFunction }
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
 
 ### WindowReduce
 #### WindowedStream &rarr; DataStream
 
-Applies a functional reduce function to the window and returns the reduced value.
+Applies a functional reduce function to the window and returns the reduced value.
 
 {{< tabs windowreduce >}}
 {{< tab "Java" >}}
@@ -315,14 +314,14 @@ windowedStream.reduce { _ + _ }
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
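
A sketch of a window reduce that sums the count field, assuming `windowedStream` carries `Tuple2<String, Integer>` pairs as in the apply example above:

```java
// Sum the Integer field of each (word, count) pair within the window.
windowedStream.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1,
                                          Tuple2<String, Integer> value2) throws Exception {
        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
    }
});
```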
 
 ### Union
 #### DataStream\* &rarr; DataStream
 
-Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
+Unions two or more data streams, creating a new stream containing all the elements from all the streams. Note: if you union a data stream with itself you will get each element twice in the resulting stream.
 
 {{< tabs union >}}
 {{< tab "Java" >}}
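
A minimal sketch, assuming three `DataStream<Integer>` variables `stream1`, `stream2`, and `stream3`:

```java
// The result contains every element of all three inputs.
DataStream<Integer> unioned = stream1.union(stream2, stream3);
```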
@@ -345,7 +344,7 @@ data_stream.union(otherStream1, otherStream2, ...)
 ### Window Join
 #### DataStream,DataStream &rarr; DataStream
 
-Join two data streams on a given key and a common window.
+Joins two data streams on a given key and a common window.
 
 {{< tabs windowjoin >}}
 {{< tab "Java" >}}
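
A sketch of a window join, assuming both sides are `Tuple2<String, Integer>` streams keyed by their first field, joined over a 3-second tumbling event-time window:

```java
// Join elements that share a key within the same 3-second window.
dataStream.join(otherStream)
    .where(value -> value.f0)
    .equalTo(value -> value.f0)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, String>() {
        @Override
        public String join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) {
            return first.f0 + "," + first.f1 + "," + second.f1;
        }
    });
```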
@@ -365,14 +364,14 @@ dataStream.join(otherStream)
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
 
 ### Interval Join
 #### KeyedStream,KeyedStream &rarr; DataStream
 
-Join two elements e1 and e2 of two keyed streams with a common key over a given time interval, so that `e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound`.
+Joins two elements e1 and e2 of two keyed streams that share a common key and lie within a given time interval, so that `e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound`.
 
 {{< tabs intervaljoin >}}
 {{< tab "Java" >}}
@@ -399,14 +398,14 @@ keyedStream.intervalJoin(otherKeyedStream)
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
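
A sketch of an interval join, assuming both keyed streams carry `Integer` elements and a ±2 ms interval:

```java
// Match pairs whose timestamps are at most 2 ms apart in either direction.
keyedStream
    .intervalJoin(otherKeyedStream)
    .between(Time.milliseconds(-2), Time.milliseconds(2))
    .process(new ProcessJoinFunction<Integer, Integer, String>() {
        @Override
        public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
            out.collect(left + "," + right);
        }
    });
```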
 
 ### Window CoGroup
 #### DataStream,DataStream &rarr; DataStream
 
-Cogroups two data streams on a given key and a common window.
+Cogroups two data streams on a given key and a common window.
 
 {{< tabs windowcogroup >}}
 {{< tab "Java" >}}
@@ -426,14 +425,14 @@ dataStream.coGroup(otherStream)
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
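
A sketch of a window cogroup, assuming `Tuple2<String, Integer>` streams and a 3-second tumbling event-time window; the body simply counts the elements on both sides:

```java
dataStream.coGroup(otherStream)
    .where(value -> value.f0)
    .equalTo(value -> value.f0)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply(new CoGroupFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, String>() {
        @Override
        public void coGroup(Iterable<Tuple2<String, Integer>> first,
                            Iterable<Tuple2<String, Integer>> second,
                            Collector<String> out) {
            // Unlike join, both groups are handed over even if one side is empty.
            int count = 0;
            for (Tuple2<String, Integer> ignored : first) count++;
            for (Tuple2<String, Integer> ignored : second) count++;
            out.collect("elements in window: " + count);
        }
    });
```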
 
 ### Connect
 #### DataStream,DataStream &rarr; ConnectedStream
 
-"Connects" two data streams retaining their types. Connect allowing for shared state between the two streams.
+"Connects" two data streams retaining their types. Connect allows sharing state between the processing logic of the two streams.
 
 {{< tabs connect >}}
 {{< tab "Java" >}}
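
A minimal sketch, assuming a `DataStream<Integer>` named `someStream` and a `DataStream<String>` named `otherStream`:

```java
// The connected stream keeps both element types side by side;
// state can be shared between the two processing functions.
ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);
```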
@@ -464,7 +463,7 @@ connected_streams = stream_1.connect(stream_2)
 ### CoMap, CoFlatMap
 #### ConnectedStream &rarr; DataStream
 
-Similar to map and flatMap on a connected data stream
+Similar to map and flatMap on a connected data stream.
 
 {{< tabs comap >}}
 {{< tab "Java" >}}
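
A sketch of a CoMapFunction, assuming the connected streams carry `Integer` and `String` elements:

```java
// map1 handles elements of the first stream, map2 those of the second.
connectedStreams.map(new CoMapFunction<Integer, String, Boolean>() {
    @Override
    public Boolean map1(Integer value) {
        return true;
    }

    @Override
    public Boolean map2(String value) {
        return false;
    }
});
```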
@@ -536,7 +535,7 @@ connectedStreams.flat_map(MyCoFlatMapFunction())
 ### Iterate
 #### DataStream &rarr; IterativeStream &rarr; ConnectedStream
 
-Creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. The following code starts with a stream and applies the iteration body continuously. Elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.
+Creates a "feedback" loop in the flow by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. The following code starts with a stream and applies the iteration body continuously. Elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.
 
 {{< tabs iterate >}}
 {{< tab "Java" >}}
@@ -569,7 +568,7 @@ initialStream.iterate {
 ```
 {{< /tab >}}
 {{< tab "Python" >}}
-This feature is not yet supported in Python
+This feature is not yet supported in Python.
 {{< /tab >}}
 {{< /tabs>}}
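
A sketch of such a loop, assuming a `DataStream<Long>` named `initialStream`; the map body here is only a stand-in for the real iteration step:

```java
// Build the loop: elements > 0 feed back into the iteration, the rest flow on.
IterativeStream<Long> iteration = initialStream.iterate();
DataStream<Long> iterationBody = iteration.map(new MapFunction<Long, Long>() {
    @Override
    public Long map(Long value) throws Exception {
        return value - 1; // stand-in for the real iteration step
    }
});
DataStream<Long> feedback = iterationBody.filter(value -> value > 0);
iteration.closeWith(feedback);
DataStream<Long> output = iterationBody.filter(value -> value <= 0);
```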
 
@@ -577,10 +576,10 @@ This feature is not yet supported in Python
 
 Flink also provides the following methods that let users further configure, as needed, how the data is partitioned after a transformation.
 
-### Custom Partitioning
+### Custom Partitioning
 #### DataStream &rarr; DataStream
 
-Uses a user-defined Partitioner to select the target task for each element. 
+Uses a user-defined Partitioner to select the target task for each element.
 
 {{< tabs custompartitioning >}}
 {{< tab "Java" >}}
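
A minimal sketch, assuming a `Partitioner` instance named `partitioner`:

```java
// Route each element with a user-defined Partitioner applied to the given key.
dataStream.partitionCustom(partitioner, "someKey"); // key by a POJO field name
dataStream.partitionCustom(partitioner, 0);         // key by a Tuple position
```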
@@ -603,10 +602,10 @@ data_stream.partition_custom(lambda key, num_partition: key % partition, lambda
 {{< /tab >}}
 {{< /tabs>}}
 
-### Random Partitioning
+### Random Partitioning
 #### DataStream &rarr; DataStream
 
-Partitions elements randomly according to a uniform distribution. 
+Partitions elements randomly according to a uniform distribution.
 
 {{< tabs shuffle >}}
 {{< tab "Java" >}}
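
A one-line sketch, assuming a stream named `dataStream`:

```java
// Redistribute elements uniformly at random across downstream tasks.
dataStream.shuffle();
```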
@@ -630,13 +629,13 @@ data_stream.shuffle()
 ### Rescaling
 #### DataStream &rarr; DataStream
 
-Partitions elements, round-robin, to a subset of downstream operations. This is useful if you want to have pipelines where you, for example, fan out from each parallel instance of a source to a subset of several mappers to distribute load but don't want the full rebalance that rebalance() would incur. This would require only local data transfers instead of transferring data over network, depending on other configuration values such as the number of slots of TaskManagers.
+Partitions elements, round-robin, to a subset of downstream operators. This is useful if you want pipelines where you, for example, fan out from each parallel instance of a source to a subset of several mappers to distribute load, but don't want the full rebalance that rebalance() would incur. This requires only local data transfers instead of transferring data over the network, depending on other configuration values such as the number of slots of the TaskManagers.
 
-The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 6, then one upstream operation would distribute elements to three downstream operations while the other upstream operation would distribute to the other three downstream operations. If, on the other hand, the downstream ope [...]
+The set of downstream operator instances to which an upstream operator sends elements depends on the degree of parallelism of both the upstream and the downstream operator. For example, if the upstream operator has parallelism 2 and the downstream operator has parallelism 6, then one upstream instance distributes elements to three downstream instances while the other upstream instance distributes to the other three. If, on the other hand, the downstream operator has parallelism 2 and the upstream operator has parallelism 6, then three upstream instances distribute to one downstream instance while the other three upstream instances distribute to the other downstream instance.
 
-In cases where the different parallelisms are not multiples of each other one or several downstream operations will have a differing number of inputs from upstream operations.
+In cases where the parallelisms are not multiples of each other, one or more downstream operators will have a differing number of inputs from the upstream operators.
 
-Please see this figure for a visualization of the connection pattern in the above example:
+See the following figure for a visualization of the connection pattern in the above example:
 
 {{< img src="/fig/rescale.svg" alt="Rescaling connection pattern" >}}
 
@@ -658,10 +657,10 @@ data_stream.rescale()
 {{< /tab >}}
 {{< /tabs>}}
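
A sketch of the 2-to-6 fan-out described above, assuming an existing `StreamExecutionEnvironment` named `env`; `ParallelSourceStub` and `MapperStub` are hypothetical placeholder classes:

```java
// Each of the 2 source instances feeds exactly 3 of the 6 mappers,
// so transfers can stay local instead of going over the network.
DataStream<String> source = env
    .addSource(new ParallelSourceStub())
    .setParallelism(2);
source.rescale()
    .map(new MapperStub())
    .setParallelism(6);
```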
 
-### Broadcasting
+### Broadcasting
 #### DataStream &rarr; DataStream
 
-Broadcasts elements to every partition. 
+Broadcasts elements to every partition.
 
 {{< tabs broadcast >}}
 {{< tab "Java" >}}
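
A one-line sketch, assuming a stream named `dataStream`:

```java
// Every element is sent to every parallel instance of the next operator.
dataStream.broadcast();
```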
@@ -685,14 +684,15 @@ data_stream.broadcast()
 
 Chaining two operators together lets them execute in the same thread, which improves performance. Flink by default chains operators wherever possible (e.g., two consecutive map transformations). In addition, the API provides finer-grained control over chaining for cases that need it:
 
-Use `StreamExecutionEnvironment.disableOperatorChaining()` if you want to disable chaining for the whole job. The methods below offer more fine-grained control. Note that these methods can only be used right after a `DataStream` transformation, because they refer to the previous transformation. For example, you can call `someStream.map(...).startNewChain()`, but not someStream.startNewChain().
+Use `StreamExecutionEnvironment.disableOperatorChaining()` if you want to disable chaining for the whole job. The methods below offer more fine-grained control. Note that these methods can only be used right after a `DataStream` transformation, because they refer to the previous transformation. For example, you can call `someStream.map(...).startNewChain()`, but not `someStream.startNewChain()`.
 
-A resource group corresponds to a slot in Flink; see slots for more details. You can manually isolate operators in separate slots if desired.
+A resource group corresponds to a slot in Flink; see slots for more details. You can manually isolate operators in separate slots if desired.
 
-### Start New Chain
 
-Begin a new chain, starting with this operator.
-The two mappers will be chained, and filter will not be chained to the first mapper. 
+### Start New Chain
+
+Begin a new chain, starting with this operator.
+The two mappers will be chained, and the filter will not be chained to the first mapper.
 
 {{< tabs startnewchain >}}
 {{< tab "Java" >}}
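
A sketch using lambdas on an `Integer` stream named `someStream`:

```java
someStream
    .filter(value -> value > 0)
    .map(value -> 2 * value)
    .startNewChain()           // this map begins a new chain: it chains with the
    .map(value -> value + 1);  // following map, but not with the preceding filter
```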
@@ -712,9 +712,9 @@ some_stream.filter(...).map(...).start_new_chain().map(...)
 {{< /tab >}}
 {{< /tabs>}}
 
-### Disable Chaining
+### Disable Chaining
 
-Do not chain the map operator.
+Do not chain the map operator.
 
 {{< tabs disablechaining >}}
 {{< tab "Java" >}}
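
A sketch, again assuming an `Integer` stream named `someStream`:

```java
// This map is kept out of any operator chain.
someStream.map(value -> 2 * value).disableChaining();
```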
@@ -734,9 +734,9 @@ some_stream.map(...).disable_chaining()
 {{< /tab >}}
 {{< /tabs>}}
 
-### Set Slot Sharing Group
+### Set Slot Sharing Group
 
-Set the slot sharing group of an operation. Flink will put operations with the same slot sharing group into the same slot while keeping operations that don't have the slot sharing group in other slots. This can be used to isolate slots. The slot sharing group is inherited from input operations if all input operations are in the same slot sharing group. The name of the default slot sharing group is "default", operations can explicitly be put into this group by calling slotSharingGroup("de [...]
+Sets the slot sharing group of an operation. Flink puts operations with the same slot sharing group into the same slot, while keeping operations without one in other slots. This can be used to isolate slots. The slot sharing group is inherited from the input operations if all of them are in the same slot sharing group. The name of the default slot sharing group is "default"; operations can explicitly be put into this group by calling slotSharingGroup("default").
 
 {{< tabs slotsharing >}}
 {{< tab "Java" >}}
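
A minimal sketch, assuming an `Integer` stream named `someStream`:

```java
// Schedule this filter into slot sharing group "name";
// downstream operators inherit the group unless it is set explicitly.
someStream.filter(value -> value > 0).slotSharingGroup("name");
```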