Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/01/24 08:59:23 UTC

[GitHub] [flink] Myasuka commented on a change in pull request #18460: [FLINK-25767][doc] Totally translated state.md into Chinese

Myasuka commented on a change in pull request #18460:
URL: https://github.com/apache/flink/pull/18460#discussion_r790516189



##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -25,32 +25,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Working with State
+# 使用状态
 
-In this section you will learn about the APIs that Flink provides for writing
-stateful programs. Please take a look at [Stateful Stream
-Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})
-to learn about the concepts behind stateful stream processing.
+本章节您将学到 Flink 用于编写有状态程序的 API。要学习有状态流处理背后的概念,请参阅[Stateful Stream
+Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})。
 
 ## Keyed DataStream
 
-If you want to use keyed state, you first need to specify a key on a
-`DataStream` that should be used to partition the state (and also the records
-in the stream themselves). You can specify a key using `keyBy(KeySelector)`
-in Java/Scala API or `key_by(KeySelector)` in Python API on a `DataStream`.
-This will yield a `KeyedStream`, which then allows operations that use keyed state.
+如果你希望使用 keyed state,首先需要为`DataStream`指定 key。这个 key 用于状态分区(也会给数据流中的记录本身分区)。
+你能够使用 `DataStream` 中 Java/Scala API 的 `keyBy(KeySelector)` 或者是 Python API 的 `key_by(KeySelector)` 来指定 key。
+它将返回 `KeyedStream`,从而允许使用 keyed state 操作。
 
-A key selector function takes a single record as input and returns the key for
-that record. The key can be of any type and **must** be derived from
-deterministic computations.
+Key selector 函数接收单个记录作为输入,返回这条记录的 key。该 key 可以为任何类型,它**必须**能够被推算出来。

Review comment:
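       `deterministic` is meant to stress that the selector's computation is deterministic and constant: the same record must always produce the same key. The current wording ("能够被推算出来", "can be derived") does not convey this.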
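
A minimal sketch of what "deterministic" means for a key selector, written in plain Java with no Flink dependency. Everything in it is an assumption made for illustration: `java.util.function.Function` stands in for Flink's `KeySelector`, and `partitionFor` is a simplified hash-modulo assignment, not Flink's actual key-group logic.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

final class DeterministicKeyDemo {

    // Deterministic: the key depends only on the content of the record.
    static final Function<String, String> FIRST_WORD = line -> line.split(" ")[0];

    // Non-deterministic: the key can differ between calls for the same record.
    static final Function<String, Integer> RANDOM_KEY =
            line -> ThreadLocalRandom.current().nextInt();

    // Simplified partition assignment (illustrative only, not Flink's key-group logic).
    static <K> int partitionFor(K key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // True if the selector returns an equal key for the same record on every attempt.
    static <IN, KEY> boolean isStable(Function<IN, KEY> selector, IN record, int attempts) {
        KEY first = selector.apply(record);
        for (int i = 1; i < attempts; i++) {
            if (!first.equals(selector.apply(record))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String record : List.of("flink state", "flink checkpoint", "kafka offsets")) {
            String key = FIRST_WORD.apply(record);
            System.out.println(record + " -> key=" + key + ", partition=" + partitionFor(key, 4));
        }
        System.out.println("first-word selector stable: " + isStable(FIRST_WORD, "flink state", 100));
        System.out.println("random selector stable: " + isStable(RANDOM_KEY, "flink state", 100));
    }
}
```

With a selector like `RANDOM_KEY`, the same record could be routed to a different partition each time it is processed, so the state stored under its key could not be found again. That is the property the translation should convey.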

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -25,32 +25,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Working with State
+# 使用状态
 
-In this section you will learn about the APIs that Flink provides for writing
-stateful programs. Please take a look at [Stateful Stream
-Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})
-to learn about the concepts behind stateful stream processing.
+本章节您将学到 Flink 用于编写有状态程序的 API。要学习有状态流处理背后的概念,请参阅[Stateful Stream
+Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})。
 
 ## Keyed DataStream
 
-If you want to use keyed state, you first need to specify a key on a
-`DataStream` that should be used to partition the state (and also the records
-in the stream themselves). You can specify a key using `keyBy(KeySelector)`
-in Java/Scala API or `key_by(KeySelector)` in Python API on a `DataStream`.
-This will yield a `KeyedStream`, which then allows operations that use keyed state.
+如果你希望使用 keyed state,首先需要为`DataStream`指定 key。这个 key 用于状态分区(也会给数据流中的记录本身分区)。
+你能够使用 `DataStream` 中 Java/Scala API 的 `keyBy(KeySelector)` 或者是 Python API 的 `key_by(KeySelector)` 来指定 key。

Review comment:
       ```suggestion
   你可以使用 `DataStream` 中 Java/Scala API 的 `keyBy(KeySelector)` 或者是 Python API 的 `key_by(KeySelector)` 来指定 key。
   ```

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -605,41 +593,27 @@ val counts: DataStream[(String, Int)] = stream
     })
 ```
 
-## Operator State
+## 操作符状态 (Operator State)
 
-*Operator State* (or *non-keyed state*) is state that is bound to one
-parallel operator instance. The [Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) is a good motivating example for the use of
-Operator State in Flink. Each parallel instance of the Kafka consumer maintains
-a map of topic partitions and offsets as its Operator State.
+*操作符状态*(或者*非键控状态*)是绑定到一个并行操作符实例的状态。在 Flink 中使用操作符状态,[Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}})是一个很具有启发性的例子。Kafka 消费者每个并发实例维护了 topic partitions 和偏移量的 map 作为它的操作符状态。

Review comment:
       The translation here reads stiffly. The original intends to use the way the Kafka source connector is implemented to illustrate how operator state is used.
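
For readers of this thread, a rough sketch of the idea behind the Kafka example, in plain Java without Flink: each parallel instance keeps a map of topic partition to offset, and those entries are redistributed when the parallelism changes. The class name, the `orders-N` partition names, and the round-robin scheme are all hypothetical and chosen for illustration; this is not the connector's actual implementation, and Flink supports more than one redistribution scheme.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OffsetStateRescaleSketch {

    // Collects the entries of all old instances and deals them out
    // round-robin to the new instances (one possible redistribution scheme).
    static List<Map<String, Long>> redistribute(
            List<Map<String, Long>> oldInstances, int newParallelism) {
        List<Map<String, Long>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new LinkedHashMap<>());
        }
        int next = 0;
        for (Map<String, Long> instance : oldInstances) {
            for (Map.Entry<String, Long> entry : instance.entrySet()) {
                result.get(next % newParallelism).put(entry.getKey(), entry.getValue());
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Operator state of two parallel instances: topic partition -> offset.
        Map<String, Long> instance0 = new LinkedHashMap<>();
        instance0.put("orders-0", 120L);
        instance0.put("orders-1", 98L);
        Map<String, Long> instance1 = new LinkedHashMap<>();
        instance1.put("orders-2", 133L);
        instance1.put("orders-3", 75L);

        // Scale out from 2 to 4 instances.
        System.out.println(redistribute(List.of(instance0, instance1), 4));
    }
}
```

The state here belongs to the operator instance rather than to any record key, which is why it is a natural example of operator (non-keyed) state.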

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -85,15 +76,12 @@ keyed = words.key_by(lambda row: row[0])
 {{< /tab >}}
 {{< /tabs >}}
 
-#### Tuple Keys and Expression Keys
+#### 元组 Keys 和表达式 Keys

Review comment:
       "Tuple keys" refers to the deprecated `#keyBy(int... fields)` API.
   "Expression keys" refers to the deprecated `keyBy(String... fields)` API. The translation here would be hard for readers to understand.

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -85,15 +76,12 @@ keyed = words.key_by(lambda row: row[0])
 {{< /tab >}}
 {{< /tabs >}}
 
-#### Tuple Keys and Expression Keys
+#### 元组 Keys 和表达式 Keys
 
-Flink also has two alternative ways of defining keys: tuple keys and expression
-keys in the Java/Scala API(still not supported in the Python API). With this you can
-specify keys using tuple field indices or expressions
-for selecting fields of objects. We don't recommend using these today but you
-can refer to the Javadoc of DataStream to learn about them. Using a KeySelector
-function is strictly superior: with Java lambdas they are easy to use and they
-have potentially less overhead at runtime.
+Flink 有两种不同定义 key 的方式:Java/Scala 的元组 key 和表达式 key (Python API 仍未支持)。 
+通过这些方式你能够通过元组字段索引,或者是选择对象字段的表达式来指定 key。
+我们不推荐这样使用,但你可以参考 `DataStream` 的 Javadoc 来学习它们。 明显更优的办法是配合 Java Lambda 使用 `KeySelector`。
+它们用起来更为简单,执行效率也可能更高。

Review comment:
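       The original means that using a `KeySelector` is clearly better, that Java lambdas can be used to make `KeySelector` easier to work with, and that the extra performance overhead introduced by Java lambdas is small. It does not say that this approach "may also execute more efficiently" ("执行效率也可能更高").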

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -605,41 +593,27 @@ val counts: DataStream[(String, Int)] = stream
     })
 ```
 
-## Operator State
+## 操作符状态 (Operator State)
 
-*Operator State* (or *non-keyed state*) is state that is bound to one
-parallel operator instance. The [Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) is a good motivating example for the use of
-Operator State in Flink. Each parallel instance of the Kafka consumer maintains
-a map of topic partitions and offsets as its Operator State.
+*操作符状态*(或者*非键控状态*)是绑定到一个并行操作符实例的状态。在 Flink 中使用操作符状态,[Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}})是一个很具有启发性的例子。Kafka 消费者每个并发实例维护了 topic partitions 和偏移量的 map 作为它的操作符状态。
 
-The Operator State interfaces support redistributing state among parallel
-operator instances when the parallelism is changed. There are different schemes
-for doing this redistribution.
+当并行度改变的时候,操作符状态接口支持将状态重分发给各个并行操作符实例。处理重分发过程有多种不同的方案。
 
-In a typical stateful Flink Application you don't need operators state. It is
-mostly a special type of state that is used in source/sink implementations and
-scenarios where you don't have a key by which state can be partitioned.
+在典型的有状态 Flink 作业中你无需使用操作符状态。它大都用于数据源/落地端实现,以及你不需要 state 按照 key 来分区的这类场景中,作为一种特殊类型的状态使用。
 
-**Notes:** Operator state is still not supported in Python DataStream API.
+**注意:** Python DataStream API 仍无法支持操作符状态。
 
-## Broadcast State
+## 广播状态 (Broadcast State)
 
-*Broadcast State* is a special type of *Operator State*.  It was introduced to
-support use cases where records of one stream need to be broadcasted to all
-downstream tasks, where they are used to maintain the same state among all
-subtasks. This state can then be accessed while processing records of a second
-stream. As an example where broadcast state can emerge as a natural fit, one
-can imagine a low-throughput stream containing a set of rules which we want to
-evaluate against all elements coming from another stream. Having the above type
-of use cases in mind, broadcast state differs from the rest of operator states
-in that:
+*广播状态*是一种特殊的操作符状态。它用于支持流中的元素需要广播到所有下游任务的用例。在这些用例中,广播状态用于维护所有子任务相同的状态。

Review comment:
       Why was `Task` translated as `用例` ("use case")?
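
To make the task/use-case distinction concrete, here is a small plain-Java sketch of the scenario the original paragraph describes: a low-throughput stream of rules is broadcast to every downstream task, so each subtask holds the same rule set and evaluates elements of a second stream against it. The class names and the two example rules are hypothetical; the sketch does not use Flink's broadcast state API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class BroadcastRulesSketch {

    // One parallel subtask (a task instance) holding its own copy of the broadcast rules.
    static final class Subtask {
        final Map<String, IntPredicate> rules = new LinkedHashMap<>();

        // Returns the names of all rules that the element satisfies, in insertion order.
        List<String> process(int element) {
            List<String> matched = new ArrayList<>();
            rules.forEach((name, rule) -> {
                if (rule.test(element)) {
                    matched.add(name);
                }
            });
            return matched;
        }
    }

    // Broadcasting delivers the same rule to every subtask, so all copies stay identical.
    static void broadcast(List<Subtask> subtasks, String name, IntPredicate rule) {
        for (Subtask subtask : subtasks) {
            subtask.rules.put(name, rule);
        }
    }

    public static void main(String[] args) {
        List<Subtask> subtasks = List.of(new Subtask(), new Subtask(), new Subtask());
        broadcast(subtasks, "large", v -> v > 100);
        broadcast(subtasks, "even", v -> v % 2 == 0);

        // Any subtask gives the same answer because each holds the same rules.
        System.out.println(subtasks.get(0).process(250));
        System.out.println(subtasks.get(2).process(7));
    }
}
```

In this picture the "tasks" are the parallel `Subtask` instances that receive the broadcast records, while the "use case" is the rule-evaluation scenario as a whole.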

##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -605,41 +593,27 @@ val counts: DataStream[(String, Int)] = stream
     })
 ```
 
-## Operator State
+## 操作符状态 (Operator State)
 
-*Operator State* (or *non-keyed state*) is state that is bound to one
-parallel operator instance. The [Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) is a good motivating example for the use of
-Operator State in Flink. Each parallel instance of the Kafka consumer maintains
-a map of topic partitions and offsets as its Operator State.
+*操作符状态*(或者*非键控状态*)是绑定到一个并行操作符实例的状态。在 Flink 中使用操作符状态,[Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}})是一个很具有启发性的例子。Kafka 消费者每个并发实例维护了 topic partitions 和偏移量的 map 作为它的操作符状态。
 
-The Operator State interfaces support redistributing state among parallel
-operator instances when the parallelism is changed. There are different schemes
-for doing this redistribution.
+当并行度改变的时候,操作符状态接口支持将状态重分发给各个并行操作符实例。处理重分发过程有多种不同的方案。
 
-In a typical stateful Flink Application you don't need operators state. It is
-mostly a special type of state that is used in source/sink implementations and
-scenarios where you don't have a key by which state can be partitioned.
+在典型的有状态 Flink 作业中你无需使用操作符状态。它大都用于数据源/落地端实现,以及你不需要 state 按照 key 来分区的这类场景中,作为一种特殊类型的状态使用。

Review comment:
       "Sink" is not usually translated as `落地`.



