Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/07 09:10:16 UTC

[GitHub] [flink] klion26 commented on a change in pull request #12012: [FLINK-17289][docs]Translate tutorials/etl.md to Chinese

klion26 commented on a change in pull request #12012:
URL: https://github.com/apache/flink/pull/12012#discussion_r421255062



##########
File path: docs/training/etl.zh.md
##########
@@ -24,35 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-One very common use case for Apache Flink is to implement ETL (extract, transform, load) pipelines
-that take data from one or more sources, perform some transformations and/or enrichments, and
-then store the results somewhere. In this section we are going to look at how to use Flink's
-DataStream API to implement this kind of application.
+Apache Flink的一种常见应用场景是ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink的DataStream API实现这类应用。

Review comment:
       ```suggestion
   Apache Flink 的一种常见应用场景是 ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink 的 DataStream API 实现这类应用。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -24,35 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-One very common use case for Apache Flink is to implement ETL (extract, transform, load) pipelines
-that take data from one or more sources, perform some transformations and/or enrichments, and
-then store the results somewhere. In this section we are going to look at how to use Flink's
-DataStream API to implement this kind of application.
+Apache Flink的一种常见应用场景是ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink的DataStream API实现这类应用。
+
+这里注意,Flink的[Table 和 SQL API]({% link dev/table/index.zh.md %})完全可以满足很多ETL使用场景。但无论你最终是否直接使用DataStream API,对这里介绍的基本知识有扎实的理解都是有价值的。
 
-Note that Flink's [Table and SQL APIs]({% link dev/table/index.zh.md %})
-are well suited for many ETL use cases. But regardless of whether you ultimately use
-the DataStream API directly, or not, having a solid understanding the basics presented here will
-prove valuable.
 
 * This will be replaced by the TOC
 {:toc}
 
-## Stateless Transformations
+## 无状态的转换
 
-This section covers `map()` and `flatmap()`, the basic operations used to implement
-stateless transformations. The examples in this section assume you are familiar with the
-Taxi Ride data used in the hands-on exercises in the
-[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %}).
+本节涵盖了 `map()` 和 `flatmap()`,这两种算子可以用来实现无状态转换的基本操作。本节中的示例建立在你已经熟悉[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %})中的出租车行程数据的基础上。

Review comment:
       ```suggestion
   本节涵盖了 `map()` 和 `flatmap()`,这两种算子可以用来实现无状态转换的基本操作。本节中的示例建立在你已经熟悉 [flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %}) 中的出租车行程数据的基础上。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -24,35 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-One very common use case for Apache Flink is to implement ETL (extract, transform, load) pipelines
-that take data from one or more sources, perform some transformations and/or enrichments, and
-then store the results somewhere. In this section we are going to look at how to use Flink's
-DataStream API to implement this kind of application.
+Apache Flink的一种常见应用场景是ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink的DataStream API实现这类应用。
+
+这里注意,Flink的[Table 和 SQL API]({% link dev/table/index.zh.md %})完全可以满足很多ETL使用场景。但无论你最终是否直接使用DataStream API,对这里介绍的基本知识有扎实的理解都是有价值的。
 
-Note that Flink's [Table and SQL APIs]({% link dev/table/index.zh.md %})
-are well suited for many ETL use cases. But regardless of whether you ultimately use
-the DataStream API directly, or not, having a solid understanding the basics presented here will
-prove valuable.
 
 * This will be replaced by the TOC
 {:toc}
 
-## Stateless Transformations
+## 无状态的转换
 
-This section covers `map()` and `flatmap()`, the basic operations used to implement
-stateless transformations. The examples in this section assume you are familiar with the
-Taxi Ride data used in the hands-on exercises in the
-[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %}).
+本节涵盖了 `map()` 和 `flatmap()`,这两种算子可以用来实现无状态转换的基本操作。本节中的示例建立在你已经熟悉[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %})中的出租车行程数据的基础上。
 
 ### `map()`
 
-In the first exercise you filtered a stream of taxi ride events. In that same code base there's a
-`GeoUtils` class that provides a static method `GeoUtils.mapToGridCell(float lon, float lat)` which
-maps a location (longitude, latitude) to a grid cell that refers to an area that is approximately
-100x100 meters in size.
+在第一个练习中,你讲过滤出租车行程数据中的事件。在同一代码仓库中,有一个 `GeoUtils` 类,提供了一个静态方法 `GeoUtils.mapToGridCell(float lon, float lat)`,它可以将位置坐标(经度,维度)映射到100x100米的对应不同区域的网格单元。

Review comment:
       ```suggestion
   在第一个练习中,你将过滤出租车行程数据中的事件。在同一代码仓库中,有一个 `GeoUtils` 类,提供了一个静态方法 `GeoUtils.mapToGridCell(float lon, float lat)`,它可以将位置坐标(经度,维度)映射到 100x100 米的对应不同区域的网格单元。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -24,35 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-One very common use case for Apache Flink is to implement ETL (extract, transform, load) pipelines
-that take data from one or more sources, perform some transformations and/or enrichments, and
-then store the results somewhere. In this section we are going to look at how to use Flink's
-DataStream API to implement this kind of application.
+Apache Flink的一种常见应用场景是ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink的DataStream API实现这类应用。
+
+这里注意,Flink的[Table 和 SQL API]({% link dev/table/index.zh.md %})完全可以满足很多ETL使用场景。但无论你最终是否直接使用DataStream API,对这里介绍的基本知识有扎实的理解都是有价值的。
 
-Note that Flink's [Table and SQL APIs]({% link dev/table/index.zh.md %})
-are well suited for many ETL use cases. But regardless of whether you ultimately use
-the DataStream API directly, or not, having a solid understanding the basics presented here will
-prove valuable.
 
 * This will be replaced by the TOC
 {:toc}
 
-## Stateless Transformations
+## 无状态的转换
 
-This section covers `map()` and `flatmap()`, the basic operations used to implement
-stateless transformations. The examples in this section assume you are familiar with the
-Taxi Ride data used in the hands-on exercises in the
-[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %}).
+本节涵盖了 `map()` 和 `flatmap()`,这两种算子可以用来实现无状态转换的基本操作。本节中的示例建立在你已经熟悉[flink-training repo](https://github.com/apache/flink-training/tree/{% if site.is_stable %}release-{{ site.version_title }}{% else %}master{% endif %})中的出租车行程数据的基础上。
 
 ### `map()`
 
-In the first exercise you filtered a stream of taxi ride events. In that same code base there's a
-`GeoUtils` class that provides a static method `GeoUtils.mapToGridCell(float lon, float lat)` which
-maps a location (longitude, latitude) to a grid cell that refers to an area that is approximately
-100x100 meters in size.
+在第一个练习中,你讲过滤出租车行程数据中的事件。在同一代码仓库中,有一个 `GeoUtils` 类,提供了一个静态方法 `GeoUtils.mapToGridCell(float lon, float lat)`,它可以将位置坐标(经度,维度)映射到100x100米的对应不同区域的网格单元。
 
-Now let's enrich our stream of taxi ride objects by adding `startCell` and `endCell` fields to each
-event. You can create an `EnrichedRide` object that extends `TaxiRide`, adding these fields:
+现在让我们为每个出租车行程时间的数据对象增加 `startCell` 和 `endCell` 字段。你可以创建一个继承 `TaxiRide` and `EnrichedRide` 类,添加这些字段:

Review comment:
       ```suggestion
   现在让我们为每个出租车行程时间的数据对象增加 `startCell` 和 `endCell` 字段。你可以创建一个继承 `TaxiRide` 的 `EnrichedRide` 类,添加这些字段:
   ```
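
For readers without the full file at hand, the `EnrichedRide` pattern under discussion looks roughly like the sketch below. It assumes, as in the flink-training code base, that `TaxiRide` exposes public fields such as `rideId`, `startLon`/`startLat`, and `endLon`/`endLat`, and that `GeoUtils.mapToGridCell()` returns an `int` cell id:

```java
public static class EnrichedRide extends TaxiRide {
    public int startCell;
    public int endCell;

    public EnrichedRide() {}

    public EnrichedRide(TaxiRide ride) {
        this.rideId = ride.rideId;
        // ... copy the remaining TaxiRide fields here ...
        this.startCell = GeoUtils.mapToGridCell(ride.startLon, ride.startLat);
        this.endCell = GeoUtils.mapToGridCell(ride.endLon, ride.endLat);
    }
}

// Applied with map(), each TaxiRide yields exactly one EnrichedRide:
DataStream<EnrichedRide> enrichedNYCRides = rides
    .map(ride -> new EnrichedRide(ride));
```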

##########
File path: docs/training/etl.zh.md
##########
@@ -24,35 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-One very common use case for Apache Flink is to implement ETL (extract, transform, load) pipelines
-that take data from one or more sources, perform some transformations and/or enrichments, and
-then store the results somewhere. In this section we are going to look at how to use Flink's
-DataStream API to implement this kind of application.
+Apache Flink的一种常见应用场景是ETL(抽取、转换、加载)管道任务。从一个或多个数据源获取数据,进行一些转换操作和信息补充,将结果存储起来。在这个教程中,我们将介绍如何使用Flink的DataStream API实现这类应用。
+
+这里注意,Flink的[Table 和 SQL API]({% link dev/table/index.zh.md %})完全可以满足很多ETL使用场景。但无论你最终是否直接使用DataStream API,对这里介绍的基本知识有扎实的理解都是有价值的。

Review comment:
       ```suggestion
   这里注意,Flink 的 [Table 和 SQL API]({% link dev/table/index.zh.md %}) 完全可以满足很多 ETL 使用场景。但无论你最终是否直接使用 DataStream API,对这里介绍的基本知识有扎实的理解都是有价值的。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -103,9 +91,7 @@ public static class Enrichment implements MapFunction<TaxiRide, EnrichedRide> {
 
 ### `flatmap()`
 
-A `MapFunction` is suitable only when performing a one-to-one transformation: for each and every
-stream element coming in, `map()` will emit one transformed element. Otherwise, you will want to use
-`flatmap()`
+`MapFunction` 只适用于一对一的转换:对每个进入算子的流元素,`map()` 将发射一个转换后的元素。对于除此以外的场景,你将要使用 `flatmap()`。

Review comment:
       Would it be better to change "将发射" to "仅输出" or "仅发送"?

##########
File path: docs/training/etl.zh.md
##########
@@ -131,36 +117,27 @@ public static class NYCEnrichment implements FlatMapFunction<TaxiRide, EnrichedR
 }
 {% endhighlight %}
 
-With the `Collector` provided in this interface, the `flatmap()` method can emit as many stream
-elements as you like, including none at all.
+使用接口中提供的 `Collector` ,`flatmap()` 可以发射你想要的任意数量的元素,也可以一个都不发。
 
 {% top %}
 
 ## Keyed Streams
 
 ### `keyBy()`
 
-It is often very useful to be able to partition a stream around one of its attributes, so that all
-events with the same value of that attribute are grouped together. For example, suppose you wanted
-to find the longest taxi rides starting in each of the grid cells. Thinking in terms of a SQL query,
-this would mean doing some sort of GROUP BY with the `startCell`, while in Flink this is done with
-`keyBy(KeySelector)`
+将一个流根据其中的一些属性来进行分区是十分有用的,这样我们可以使所有具有相同属性的事件分到相同的组里。例如,如果你想找到从每个网格单元出发的最远的出租车行程。按 SQL 查询的方式来考虑,这意味着要对 `startCell` 进行 GROUP BY 再排序,在 Flink 中这部分可以用 `keyBy(KeySelector)` 实现。
 
 {% highlight java %}
 rides
     .flatMap(new NYCEnrichment())
     .keyBy("startCell")
 {% endhighlight %}
 
-Every `keyBy` causes a network shuffle that repartitions the stream. In general this is pretty
-expensive, since it involves network communication along with serialization and deserialization.
+每个 `keyBy` 会通过 shuffle 来为数据流进行重新分区。总体来说这个开销是很大的,它涉及网络通信、序列化和反序列化。
 
 <img src="{{ site.baseurl }}/fig/keyBy.png" alt="keyBy and network shuffle" class="offset" width="45%" />
 
-In the example above, the key has been specified by a field name, "startCell". This style of key
-selection has the drawback that the compiler is unable to infer the type of the field being used for
-keying, and so Flink will pass around the key values as Tuples, which can be awkward. It is
-better to use a properly typed KeySelector, e.g.,
+在上面的例子中,将 "startCell" 这个字段定义为key。这种选择key的方式有个缺点,就是编译器无法推断用作键的字段的类型,所以 Flink 会将键值作为元组传递,这有时候会比较难处理。所以最好还是使用一个合适的 KeySelector,

Review comment:
       ```suggestion
   在上面的例子中,将 "startCell" 这个字段定义为 key。这种选择 key 的方式有个缺点,就是编译器无法推断用作键的字段的类型,所以 Flink 会将键值作为元组传递,这有时候会比较难处理。所以最好还是使用一个合适的 KeySelector,比如:
   ```
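
For reference, the properly typed KeySelector alternative discussed here can be sketched as follows (types as in the training code under review; `KeySelector` is `org.apache.flink.api.java.functions.KeySelector`):

```java
rides
    .flatMap(new NYCEnrichment())
    .keyBy(new KeySelector<EnrichedRide, Integer>() {
        @Override
        public Integer getKey(EnrichedRide enrichedRide) throws Exception {
            return enrichedRide.startCell;
        }
    });
```

With the explicit `KeySelector<EnrichedRide, Integer>`, the key type is visible to the compiler, so Flink does not have to fall back to passing keys around as Tuples.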

##########
File path: docs/training/etl.zh.md
##########
@@ -131,36 +117,27 @@ public static class NYCEnrichment implements FlatMapFunction<TaxiRide, EnrichedR
 }
 {% endhighlight %}
 
-With the `Collector` provided in this interface, the `flatmap()` method can emit as many stream
-elements as you like, including none at all.
+使用接口中提供的 `Collector` ,`flatmap()` 可以发射你想要的任意数量的元素,也可以一个都不发。

Review comment:
       Would changing "发射" to "输出" or "发送" read better?
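
To make the zero-one-or-many behavior of the `Collector` concrete, a minimal sketch; the `isInNYC` validity check is a hypothetical stand-in for whatever filter the training code actually applies:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public static class NYCEnrichment implements FlatMapFunction<TaxiRide, EnrichedRide> {
    @Override
    public void flatMap(TaxiRide ride, Collector<EnrichedRide> out) throws Exception {
        // Emit zero or one element per input: enrich rides inside NYC, drop the rest.
        if (isInNYC(ride)) { // hypothetical validity check
            out.collect(new EnrichedRide(ride));
        }
    }
}
```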

##########
File path: docs/training/etl.zh.md
##########
@@ -175,43 +152,35 @@ rides
         })
 {% endhighlight %}
 
-which can be more succinctly expressed with a lambda:
+也可以使用 lambda 表达式使它更简洁:

Review comment:
       How about "也可以使用更简洁的 lambda 表达式:"?

##########
File path: docs/training/etl.zh.md
##########
@@ -175,43 +152,35 @@ rides
         })
 {% endhighlight %}
 
-which can be more succinctly expressed with a lambda:
+也可以使用 lambda 表达式使它更简洁:
 
 {% highlight java %}
 rides
     .flatMap(new NYCEnrichment())
     .keyBy(enrichedRide -> enrichedRide.startCell)
 {% endhighlight %}
 
-### Keys are computed
+### 通过计算得到键
 
-KeySelectors aren't limited to extracting a key from your events. They can, instead, 
-compute the key in whatever way you want, so long as the resulting key is deterministic,
-and has valid implementations of `hashCode()` and `equals()`. This restriction rules out
-KeySelectors that generate random numbers, or that return Arrays or Enums, but you
-can have composite keys using Tuples or POJOs, for example, so long as their elements
-follow these same rules.
+KeySelector 不仅限于从事件中抽取键。 你也可以按想要的方式计算得到键值,只要最终结果是确定的,并且有 `hashCode()` 和 `equals()` 的实现。这些限制条件不包括产生随机数或者返回 Arrays 或 Enums 的 KeySelector ,但你可以用元组和 POJO 来组成键,只要他们的元素遵循上述条件。

Review comment:
       ```suggestion
   KeySelector 不仅限于从事件中抽取键。你也可以按想要的方式计算得到键值,只要最终结果是确定的,并且实现了 `hashCode()` 和 `equals()`。这些限制条件不包括产生随机数或者返回 Arrays 或 Enums 的 KeySelector,但你可以用元组和 POJO 来组成键,只要他们的元素遵循上述条件。
   ```
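
As a concrete instance of a computed (rather than extracted) key, something along these lines would satisfy the rules above; this is an illustrative sketch, not code from the PR:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;

enrichedNYCRides
    .keyBy(new KeySelector<EnrichedRide, Tuple2<Integer, Integer>>() {
        @Override
        public Tuple2<Integer, Integer> getKey(EnrichedRide ride) {
            // Composite key computed from two fields: deterministic, and
            // Tuple2 supplies valid hashCode()/equals() implementations.
            return Tuple2.of(ride.startCell, ride.endCell);
        }
    });
```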

##########
File path: docs/training/etl.zh.md
##########
@@ -175,43 +152,35 @@ rides
         })
 {% endhighlight %}
 
-which can be more succinctly expressed with a lambda:
+也可以使用 lambda 表达式使它更简洁:
 
 {% highlight java %}
 rides
     .flatMap(new NYCEnrichment())
     .keyBy(enrichedRide -> enrichedRide.startCell)
 {% endhighlight %}
 
-### Keys are computed
+### 通过计算得到键
 
-KeySelectors aren't limited to extracting a key from your events. They can, instead, 
-compute the key in whatever way you want, so long as the resulting key is deterministic,
-and has valid implementations of `hashCode()` and `equals()`. This restriction rules out
-KeySelectors that generate random numbers, or that return Arrays or Enums, but you
-can have composite keys using Tuples or POJOs, for example, so long as their elements
-follow these same rules.
+KeySelector 不仅限于从事件中抽取键。 你也可以按想要的方式计算得到键值,只要最终结果是确定的,并且有 `hashCode()` 和 `equals()` 的实现。这些限制条件不包括产生随机数或者返回 Arrays 或 Enums 的 KeySelector ,但你可以用元组和 POJO 来组成键,只要他们的元素遵循上述条件。
 
-The keys must be produced in a deterministic way, because they are recomputed whenever they
-are needed, rather than being attached to the stream records.
+键必须按确定的方式产生,因为它们会再需要的时候被重新计算,而不是一直被带在流的记录中。

Review comment:
       ```suggestion
   键必须按确定的方式产生,因为它们会在需要的时候被重新计算,而不是一直被带在流记录中。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。
 
-Whenever state gets involved in your application, you should think about how large the state might
-become. Whenever the key space is unbounded, then so is the amount of state Flink will need.
+只要应用中有状态,你就应该考虑状态的大小。如果键值的数量是无限的,那 Flink 的状态需要的空间也同样是无限的。
 
-When working with streams, it generally makes more sense to think in terms of aggregations over
-finite windows, rather than over the entire stream.
+当我们在流上作业时,考虑有限窗口的聚合往往比整个流聚合更有意义。
 
-### `reduce()` and other aggregators
+### `reduce()` 和其他聚合算子
 
-`maxBy()`, used above, is just one example of a number of aggregator functions available on Flink's
-`KeyedStream`s. There is also a more general purpose `reduce()` function that you can use to
-implement your own custom aggregations.
+上面用到的 `maxBy()` 只是 Flink 中 `KeyedStream` 上使用的众多聚合函数中的一个。还有一个更通用的 `reduce()` 函数可以用来实现你的自定义聚合。
 
 {% top %}
 
-## Stateful Transformations
+## 有状态的转换
 
-### Why is Flink Involved in Managing State?
+### 为什么 Flink 要参与管理状态?
 
-Your applications are certainly capable of using state without getting Flink involved in managing it
--- but Flink offers some compelling features for the state it manages:
+在Flink不参与管理状态的情况下,你的应用也可以使用状态,但Flink为其管理状态提供了一些引人注目的特性:

Review comment:
       ```suggestion
   在 Flink 不参与管理状态的情况下,你的应用也可以使用状态,但 Flink 为其管理状态提供了一些引人注目的特性:
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。

Review comment:
       ```suggestion
   这是培训中第一个涉及到有状态流的例子。尽管状态的处理是透明的,Flink 必须跟踪每个不同的键的最大时长。
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。
 
-Whenever state gets involved in your application, you should think about how large the state might
-become. Whenever the key space is unbounded, then so is the amount of state Flink will need.
+只要应用中有状态,你就应该考虑状态的大小。如果键值的数量是无限的,那 Flink 的状态需要的空间也同样是无限的。
 
-When working with streams, it generally makes more sense to think in terms of aggregations over
-finite windows, rather than over the entire stream.
+当我们在流上作业时,考虑有限窗口的聚合往往比整个流聚合更有意义。

Review comment:
       Would "在流处理场景中" (or some other rendering) be better than "当我们在流上作业时"? The current translation feels a bit awkward to me.

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。
 
-Whenever state gets involved in your application, you should think about how large the state might
-become. Whenever the key space is unbounded, then so is the amount of state Flink will need.
+只要应用中有状态,你就应该考虑状态的大小。如果键值的数量是无限的,那 Flink 的状态需要的空间也同样是无限的。
 
-When working with streams, it generally makes more sense to think in terms of aggregations over
-finite windows, rather than over the entire stream.
+当我们在流上作业时,考虑有限窗口的聚合往往比整个流聚合更有意义。
 
-### `reduce()` and other aggregators
+### `reduce()` 和其他聚合算子
 
-`maxBy()`, used above, is just one example of a number of aggregator functions available on Flink's
-`KeyedStream`s. There is also a more general purpose `reduce()` function that you can use to
-implement your own custom aggregations.
+上面用到的 `maxBy()` 只是 Flink 中 `KeyedStream` 上使用的众多聚合函数中的一个。还有一个更通用的 `reduce()` 函数可以用来实现你的自定义聚合。
 
 {% top %}
 
-## Stateful Transformations
+## 有状态的转换
 
-### Why is Flink Involved in Managing State?
+### 为什么 Flink 要参与管理状态?
 
-Your applications are certainly capable of using state without getting Flink involved in managing it
--- but Flink offers some compelling features for the state it manages:
+在Flink不参与管理状态的情况下,你的应用也可以使用状态,但Flink为其管理状态提供了一些引人注目的特性:
 
-* **local**: Flink state is kept local to the machine that processes it, and can be accessed at memory speed
-* **durable**: Flink state is fault-tolerant, i.e., it is automatically checkpointed at regular intervals, and is restored upon failure
-* **vertically scalable**: Flink state can be kept in embedded RocksDB instances that scale by adding more local disk
-* **horizontally scalable**: Flink state is redistributed as your cluster grows and shrinks
-* **queryable**: Flink state can be queried externally via the [Queryable State API]({% link dev/stream/state/queryable_state.zh.md %}).
+* **本地性**: Flink 状态是存储在使用它的机器本地的,并且可以以内存访问速度来获取
+* **持久性**: Flink 状态是容错的,例如,它可以自动按一定的时间间隔产生 checkpoint, 并且在任务失败后进行恢复

Review comment:
       ```suggestion
   * **持久性**: Flink 状态是容错的,例如,它可以自动按一定的时间间隔产生 checkpoint,并且在任务失败后进行恢复
   ```

##########
File path: docs/training/etl.zh.md
##########
@@ -231,13 +200,9 @@ DataStream<Tuple2<Integer, Minutes>> minutesByStartCell = enrichedNYCRides
     });
 {% endhighlight %}
 
-Now it is possible to produce a stream that contains only those rides that are the longest rides
-ever seen (to that point) for each `startCell`.
+现在就可以对每个 `startCell` 找到最长的行程,并产生一个流。

Review comment:
       Is the intended meaning here that the resulting stream contains only the longest rides (so far) for each `startCell`?

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。
 
-Whenever state gets involved in your application, you should think about how large the state might
-become. Whenever the key space is unbounded, then so is the amount of state Flink will need.
+只要应用中有状态,你就应该考虑状态的大小。如果键值的数量是无限的,那 Flink 的状态需要的空间也同样是无限的。
 
-When working with streams, it generally makes more sense to think in terms of aggregations over
-finite windows, rather than over the entire stream.
+当我们在流上作业时,考虑有限窗口的聚合往往比整个流聚合更有意义。
 
-### `reduce()` and other aggregators
+### `reduce()` 和其他聚合算子
 
-`maxBy()`, used above, is just one example of a number of aggregator functions available on Flink's
-`KeyedStream`s. There is also a more general purpose `reduce()` function that you can use to
-implement your own custom aggregations.
+上面用到的 `maxBy()` 只是 Flink 中 `KeyedStream` 上使用的众多聚合函数中的一个。还有一个更通用的 `reduce()` 函数可以用来实现你的自定义聚合。

Review comment:
       ```suggestion
   上面用到的 `maxBy()` 只是 Flink 中 `KeyedStream` 上众多聚合函数中的一个。还有一个更通用的 `reduce()` 函数可以用来实现你的自定义聚合。
   ```
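
To make the `reduce()` remark concrete, a hand-rolled equivalent of the `maxBy()` used above might be sketched like this, assuming the duration type inside the `Tuple2` is `Comparable`:

```java
minutesByStartCell
    .keyBy(value -> value.f0)  // key by startCell
    .reduce((a, b) -> a.f1.compareTo(b.f1) >= 0 ? a : b)  // keep the longest duration per key
    .print();
```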

##########
File path: docs/training/etl.zh.md
##########
@@ -262,65 +227,51 @@ The output stream now contains a record for each key every time the duration rea
     ...
     1> (50797,12M)
 
-### (Implicit) State
+### (隐式的)状态
 
-This is the first example in this training that involves stateful streaming. Though the state is
-being handled transparently, Flink has to keep track of the maximum duration for each distinct
-key.
+这是培训中第一个包含状态的流的例子。尽管状态的处理是透明的,Flink必须跟踪每个不同的键的最大时长。
 
-Whenever state gets involved in your application, you should think about how large the state might
-become. Whenever the key space is unbounded, then so is the amount of state Flink will need.
+只要应用中有状态,你就应该考虑状态的大小。如果键值的数量是无限的,那 Flink 的状态需要的空间也同样是无限的。
 
-When working with streams, it generally makes more sense to think in terms of aggregations over
-finite windows, rather than over the entire stream.
+当我们在流上作业时,考虑有限窗口的聚合往往比整个流聚合更有意义。
 
-### `reduce()` and other aggregators
+### `reduce()` 和其他聚合算子
 
-`maxBy()`, used above, is just one example of a number of aggregator functions available on Flink's
-`KeyedStream`s. There is also a more general purpose `reduce()` function that you can use to
-implement your own custom aggregations.
+上面用到的 `maxBy()` 只是 Flink 中 `KeyedStream` 上使用的众多聚合函数中的一个。还有一个更通用的 `reduce()` 函数可以用来实现你的自定义聚合。
 
 {% top %}
 
-## Stateful Transformations
+## 有状态的转换
 
-### Why is Flink Involved in Managing State?
+### 为什么 Flink 要参与管理状态?

Review comment:
       Is there a better translation for this heading?

##########
File path: docs/training/etl.zh.md
##########
@@ -376,76 +320,52 @@ public static class Deduplicator extends RichFlatMapFunction<Event, Event> {
 }
 {% endhighlight %}
 
-When the flatMap method calls `keyHasBeenSeen.value()`, Flink's runtime looks up the value of this
-piece of state _for the key in context_, and only if it is `null` does it go ahead and collect the
-event to the output. It also updates `keyHasBeenSeen` to `true` in this case. 
+当 flatMap 方法调用 `keyHasBeenSeen.value()`,Flink 运行时将在 _当前键的上下文_ 中检索状态的值,只有当它为 `null` 时,才会输出当前事件。这种情况下,它同时也将更新 `keyHasBeenSeen` 为 `true`。

Review comment:
       ```suggestion
   当 flatMap 方法调用 `keyHasBeenSeen.value()` 时,Flink 会在 _当前键的上下文_ 中检索状态值,只有当状态为 `null` 时,才会输出当前事件。这种情况下,它同时也将更新 `keyHasBeenSeen` 为 `true`。
   ```
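
For context, the state access being discussed sits in a deduplicator along these lines (a sketch consistent with the flink-training example; `Event` is the stream's element type):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public static class Deduplicator extends RichFlatMapFunction<Event, Event> {
    ValueState<Boolean> keyHasBeenSeen;

    @Override
    public void open(Configuration conf) {
        ValueStateDescriptor<Boolean> desc =
                new ValueStateDescriptor<>("keyHasBeenSeen", Types.BOOLEAN);
        keyHasBeenSeen = getRuntimeContext().getState(desc);
    }

    @Override
    public void flatMap(Event event, Collector<Event> out) throws Exception {
        // keyHasBeenSeen.value() is looked up for the key in context;
        // null means this key has never been seen before.
        if (keyHasBeenSeen.value() == null) {
            out.collect(event);           // first occurrence: pass it through
            keyHasBeenSeen.update(true);  // remember the key for next time
        }
    }
}
```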




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org