Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/09 02:55:30 UTC

[GitHub] [flink] klion26 commented on a change in pull request #11971: [FLINK-17271] Translate new DataStream API tutorial

klion26 commented on a change in pull request #11971:
URL: https://github.com/apache/flink/pull/11971#discussion_r422442438



##########
File path: docs/training/datastream_api.zh.md
##########
@@ -24,30 +24,27 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-The focus of this training is to broadly cover the DataStream API well enough that you will be able
-to get started writing streaming applications.
+该练习的重点是充分全面地了解 DataStream API,以便于入门编写流式应用。

Review comment:
       The phrase "以便于入门编写流式应用" reads a bit awkwardly. Would "以便于编写流式应用入门", or some other rendering, read better?

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -24,30 +24,27 @@ specific language governing permissions and limitations
 under the License.
 -->
 

Review comment:
       The title after "nav-title:" on line 5 of this file also needs to be translated.
   After translating, you can run "sh docs/build.sh -p" in your local flink directory and then open localhost:4000 to check how the translation renders.

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -24,30 +24,27 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-The focus of this training is to broadly cover the DataStream API well enough that you will be able
-to get started writing streaming applications.
+该练习的重点是充分全面地了解 DataStream API,以便于入门编写流式应用。
 
 * This will be replaced by the TOC
 {:toc}
 
-## What can be Streamed?
+## 什么能被转化成流?
 
-Flink's DataStream APIs for Java and Scala will let you stream anything they can serialize. Flink's
-own serializer is used for
+Flink 的 Java 和 Scala DataStream API 可以将任何可序列化的对象转化为流。Flink  自带的序列化器有
 
-- basic types, i.e., String, Long, Integer, Boolean, Array
-- composite types: Tuples, POJOs, and Scala case classes
+- 基本类型,即String、Long、Integer、Boolean、Array

Review comment:
       ```suggestion
   - 基本类型,即 String、Long、Integer、Boolean、Array
   ```

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -24,30 +24,27 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-The focus of this training is to broadly cover the DataStream API well enough that you will be able
-to get started writing streaming applications.
+该练习的重点是充分全面地了解 DataStream API,以便于入门编写流式应用。
 
 * This will be replaced by the TOC
 {:toc}
 
-## What can be Streamed?
+## 什么能被转化成流?
 
-Flink's DataStream APIs for Java and Scala will let you stream anything they can serialize. Flink's
-own serializer is used for
+Flink 的 Java 和 Scala DataStream API 可以将任何可序列化的对象转化为流。Flink  自带的序列化器有
 
-- basic types, i.e., String, Long, Integer, Boolean, Array
-- composite types: Tuples, POJOs, and Scala case classes
+- 基本类型,即String、Long、Integer、Boolean、Array
+- 复合类型:Tuples、POJOs 和 Scala case classes
 
-and Flink falls back to Kryo for other types. It is also possible to use other serializers with
-Flink. Avro, in particular, is well supported.
+而且 Flink 可以交给 Kryo 序列化其他类型。也可以将其他序列化器和 Flink 一起使用。特别是有良好支持的 Avro。

Review comment:
       Regarding "而且 Flink 可以交给 Kryo 序列化其他类型": the original sentence means that for types not covered by the list above, Flink falls back to Kryo for serialization. Could this be phrased more accurately?
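   For context, the fallback can also be made explicit in code. A minimal sketch, assuming only the standard Flink/Kryo APIs (the choice of java.util.UUID and Kryo's generic JavaSerializer here is arbitrary, purely for illustration):
   ```java
   import com.esotericsoftware.kryo.serializers.JavaSerializer;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   public class KryoFallbackSketch {
       public static void main(String[] args) {
           StreamExecutionEnvironment env =
                   StreamExecutionEnvironment.getExecutionEnvironment();

           // Types that are neither basic (String, Long, ...) nor composite
           // (Tuples, POJOs, case classes) are handled by the Kryo fallback.
           // A specific Kryo serializer can also be registered explicitly:
           env.getConfig().registerTypeWithKryoSerializer(
                   java.util.UUID.class, JavaSerializer.class);
       }
   }
   ```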

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -82,17 +77,17 @@ public class Person {
 Person person = new Person("Fred Flintstone", 35);
 {% endhighlight %}
 
-Flink's serializer [supports schema evolution for POJO types]({% link dev/stream/state/schema_evolution.zh.md %}#pojo-types).
+Flink 的序列化器[支持的 POJO 类型数据结构升级]({% link dev/stream/state/schema_evolution.zh.md %}#pojo-types)。
 
-### Scala tuples and case classes
+### Scala tuples 和 case classes
 
-These work just as you'd expect.
+正如你期望的一样。

Review comment:
       This sentence feels a bit abrupt on its own. Could you add some more description, or find a better translation?

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -104,63 +99,56 @@ public class Example {
     public static void main(String[] args) throws Exception {
         final StreamExecutionEnvironment env =
                 StreamExecutionEnvironment.getExecutionEnvironment();
-
+    

Review comment:
       These changes are unrelated; please revert them.
   Same for the ones below.

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -104,63 +99,56 @@ public class Example {
     public static void main(String[] args) throws Exception {
         final StreamExecutionEnvironment env =
                 StreamExecutionEnvironment.getExecutionEnvironment();
-
+    
         DataStream<Person> flintstones = env.fromElements(
                 new Person("Fred", 35),
                 new Person("Wilma", 35),
                 new Person("Pebbles", 2));
-
+    
         DataStream<Person> adults = flintstones.filter(new FilterFunction<Person>() {
             @Override
             public boolean filter(Person person) throws Exception {
                 return person.age >= 18;
             }
         });
-
+    
         adults.print();
-
+    
         env.execute();
     }
-
+    
     public static class Person {
         public String name;
         public Integer age;
         public Person() {};
-
+    
         public Person(String name, Integer age) {
             this.name = name;
             this.age = age;
         };
-
+    
         public String toString() {
             return this.name.toString() + ": age " + this.age.toString();
         };
     }
 }
 {% endhighlight %}
 
-### Stream execution environment
+### Stream 执行环境
 
-Every Flink application needs an execution environment, `env` in this example. Streaming
-applications need to use a `StreamExecutionEnvironment`.
+每个 Flink 应用都需要有执行环境,在该示例中为 `env` 。流式应用需要用到 `StreamExecutionEnvironment`。
 
-The DataStream API calls made in your application build a job graph that is attached to the
-`StreamExecutionEnvironment`. When `env.execute()` is called this graph is packaged up and sent to
-the Flink Master, which parallelizes the job and distributes slices of it to the Task Managers for
-execution. Each parallel slice of your job will be executed in a *task slot*.
+DataStream API 将你的应用构建为一个由 `StreamExecutionEnvironment` 生成的 job graph。当调用 `env.execute()` 时此 graph 就被打包并发送到 Flink Master 上,后者对作业并行处理并将其切片分发给 Task Manager 来执行。每个作业并行切片将在 *task slot* 中执行。

Review comment:
       As I understand it, the job graph is not generated by the `StreamExecutionEnvironment`; it is merely attached to the `StreamExecutionEnvironment`.
   
   Also, is there a better translation for "切片" in "将其切片"? What happens here is that an operator is split into multiple subtasks, which then run in parallel.
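   To make the subtask point concrete, a small self-contained sketch (standard DataStream API; parallelism 2 is chosen arbitrarily):
   ```java
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   public class SubtaskSketch {
       public static void main(String[] args) throws Exception {
           StreamExecutionEnvironment env =
                   StreamExecutionEnvironment.getExecutionEnvironment();

           // Each operator in the job graph is split into 2 parallel subtasks,
           // and each subtask is executed in its own task slot.
           env.setParallelism(2);

           env.fromElements(1, 2, 3, 4)
              .map(i -> i * i) // runs as 2 parallel subtasks
              .print();        // output lines carry the subtask id, e.g. "1> 4"

           env.execute("subtask sketch");
       }
   }
   ```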

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -104,63 +99,56 @@ public class Example {
     public static void main(String[] args) throws Exception {
         final StreamExecutionEnvironment env =
                 StreamExecutionEnvironment.getExecutionEnvironment();
-
+    
         DataStream<Person> flintstones = env.fromElements(
                 new Person("Fred", 35),
                 new Person("Wilma", 35),
                 new Person("Pebbles", 2));
-
+    
         DataStream<Person> adults = flintstones.filter(new FilterFunction<Person>() {
             @Override
             public boolean filter(Person person) throws Exception {
                 return person.age >= 18;
             }
         });
-
+    
         adults.print();
-
+    
         env.execute();
     }
-
+    
     public static class Person {
         public String name;
         public Integer age;
         public Person() {};
-
+    
         public Person(String name, Integer age) {
             this.name = name;
             this.age = age;
         };
-
+    
         public String toString() {
             return this.name.toString() + ": age " + this.age.toString();
         };
     }
 }
 {% endhighlight %}
 
-### Stream execution environment
+### Stream 执行环境
 
-Every Flink application needs an execution environment, `env` in this example. Streaming
-applications need to use a `StreamExecutionEnvironment`.
+每个 Flink 应用都需要有执行环境,在该示例中为 `env` 。流式应用需要用到 `StreamExecutionEnvironment`。

Review comment:
       ```suggestion
   每个 Flink 应用都需要有执行环境,在该示例中为 `env`。流式应用需要用到 `StreamExecutionEnvironment`。
   ```

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -172,60 +160,49 @@ people.add(new Person("Pebbles", 2));
 DataStream<Person> flintstones = env.fromCollection(people);
 {% endhighlight %}
 
-Another convenient way to get some data into a stream while prototyping is to use a socket
+另一个获取数据到流中的便捷方法是用 socket
 
 {% highlight java %}
 DataStream<String> lines = env.socketTextStream("localhost", 9999)
 {% endhighlight %}
 
-or a file
+或读取文件
 
 {% highlight java %}
 DataStream<String> lines = env.readTextFile("file:///path");
 {% endhighlight %}
 
-In real applications the most commonly used data sources are those that support low-latency, high
-throughput parallel reads in combination with rewind and replay -- the prerequisites for high
-performance and fault tolerance -- such as Apache Kafka, Kinesis, and various filesystems. REST APIs
-and databases are also frequently used for stream enrichment.
+在真实的应用中,最常用的数据源是那些支持低延迟,高吞吐并行读取以及倒带和重放(高性能和容错能力为先决条件)的数据源,例如 Apache Kafka,Kinesis 和各种文件系统。REST API 和数据库也经常用于流富集(stream enrichment)。
 
-### Basic stream sinks
+### 基本的 stream sink
 
-The example above uses `adults.print()` to print its results to the task manager logs (which will
-appear in your IDE's console, when running in an IDE). This will call `toString()` on each element
-of the stream.
+上述示例用 `adults.print()` 打印其结果到 task manager 的日志中(如果运行在 IDE 中时,将追加到你的 IDE 控制台)。它会对流中的每个元素都调用 `toString()` 方法。
 
-The output looks something like this
+输出看起来类似于
 
     1> Fred: age 35
     2> Wilma: age 35
 
-where 1> and 2> indicate which sub-task (i.e., thread) produced the output.
+1> 和 2> 指出输出来自哪个 sub-task(即thread)

Review comment:
       ```suggestion
   1> 和 2> 指出输出来自哪个 sub-task(即 thread)
   ```

##########
File path: docs/training/datastream_api.zh.md
##########
@@ -172,60 +160,49 @@ people.add(new Person("Pebbles", 2));
 DataStream<Person> flintstones = env.fromCollection(people);
 {% endhighlight %}
 
-Another convenient way to get some data into a stream while prototyping is to use a socket
+另一个获取数据到流中的便捷方法是用 socket
 
 {% highlight java %}
 DataStream<String> lines = env.socketTextStream("localhost", 9999)
 {% endhighlight %}
 
-or a file
+或读取文件
 
 {% highlight java %}
 DataStream<String> lines = env.readTextFile("file:///path");
 {% endhighlight %}
 
-In real applications the most commonly used data sources are those that support low-latency, high
-throughput parallel reads in combination with rewind and replay -- the prerequisites for high
-performance and fault tolerance -- such as Apache Kafka, Kinesis, and various filesystems. REST APIs
-and databases are also frequently used for stream enrichment.
+在真实的应用中,最常用的数据源是那些支持低延迟,高吞吐并行读取以及倒带和重放(高性能和容错能力为先决条件)的数据源,例如 Apache Kafka,Kinesis 和各种文件系统。REST API 和数据库也经常用于流富集(stream enrichment)。

Review comment:
       Could "倒带和重放" simply be changed to "重复"?
   And is "stream enrichment" here "流处理的增强" -- that is, data is also read from REST APIs or databases to enrich the stream processing?
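   For illustration, the enrichment pattern looks roughly like the sketch below. This is a minimal, hypothetical example: the `queryAge` lookup stands in for a real REST or database call and is not part of the tutorial.
   ```java
   import org.apache.flink.api.common.functions.RichMapFunction;

   // Sketch of stream enrichment: each element is augmented with data
   // fetched from an external system.
   public class AgeEnricher extends RichMapFunction<String, String> {
       @Override
       public String map(String name) throws Exception {
           // Hypothetical external lookup (REST API or database query);
           // in a real job the client/connection would be created in open().
           int age = queryAge(name);
           return name + ": age " + age;
       }

       private int queryAge(String name) {
           return 35; // stand-in for the real external call
       }
   }
   ```
   It would then be applied as `DataStream<String> enriched = names.map(new AgeEnricher());`.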




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org