Posted to commits@flink.apache.org by ga...@apache.org on 2021/11/08 02:39:24 UTC

[flink] branch master updated (b296cb4 -> 5233638)

This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git.


    from b296cb4  [FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese
     new bd8c86d  Revert "[FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese"
     new 5233638  [FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

[flink] 02/02: [FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese

Posted by ga...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 5233638062984f4e402218e64c8ef09e8570fb56
Author: gongzhongqiang <76...@qq.com>
AuthorDate: Mon Nov 8 10:20:22 2021 +0800

    [FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese
    
    This closes #16627.
---
 docs/content.zh/docs/ops/batch/blocking_shuffle.md | 58 +++++++++++-----------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/docs/content.zh/docs/ops/batch/blocking_shuffle.md b/docs/content.zh/docs/ops/batch/blocking_shuffle.md
index f30f816..4ff1dc8 100644
--- a/docs/content.zh/docs/ops/batch/blocking_shuffle.md
+++ b/docs/content.zh/docs/ops/batch/blocking_shuffle.md
@@ -27,63 +27,63 @@ under the License.
 
 # Blocking Shuffle
 
-## Overview
+## 总览
 
-Flink supports a batch execution mode in both [DataStream API]({{< ref "docs/dev/datastream/execution_mode" >}}) and [Table / SQL]({{< ref "/docs/dev/table/overview" >}}) for jobs executing across bounded input. In this mode, network exchanges occur via a blocking shuffle. Unlike the pipelined shuffle used for streaming applications, blocking exchanges persist data to some storage. Downstream tasks then fetch these values via the network. Such an exchange reduces the resources required to execute the job, as it does not need the upstream and downstream tasks to run simultaneously.
+Flink [DataStream API]({{< ref "docs/dev/datastream/execution_mode" >}}) 和 [Table / SQL]({{< ref "/docs/dev/table/overview" >}}) 都支持通过批处理执行模式处理有界输入。此模式是通过 blocking shuffle 进行网络传输。与流式应用使用管道 shuffle 阻塞交换的数据并存储,然后下游任务通过网络获取这些值的方式不同。这种交换减少了执行作业所需的资源,因为它不需要同时运行上游和下游任务。 
 
-As a whole, Flink provides two different types of blocking shuffle: `Hash shuffle` and `Sort shuffle`.
+总的来说,Flink 提供了两种不同类型的 blocking shuffles:`Hash shuffle` 和 `Sort shuffle`。
 
-They will be detailed in the following sections.
+在下面章节会详细说明它们。
 
 ## Hash Shuffle
 
-The default blocking shuffle implementation, `Hash Shuffle`, has each upstream task persist its results in a separate file for each downstream task on the local disk of the TaskManager. When the downstream tasks run, they request partitions from the upstream TaskManagers, which read the files and transmit data via the network.
+`Hash Shuffle` 是 blocking shuffle 的默认实现,它为每个下游任务将每个上游任务的结果以单独文件的方式保存在 TaskManager 本地磁盘上。当下游任务运行时会向上游的 TaskManager 请求分片,TaskManager 读取文件之后通过网络传输(给下游任务)。
 
-`Hash Shuffle` provides different mechanisms for writing and reading files:
+`Hash Shuffle` 为读写文件提供了不同的机制:
 
-- `file`: Writes files with the normal File IO, and reads and transmits files with Netty `FileRegion`. `FileRegion` relies on the `sendfile` system call to reduce the number of data copies and memory consumption.
-- `mmap`: Writes and reads files with the `mmap` system call.
-- `Auto`: Writes files with the normal File IO; for file reading, it falls back to the normal `file` option on 32-bit machines and uses `mmap` on 64-bit machines. This avoids the file size limitation of the Java `mmap` implementation on 32-bit machines.
+- `file`: 通过标准文件 IO 写文件,读取和传输文件需要通过 Netty 的 `FileRegion`。`FileRegion` 依靠系统调用 `sendfile` 来减少数据拷贝和内存消耗。
+- `mmap`: 通过系统调用 `mmap` 来读写文件。
+- `Auto`: 通过标准文件 IO 写文件,对于文件读取,在 32 位机器上降级到 `file` 选项并且在 64 位机器上使用 `mmap` 。这是为了避免在 32 位机器上 java 实现 `mmap` 的文件大小限制。
 
-The mechanism can be chosen via the [TaskManager configuration]({{< ref "docs/deployment/config#taskmanager-network-blocking-shuffle-type" >}}).
+可通过设置 [TaskManager 参数]({{< ref "docs/deployment/config#taskmanager-network-blocking-shuffle-type" >}}) 选择不同的机制。
 
 {{< hint warning >}}
-This option is experimental and might be changed in the future.
+这个选项是实验性的,将来或许会有改动。
 {{< /hint >}}
 
 {{< hint warning >}}
-If [SSL]({{< ref "docs/deployment/security/security-ssl" >}}) is enabled, the `file` mechanism cannot use `FileRegion` and instead uses an un-pooled buffer to cache data before transmitting. This might [cause direct memory OOM](https://issues.apache.org/jira/browse/FLINK-15981). Additionally, since the synchronous file reading might block Netty threads for some time, the [SSL handshake timeout]({{< ref "docs/deployment/config#security-ssl-internal-handshake-timeout" >}}) needs to be increased to avoid [connection reset errors](https://issues.apache.org/jira/browse/FLINK-21416).
+如果开启 [SSL]({{< ref "docs/deployment/security/security-ssl" >}}),`file` 机制不能使用 `FileRegion` 而是在传输之前使用非池化的缓存去缓存数据。这可能会 [导致 direct memory OOM](https://issues.apache.org/jira/browse/FLINK-15981)。此外,因为同步读取文件有时会造成 netty 线程阻塞,[SSL handshake timeout]({{< ref "docs/deployment/config#security-ssl-internal-handshake-timeout" >}}) 配置需要调大以防 [connection reset 异常](https://issues.apache.org/jira/browse/FLINK-21416)。
 {{< /hint >}}
 
 {{< hint info >}}
-The memory usage of `mmap` is not accounted for by configured memory limits, but some resource frameworks like Yarn will track this memory usage and kill the container if memory exceeds some threshold.
+`mmap`使用的内存不计算进已有配置的内存限制中,但是一些资源管理框架比如 yarn 将追踪这块内存使用,并且如果容器使用内存超过阈值会被杀掉。
 {{< /hint >}}
 
-To further improve the performance, for most jobs we also recommend [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) unless the data is hard to compress.
+为了进一步的提升性能,对于绝大多数的任务我们推荐 [启用压缩]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) ,除非数据很难被压缩。
 
-`Hash Shuffle` works well for small-scale jobs on SSDs, but it also has some disadvantages:
+`Hash Shuffle` 在小规模运行在固态硬盘的任务情况下效果显著,但是依旧有一些问题:
 
-1. If the job scale is large, it might create too many files, and it requires a large write buffer to write these files at the same time.
-2. On HDD, when multiple downstream tasks fetch their data simultaneously, it might suffer from random IO.
+1. 如果任务的规模庞大将会创建很多文件,并且要求同时对这些文件进行大量的写操作。
+2. 在机械硬盘情况下,当大量的下游任务同时读取数据,可能会导致随机读写问题。
 
 ## Sort Shuffle
 
-`Sort Shuffle` is another blocking shuffle implementation introduced in version 1.13. Different from `Hash Shuffle`, sort shuffle writes only one file for each result partition. When the result partition is read by multiple downstream tasks concurrently, the data file is opened only once and shared by all readers. As a result, the cluster uses fewer resources like inodes and file descriptors, which improves stability. Furthermore, by writing fewer files and making a best effort to read data sequentially, sort shuffle can achieve better performance than hash shuffle, especially on HDD. Additionally, sort shuffle uses extra managed memory as the data reading buffer and does not rely on the `sendfile` or `mmap` mechanism, thus it also works well with [SSL]({{< ref "docs/deployment/security/security-ssl" >}}). Please refer to [FLINK-19582](https://issues.apache.org/jira/browse/FLINK-19582) and [FLINK-19614](https://issues.apache.org/jira/browse/FLINK-19614) for more details about sort shuffle.
+`Sort Shuffle` 是 1.13 版中引入的另一种 blocking shuffle 实现。不同于 `Hash Shuffle`,sort shuffle 将每个分区结果写入到一个文件。当多个下游任务同时读取结果分片,数据文件只会被打开一次并共享给所有的读请求。因此,集群使用更少的资源。例如:节点和文件描述符以提升稳定性。此外,通过写更少的文件和尽可能线性的读取文件,尤其是在使用机械硬盘情况下 sort shuffle 可以获得比 hash shuffle 更好的性能。另外,`sort shuffle` 使用额外管理的内存作为读数据缓存并不依赖 `sendfile` 或 `mmap` 机制,因此也适用于 [SSL]({{< ref "docs/deployment/security/security-ssl" >}})。关于 sort shuffle 的更多细节请参考 [FLINK-19582](https://issues.apache.org/jira/browse/FLINK-19582) 和 [FLINK-19614](https://issues.apache.org/jira/browse/FLINK-19614)。
 
-There are several config options that might need adjustment when using sort blocking shuffle:
-- [taskmanager.network.blocking-shuffle.compression.enabled]({{< ref "docs/deployment/config" >}}#taskmanager-network-blocking-shuffle-compression-enabled): Config option for shuffle data compression. It is suggested to enable it for most jobs unless the compression ratio of your data is low.
-- [taskmanager.network.sort-shuffle.min-parallelism]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-parallelism): Config option to enable sort shuffle depending on the parallelism of downstream tasks. If parallelism is lower than the configured value, `hash shuffle` will be used, otherwise `sort shuffle` will be used.
-- [taskmanager.network.sort-shuffle.min-buffers]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-buffers): Config option to control the data writing buffer size. For large-scale jobs, you may need to increase this value; usually, several hundred megabytes of memory is enough.
-- [taskmanager.memory.framework.off-heap.batch-shuffle.size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-framework-off-heap-batch-shuffle-size): Config option to control the data reading buffer size. For large-scale jobs, you may need to increase this value; usually, several hundred megabytes of memory is enough.
+当使用sort blocking shuffle的时候有些配置需要适配:
+- [taskmanager.network.blocking-shuffle.compression.enabled]({{< ref "docs/deployment/config" >}}#taskmanager-network-blocking-shuffle-compression-enabled): 配置该选项以启用 shuffle data 压缩,大部分任务建议开启除非你的数据压缩比率比较低。
+- [taskmanager.network.sort-shuffle.min-parallelism]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-parallelism): 根据下游任务的并行度配置该选项以启用 sort shuffle。如果并行度低于设置的值,则使用 `hash shuffle`,否则 `sort shuffle`。
+- [taskmanager.network.sort-shuffle.min-buffers]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-buffers): 配置该选项以控制数据写缓存大小。对于大规模的任务而言,你可能需要调大这个值,正常几百兆内存就足够了。
+- [taskmanager.memory.framework.off-heap.batch-shuffle.size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-framework-off-heap-batch-shuffle-size): 配置该选项以控制数据读取缓存大小。对于大规模的任务而言,你可能需要调大这个值,正常几百兆内存就足够了。
 
 {{< hint info >}}
-Currently, `sort shuffle` only sorts records by partition index rather than by the records themselves; that is to say, `sort` is only used as a data clustering algorithm.
+目前 `sort shuffle` 只通过分区索引来排序而不是记录本身,也就是说 `sort` 只是被当成数据聚类算法使用。
 {{< /hint >}}
 
-## Choices of Blocking Shuffle
+## 如何选择 Blocking Shuffle
 
-In summary,
+总的来说,
 
-- For small-scale jobs running on SSD, both implementations should work.
-- For large-scale jobs or for jobs running on HDD, `sort shuffle` should be more suitable.
-- In both cases, you may consider [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) to improve the performance unless the data is hard to compress.
+- 对于在固态硬盘上运行的小规模任务而言,两者都可以。
+- 对于在机械硬盘上运行的大规模任务而言,`sort shuffle` 更为合适。
+- 在这两种情况下,你可以考虑 [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) 来提升性能,除非数据很难被压缩。
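
The page changed by this commit documents how blocking shuffle is used when a job runs in batch execution mode. As a minimal, hedged sketch of the surrounding DataStream API (not part of this commit; the class name, sample data, and job name are made up for illustration), a bounded job can be switched to BATCH runtime mode like this, which is the mode in which the blocking shuffles described above take effect:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BlockingShuffleBatchExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // With bounded sources and BATCH runtime mode, network exchanges
        // between tasks use a blocking shuffle instead of a pipelined one.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements(1, 2, 3, 4, 5)      // bounded input
           .keyBy(value -> value % 2)        // forces a network exchange
           .reduce(Integer::sum)
           .print();

        env.execute("blocking-shuffle-batch-example");
    }
}

Whether that exchange is served by hash or sort shuffle is then decided by the TaskManager options discussed in the translated page, not by the job code itself.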

[flink] 01/02: Revert "[FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese"

Posted by ga...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

gaoyunhaii pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit bd8c86d5ce08fc7498e75d0065c1531b538f1ae6
Author: Yun Gao <ga...@gmail.com>
AuthorDate: Mon Nov 8 10:32:52 2021 +0800

    Revert "[FLINK-23427][chinese-translation] Translate the page of "Blocking Shuffle" into Chinese"
    
    This reverts commit b296cb4849efb05ac94da7b5e95b7e5afa3d8e14.
---
 docs/content.zh/docs/ops/batch/blocking_shuffle.md | 58 +++++++++++-----------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/docs/content.zh/docs/ops/batch/blocking_shuffle.md b/docs/content.zh/docs/ops/batch/blocking_shuffle.md
index 4ff1dc8..f30f816 100644
--- a/docs/content.zh/docs/ops/batch/blocking_shuffle.md
+++ b/docs/content.zh/docs/ops/batch/blocking_shuffle.md
@@ -27,63 +27,63 @@ under the License.
 
 # Blocking Shuffle
 
-## 总览
+## Overview
 
-Flink [DataStream API]({{< ref "docs/dev/datastream/execution_mode" >}}) 和 [Table / SQL]({{< ref "/docs/dev/table/overview" >}}) 都支持通过批处理执行模式处理有界输入。此模式是通过 blocking shuffle 进行网络传输。与流式应用使用管道 shuffle 阻塞交换的数据并存储,然后下游任务通过网络获取这些值的方式不同。这种交换减少了执行作业所需的资源,因为它不需要同时运行上游和下游任务。 
+Flink supports a batch execution mode in both [DataStream API]({{< ref "docs/dev/datastream/execution_mode" >}}) and [Table / SQL]({{< ref "/docs/dev/table/overview" >}}) for jobs executing across bounded input. In this mode, network exchanges occur via a blocking shuffle. Unlike the pipelined shuffle used for streaming applications, blocking exchanges persist data to some storage. Downstream tasks then fetch these values via the network. Such an exchange reduces the resources required to execute the job, as it does not need the upstream and downstream tasks to run simultaneously.
 
-总的来说,Flink 提供了两种不同类型的 blocking shuffles:`Hash shuffle` 和 `Sort shuffle`。
+As a whole, Flink provides two different types of blocking shuffle: `Hash shuffle` and `Sort shuffle`.
 
-在下面章节会详细说明它们。
+They will be detailed in the following sections.
 
 ## Hash Shuffle
 
-`Hash Shuffle` 是 blocking shuffle 的默认实现,它为每个下游任务将每个上游任务的结果以单独文件的方式保存在 TaskManager 本地磁盘上。当下游任务运行时会向上游的 TaskManager 请求分片,TaskManager 读取文件之后通过网络传输(给下游任务)。
+The default blocking shuffle implementation, `Hash Shuffle`, has each upstream task persist its results in a separate file for each downstream task on the local disk of the TaskManager. When the downstream tasks run, they request partitions from the upstream TaskManagers, which read the files and transmit data via the network.
 
-`Hash Shuffle` 为读写文件提供了不同的机制:
+`Hash Shuffle` provides different mechanisms for writing and reading files:
 
-- `file`: 通过标准文件 IO 写文件,读取和传输文件需要通过 Netty 的 `FileRegion`。`FileRegion` 依靠系统调用 `sendfile` 来减少数据拷贝和内存消耗。
-- `mmap`: 通过系统调用 `mmap` 来读写文件。
-- `Auto`: 通过标准文件 IO 写文件,对于文件读取,在 32 位机器上降级到 `file` 选项并且在 64 位机器上使用 `mmap` 。这是为了避免在 32 位机器上 java 实现 `mmap` 的文件大小限制。
+- `file`: Writes files with the normal File IO, and reads and transmits files with Netty `FileRegion`. `FileRegion` relies on the `sendfile` system call to reduce the number of data copies and memory consumption.
+- `mmap`: Writes and reads files with the `mmap` system call.
+- `Auto`: Writes files with the normal File IO; for file reading, it falls back to the normal `file` option on 32-bit machines and uses `mmap` on 64-bit machines. This avoids the file size limitation of the Java `mmap` implementation on 32-bit machines.
 
-可通过设置 [TaskManager 参数]({{< ref "docs/deployment/config#taskmanager-network-blocking-shuffle-type" >}}) 选择不同的机制。
+The mechanism can be chosen via the [TaskManager configuration]({{< ref "docs/deployment/config#taskmanager-network-blocking-shuffle-type" >}}).
 
 {{< hint warning >}}
-这个选项是实验性的,将来或许会有改动。
+This option is experimental and might be changed in the future.
 {{< /hint >}}
 
 {{< hint warning >}}
-如果开启 [SSL]({{< ref "docs/deployment/security/security-ssl" >}}),`file` 机制不能使用 `FileRegion` 而是在传输之前使用非池化的缓存去缓存数据。这可能会 [导致 direct memory OOM](https://issues.apache.org/jira/browse/FLINK-15981)。此外,因为同步读取文件有时会造成 netty 线程阻塞,[SSL handshake timeout]({{< ref "docs/deployment/config#security-ssl-internal-handshake-timeout" >}}) 配置需要调大以防 [connection reset 异常](https://issues.apache.org/jira/browse/FLINK-21416)。
+If [SSL]({{< ref "docs/deployment/security/security-ssl" >}}) is enabled, the `file` mechanism cannot use `FileRegion` and instead uses an un-pooled buffer to cache data before transmitting. This might [cause direct memory OOM](https://issues.apache.org/jira/browse/FLINK-15981). Additionally, since the synchronous file reading might block Netty threads for some time, the [SSL handshake timeout]({{< ref "docs/deployment/config#security-ssl-internal-handshake-timeout" >}}) needs to be increased to avoid [connection reset errors](https://issues.apache.org/jira/browse/FLINK-21416).
 {{< /hint >}}
 
 {{< hint info >}}
-`mmap`使用的内存不计算进已有配置的内存限制中,但是一些资源管理框架比如 yarn 将追踪这块内存使用,并且如果容器使用内存超过阈值会被杀掉。
+The memory usage of `mmap` is not accounted for by configured memory limits, but some resource frameworks like Yarn will track this memory usage and kill the container if memory exceeds some threshold.
 {{< /hint >}}
 
-为了进一步的提升性能,对于绝大多数的任务我们推荐 [启用压缩]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) ,除非数据很难被压缩。
+To further improve the performance, for most jobs we also recommend [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) unless the data is hard to compress.
 
-`Hash Shuffle` 在小规模运行在固态硬盘的任务情况下效果显著,但是依旧有一些问题:
+`Hash Shuffle` works well for small-scale jobs on SSDs, but it also has some disadvantages:
 
-1. 如果任务的规模庞大将会创建很多文件,并且要求同时对这些文件进行大量的写操作。
-2. 在机械硬盘情况下,当大量的下游任务同时读取数据,可能会导致随机读写问题。
+1. If the job scale is large, it might create too many files, and it requires a large write buffer to write these files at the same time.
+2. On HDD, when multiple downstream tasks fetch their data simultaneously, it might suffer from random IO.
 
 ## Sort Shuffle
 
-`Sort Shuffle` 是 1.13 版中引入的另一种 blocking shuffle 实现。不同于 `Hash Shuffle`,sort shuffle 将每个分区结果写入到一个文件。当多个下游任务同时读取结果分片,数据文件只会被打开一次并共享给所有的读请求。因此,集群使用更少的资源。例如:节点和文件描述符以提升稳定性。此外,通过写更少的文件和尽可能线性的读取文件,尤其是在使用机械硬盘情况下 sort shuffle 可以获得比 hash shuffle 更好的性能。另外,`sort shuffle` 使用额外管理的内存作为读数据缓存并不依赖 `sendfile` 或 `mmap` 机制,因此也适用于 [SSL]({{< ref "docs/deployment/security/security-ssl" >}})。关于 sort shuffle 的更多细节请参考 [FLINK-19582](https://issues.apache.org/jira/browse/FLINK-19582) 和 [FLINK-19614](https://issues.apache.org/jira/browse/FLINK-19614)。
+`Sort Shuffle` is another blocking shuffle implementation introduced in version 1.13. Different from `Hash Shuffle`, sort shuffle writes only one file for each result partition. When the result partition is read by multiple downstream tasks concurrently, the data file is opened only once and shared by all readers. As a result, the cluster uses fewer resources like inodes and file descriptors, which improves stability. Furthermore, by writing fewer files and making a best effort to read data sequentially, sort shuffle can achieve better performance than hash shuffle, especially on HDD. Additionally, sort shuffle uses extra managed memory as the data reading buffer and does not rely on the `sendfile` or `mmap` mechanism, thus it also works well with [SSL]({{< ref "docs/deployment/security/security-ssl" >}}). Please refer to [FLINK-19582](https://issues.apache.org/jira/browse/FLINK-19582) and [FLINK-19614](https://issues.apache.org/jira/browse/FLINK-19614) for more details about sort shuffle.
 
-当使用sort blocking shuffle的时候有些配置需要适配:
-- [taskmanager.network.blocking-shuffle.compression.enabled]({{< ref "docs/deployment/config" >}}#taskmanager-network-blocking-shuffle-compression-enabled): 配置该选项以启用 shuffle data 压缩,大部分任务建议开启除非你的数据压缩比率比较低。
-- [taskmanager.network.sort-shuffle.min-parallelism]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-parallelism): 根据下游任务的并行度配置该选项以启用 sort shuffle。如果并行度低于设置的值,则使用 `hash shuffle`,否则 `sort shuffle`。
-- [taskmanager.network.sort-shuffle.min-buffers]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-buffers): 配置该选项以控制数据写缓存大小。对于大规模的任务而言,你可能需要调大这个值,正常几百兆内存就足够了。
-- [taskmanager.memory.framework.off-heap.batch-shuffle.size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-framework-off-heap-batch-shuffle-size): 配置该选项以控制数据读取缓存大小。对于大规模的任务而言,你可能需要调大这个值,正常几百兆内存就足够了。
+There are several config options that might need adjustment when using sort blocking shuffle:
+- [taskmanager.network.blocking-shuffle.compression.enabled]({{< ref "docs/deployment/config" >}}#taskmanager-network-blocking-shuffle-compression-enabled): Config option for shuffle data compression. It is suggested to enable it for most jobs unless the compression ratio of your data is low.
+- [taskmanager.network.sort-shuffle.min-parallelism]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-parallelism): Config option to enable sort shuffle depending on the parallelism of downstream tasks. If parallelism is lower than the configured value, `hash shuffle` will be used, otherwise `sort shuffle` will be used.
+- [taskmanager.network.sort-shuffle.min-buffers]({{< ref "docs/deployment/config" >}}#taskmanager-network-sort-shuffle-min-buffers): Config option to control the data writing buffer size. For large-scale jobs, you may need to increase this value; usually, several hundred megabytes of memory is enough.
+- [taskmanager.memory.framework.off-heap.batch-shuffle.size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-framework-off-heap-batch-shuffle-size): Config option to control the data reading buffer size. For large-scale jobs, you may need to increase this value; usually, several hundred megabytes of memory is enough.
 
 {{< hint info >}}
-目前 `sort shuffle` 只通过分区索引来排序而不是记录本身,也就是说 `sort` 只是被当成数据聚类算法使用。
+Currently, `sort shuffle` only sorts records by partition index rather than by the records themselves; that is to say, `sort` is only used as a data clustering algorithm.
 {{< /hint >}}
 
-## 如何选择 Blocking Shuffle
+## Choices of Blocking Shuffle
 
-总的来说,
+In summary,
 
-- 对于在固态硬盘上运行的小规模任务而言,两者都可以。
-- 对于在机械硬盘上运行的大规模任务而言,`sort shuffle` 更为合适。
-- 在这两种情况下,你可以考虑 [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) 来提升性能,除非数据很难被压缩。
+- For small-scale jobs running on SSD, both implementations should work.
+- For large-scale jobs or for jobs running on HDD, `sort shuffle` should be more suitable.
+- In both cases, you may consider [enabling compression]({{< ref "docs/deployment/config">}}#taskmanager-network-blocking-shuffle-compression-enabled) to improve the performance unless the data is hard to compress.
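
The config options listed in the page above are TaskManager settings that would normally live in flink-conf.yaml. As a hedged sketch only (not part of these commits), they can also be passed to a local/MiniCluster execution environment for experimentation; the class name and the concrete values below are illustrative assumptions, and on a real cluster the keys must be set in the TaskManager configuration instead:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SortShuffleConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Option keys are the ones documented above; the values are placeholders.
        conf.setString("taskmanager.network.blocking-shuffle.compression.enabled", "true");
        // Downstream-parallelism threshold above which sort shuffle is used;
        // setting it to 1 effectively enables sort shuffle for all exchanges.
        conf.setString("taskmanager.network.sort-shuffle.min-parallelism", "1");
        // Data writing buffers; large-scale jobs may need more.
        conf.setString("taskmanager.network.sort-shuffle.min-buffers", "2048");
        // Off-heap memory used as the data reading buffer.
        conf.setString("taskmanager.memory.framework.off-heap.batch-shuffle.size", "128m");

        // Passing the configuration this way only affects local execution;
        // in a deployed cluster these keys belong to the TaskManager configuration.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        System.out.println("Effective default parallelism: " + env.getParallelism());
    }
}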