Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/04 08:52:38 UTC

[GitHub] [flink] leonardBang commented on a change in pull request #11897: [FLINK-16104] Translate "Streaming Aggregation" page of "Table API & SQL" into Chinese

leonardBang commented on a change in pull request #11897:
URL: https://github.com/apache/flink/pull/11897#discussion_r419283115



##########
File path: docs/dev/table/tuning/streaming_aggregation_optimization.zh.md
##########
@@ -22,33 +23,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-SQL is the most widely used language for data analytics. Flink's Table API and SQL enables users to define efficient stream analytics applications in less time and effort. Moreover, Flink Table API and SQL is effectively optimized, it integrates a lot of query optimizations and tuned operator implementations. But not all of the optimizations are enabled by default, so for some workloads, it is possible to improve performance by turning on some options.
+SQL 是数据分析中使用最广泛的语言。Flink Table API 和 SQL 使用户能够以更少的时间和精力定义高效的流分析应用程序。而且,Flink Table API 和 SQL 是有效优化过的,它集成了许多查询优化和算子优化。但并不是所有的优化都是默认开启的,因此对于某些工作负载,可以通过打开某些选项来提高性能。

Review comment:
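       Suggested wording: "此外,Flink Table API 和 SQL 是高效优化过的" ("In addition, Flink Table API and SQL are efficiently optimized"), i.e. use "此外" instead of "而且" and "高效" instead of "有效".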

##########
File path: docs/dev/table/tuning/streaming_aggregation_optimization.zh.md
##########
@@ -22,33 +23,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-SQL is the most widely used language for data analytics. Flink's Table API and SQL enables users to define efficient stream analytics applications in less time and effort. Moreover, Flink Table API and SQL is effectively optimized, it integrates a lot of query optimizations and tuned operator implementations. But not all of the optimizations are enabled by default, so for some workloads, it is possible to improve performance by turning on some options.
+SQL 是数据分析中使用最广泛的语言。Flink Table API 和 SQL 使用户能够以更少的时间和精力定义高效的流分析应用程序。而且,Flink Table API 和 SQL 是有效优化过的,它集成了许多查询优化和算子优化。但并不是所有的优化都是默认开启的,因此对于某些工作负载,可以通过打开某些选项来提高性能。
 
-In this page, we will introduce some useful optimization options and the internals of streaming aggregation which will bring great improvement in some cases.
+在这一页,我们将介绍一些实用的优化选项以及流式聚合的内部原理,它们在某些情况下能带来很大的提升。
 
-<span class="label label-danger">Attention</span> Currently, the optimization options mentioned in this page are only supported in the Blink planner.
+<span class="label label-danger">注意</span> 目前,这一页提到的优化选项仅支持 Blink planner。
 
-<span class="label label-danger">Attention</span> Currently, the streaming aggregations optimization are only supported for [unbounded-aggregations]({{ site.baseurl }}/dev/table/sql/queries.html#aggregations). Optimizations for [window aggregations]({{ site.baseurl }}/dev/table/sql/queries.html#group-windows) will be supported in the future.
+<span class="label label-danger">注意</span> 目前,流聚合优化仅支持 [无界聚合]({{ site.baseurl }}/zh/dev/table/sql/queries.html#aggregations)。[窗口聚合]({{ site.baseurl }}/zh/dev/table/sql/queries.html#group-windows) 优化将在未来支持。

Review comment:
       {{ site.baseurl }}/zh/dev/table/sql/queries.html#aggregations -> {{ site.baseurl }}/zh/dev/table/sql/queries.html#聚合
   {{ site.baseurl }}/zh/dev/table/sql/queries.html#group-windows -> {{ site.baseurl }}/zh/dev/table/sql/queries.html#分组窗口
   
   These link anchors need to be replaced with the corresponding anchors on the Chinese page; otherwise clicking a link will not jump to the intended section.

##########
File path: docs/dev/table/tuning/streaming_aggregation_optimization.zh.md
##########
@@ -94,28 +94,26 @@ configuration.set_string("table.exec.mini-batch.size", "5000"); # the maximum nu
 </div>
 </div>
 
-## Local-Global Aggregation
+## Local-Global 聚合
 
-Local-Global is proposed to solve data skew problem by dividing a group aggregation into two stages, that is doing local aggregation in upstream firstly, and followed by global aggregation in downstream, which is similar to Combine + Reduce pattern in MapReduce. For example, considering the following SQL:
+Local-Global 聚合是为解决数据倾斜问题提出的,通过将一组聚合分为两个阶段,首先在上游进行本地聚合,然后在下游进行全局聚合,类似于 MapReduce 中的 Combine + Reduce 模式。例如,就以下 SQL 而言:
 
 {% highlight sql %}
 SELECT color, sum(id)
 FROM T
 GROUP BY color
 {% endhighlight %}
 
-It is possible that the records in the data stream are skewed, thus some instances of aggregation operator have to process much more records than others, which leads to hotspot.
-The local aggregation can help to accumulate a certain amount of inputs which have the same key into a single accumulator. The global aggregation will only receive the reduced accumulators instead of large number of raw inputs.
-This can significantly reduce the network shuffle and the cost of state access. The number of inputs accumulated by local aggregation every time is based on mini-batch interval. It means local-global aggregation depends on mini-batch optimization is enabled.
+数据流中的记录可能会倾斜,因此某些聚合算子的实例必须比其他实例处理更多的记录,这会导致 hotspot。本地聚合可以将一定数量具有相同 key 的输入数据累加到单个累加器中。全局聚合将仅接收 reduce 后的累加器,而不是大量的原始输入数据。这可以大大减少网络 shuffle 和状态访问的成本。每次本地聚合累积的输入数据量基于 mini-batch 间隔。这意味着 local-global 聚合依赖于启用了 mini-batch 优化。

Review comment:
       Suggested wording: "这会产生热点问题" ("this causes a hotspot problem"), instead of "这会导致 hotspot".
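
   For readers following the hunk above, here is a minimal sketch of the two-stage (local, then global) pattern it describes. It is not Flink's implementation: the function names `local_aggregate` and `global_aggregate` and the sample data are hypothetical, and each inner list stands in for one mini-batch on one upstream task.

```python
from collections import defaultdict


def local_aggregate(batch):
    """Local stage: fold one mini-batch into a single partial sum per key."""
    partial = defaultdict(int)
    for color, value in batch:
        partial[color] += value
    return dict(partial)


def global_aggregate(partials):
    """Global stage: merge the partial accumulators sent by all upstream tasks."""
    total = defaultdict(int)
    for partial in partials:
        for color, acc in partial.items():
            total[color] += acc
    return dict(total)


# Two hypothetical upstream tasks, each holding one mini-batch skewed toward "red".
batches = [
    [("red", 1), ("red", 2), ("red", 3), ("blue", 4)],
    [("red", 5), ("red", 6), ("blue", 7), ("blue", 8)],
]

partials = [local_aggregate(b) for b in batches]
result = global_aggregate(partials)

# Number of accumulators that would cross the network after local aggregation.
shuffled_records = sum(len(p) for p in partials)
raw_records = sum(len(b) for b in batches)
```

   In this toy input the global stage receives 4 partial accumulators rather than 8 raw records, which is the reduction in shuffle volume that the paragraph attributes to local aggregation.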

##########
File path: docs/dev/table/tuning/streaming_aggregation_optimization.zh.md
##########
@@ -195,17 +191,17 @@ GROUP BY day
 {% endhighlight %}
 
 
-The following figure shows how the split distinct aggregation improve performance (assuming color represents days, and letter represents user_id).
+下图显示了拆分 distinct 聚合如何提高性能(假设颜色表示 days,字母表示 user_id)。
 
 <div style="text-align: center">
   <img src="{{ site.baseurl }}/fig/table-streaming/distinct_split.png" width="70%" height="70%" />
 </div>
 
-NOTE: Above is the simplest example which can benefit from this optimization. Besides that, Flink supports to split more complex aggregation queries, for example, more than one distinct aggregates with different distinct key (e.g. `COUNT(DISTINCT a), SUM(DISTINCT b)`), works with other non-distinct aggregates (e.g. `SUM`, `MAX`, `MIN`, `COUNT`).
+注意:上面是可以从这个优化中受益的最简单的示例。除此之外,Flink 还支持拆分更复杂的聚合查询,例如,多个具有不同 distinct key (例如 `COUNT(DISTINCT a), SUM(DISTINCT b)` )的 distinct 聚合,可以与其他非 distinct 聚合(例如 `SUM`、`MAX`、`MIN`、`COUNT` )一起使用。
 
-<span class="label label-danger">Attention</span> However, currently, the split optimization doesn't support aggregations which contains user defined AggregateFunction.
+<span class="label label-danger">注意</span> 但是,当前,拆分优化不支持包含用户定义的 AggregateFunction 聚合。

Review comment:
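       "但是,当前" -> "当前" (drop the leading "但是", "however")
   Let's translate this freely here; the literal translation reads a bit awkwardly.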




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org