Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/13 01:36:49 UTC

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1010: [HUDI-277] Translate the Performance page into Chinese

leesf commented on a change in pull request #1010: [HUDI-277] Translate the Performance page into Chinese
URL: https://github.com/apache/incubator-hudi/pull/1010#discussion_r345532446
 
 

 ##########
 File path: docs/performance.cn.md
 ##########
 @@ -1,50 +1,47 @@
 ---
-title: Performance
+title: 性能
 keywords: hudi, index, storage, compaction, cleaning, implementation
 sidebar: mydoc_sidebar
 toc: false
 permalink: performance.html
 ---
 
-In this section, we go over some real world performance numbers for Hudi upserts, incremental pull and compare them against
-the conventional alternatives for achieving these tasks. 
+在本节中,我们将介绍一些有关Hudi插入更新、增量提取的实际性能数据,并将其与实现这些任务的其它传统工具进行比较。
 
-## Upserts
+## 插入更新
 
-Following shows the speed up obtained for NoSQL database ingestion, from incrementally upserting on a Hudi dataset on the copy-on-write storage,
-on 5 tables ranging from small to huge (as opposed to bulk loading the tables)
+下面显示了从NoSQL数据库摄取获得的速度提升,这些速度提升数据是通过在写入时复制存储上的Hudi数据集上插入更新而获得的,
+数据集包括5个从小到大的表(相对于批量加载表)。
 
 <figure>
     <img class="docimage" src="/images/hudi_upsert_perf1.png" alt="hudi_upsert_perf1.png" style="max-width: 1000px" />
 </figure>
 
-Given Hudi can build the dataset incrementally, it opens doors for also scheduling ingesting more frequently thus reducing latency, with
-significant savings on the overall compute cost.
+由于Hudi可以通过增量构建数据集,它也为更频繁地调度摄取提供了可能性,从而减少了延迟,并显著节省了总体计算成本。
 
 <figure>
     <img class="docimage" src="/images/hudi_upsert_perf2.png" alt="hudi_upsert_perf2.png" style="max-width: 1000px" />
 </figure>
 
-Hudi upserts have been stress tested upto 4TB in a single commit across the t1 table. 
-See [here](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide) for some tuning tips.
+Hudi插入更新在t1表的一次提交中就进行了高达4TB的压力测试。
+有关一些调优技巧,请参见[这里](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide)。
 
-## Indexing
+## 索引
 
-In order to efficiently upsert data, Hudi needs to classify records in a write batch into inserts & updates (tagged with the file group 
-it belongs to). In order to speed this operation, Hudi employs a pluggable index mechanism that stores a mapping between recordKey and 
-the file group id it belongs to. By default, Hudi uses a built in index that uses file ranges and bloom filters to accomplish this, with
-upto 10x speed up over a spark join to do the same. 
+为了有效地插入更新数据,Hudi需要将要写入的批量数据中的记录分类为插入和更新(并标记它所属的文件组)。
+为了加快此操作的速度,Hudi采用了可插入索引机制,该机制存储了recordKey和它所属的文件组ID之间的映射。
 
 Review comment:
  可插入 -> 可插拔? (i.e., should "可插入" be changed to "可插拔", the more idiomatic Chinese rendering of "pluggable"?)
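
The page under review describes Hudi's upsert path and its pluggable index in prose; for readers following along, a minimal Spark (Scala) sketch of that flow might look like the code below. This is not part of the PR: the paths, table name, and field names (uuid, ds, ts) are hypothetical, while the option keys are standard Hudi Spark datasource options, with the default bloom-filter index spelled out explicitly rather than left implicit.

    // Minimal sketch (not part of this PR): upsert a batch into a
    // copy-on-write Hudi dataset, then pull the changes incrementally.
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("hudi-sketch").getOrCreate()
    val batch = spark.read.json("/tmp/incoming_batch.json") // hypothetical input

    batch.write.format("org.apache.hudi")
      .option("hoodie.table.name", "t1")
      // upsert: classify records in the batch into inserts and updates
      .option("hoodie.datasource.write.operation", "upsert")
      // the recordKey that the index maps to a file group id
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.partitionpath.field", "ds")
      .option("hoodie.datasource.write.precombine.field", "ts")
      // the built-in index backed by file ranges and bloom filters (the default)
      .option("hoodie.index.type", "BLOOM")
      .mode(SaveMode.Append)
      .save("/tmp/hudi/t1")

    // Incremental pull: read only records changed since a given commit time.
    val changes = spark.read.format("org.apache.hudi")
      .option("hoodie.datasource.query.type", "incremental")
      .option("hoodie.datasource.read.begin.instanttime", "20191113000000") // hypothetical commit time
      .load("/tmp/hudi/t1")

Scheduling such upsert batches more frequently is what yields the latency reduction and compute savings the page describes.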
