Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/09/20 09:54:04 UTC

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #911: HUDI-220 Translate root index page

yanghua commented on a change in pull request #911: HUDI-220 Translate root index page
URL: https://github.com/apache/incubator-hudi/pull/911#discussion_r326556053
 
 

 ##########
 File path: docs/index.cn.md
 ##########
 @@ -1,23 +1,24 @@
 ---
-title: What is Hudi?
+title: 什么是Hudi?
 keywords: big data, stream processing, cloud, hdfs, storage, upserts, change capture
 tags: [getting_started]
 sidebar: mydoc_sidebar
 permalink: index.html
-summary: "Hudi brings stream processing to big data, providing fresh data while being an order of magnitude efficient over traditional batch processing."
+summary: "Hudi为大数据带来流处理,在提供新数据的同时,比传统的批处理效率高出一个数量级。"
 ---
 
-Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical datasets over DFS ([HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) or cloud stores) and provides three logical views for query access.
+Hudi(发音为“Hoodie”)摄取与管理存储在DFS([HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) 或云存储)上的大型分析数据集,并为查询访问提供三个逻辑视图。
+
+ * **读优化视图** - 在纯列式存储上提供出色的查询性能,非常像[Parquet](https://parquet.apache.org/)表。
+ * **增量视图** - 在数据集之上提供一个变更流,并将其提供给下游的作业或ETL任务。
+ * **准实时表** - 提供对准实时数据的查询,联合使用了基于行与列的存储(例如 Parquet + [Avro](http://avro.apache.org/docs/current/mr.html))
 
- * **Read Optimized View** - Provides excellent query performance on pure columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
- * **Incremental View** - Provides a change stream out of the dataset to feed downstream jobs/ETLs.
- * **Near-Real time Table** - Provides queries on real-time data, using a combination of columnar & row based storage (e.g Parquet + [Avro](http://avro.apache.org/docs/current/mr.html))
 
 
 <figure>
     <img class="docimage" src="/images/hudi_intro_1.png" alt="hudi_intro_1.png" />
 </figure>
 
-By carefully managing how data is laid out in storage & how it’s exposed to queries, Hudi is able to power a rich data ecosystem where external sources can be ingested in near real-time and made available for interactive SQL Engines like [Presto](https://prestodb.io) & [Spark](https://spark.apache.org/sql/), while at the same time capable of being consumed incrementally from processing/ETL frameworks like [Hive](https://hive.apache.org/) & [Spark](https://spark.apache.org/docs/latest/) to build derived (Hudi) datasets.
+通过仔细地管理数据在存储中的布局以及如何将数据暴露给查询,Hudi能够为丰富的数据生态系统提供动力:在这个生态系统中,外部数据源可以被近乎实时地摄取,并可供[Presto](https://prestodb.io)和[Spark](https://spark.apache.org/sql/)等交互式SQL引擎查询,同时也能够被处理/ETL框架(如[Hive](https://hive.apache.org/)和[Spark](https://spark.apache.org/docs/latest/))增量消费,以构建派生的(Hudi)数据集。
 
-Hudi broadly consists of a self contained Spark library to build datasets and integrations with existing query engines for data access. See [quickstart](quickstart.html) for a demo.
+Hudi 大体上由一个自包含的Spark库组成,用于构建数据集,并与现有查询引擎集成以进行数据访问。有关演示,请参见[快速入门](quickstart.html)。
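
The "self contained Spark library" mentioned on the page under review is exposed as a Spark datasource. As a minimal sketch (not taken from the PR; the table path, sample rows, and the `uuid`/`ts`/`partitionpath` column names are illustrative assumptions, and the quickstart linked above has the project's own walkthrough), building a small Hudi dataset from spark-shell might look like this:

```scala
// Sketch only: upsert a tiny DataFrame into a Hudi (copy-on-write) dataset and read it back.
// Assumes a spark-shell with the hudi-spark bundle on the classpath; the path, table name
// and column names below are made up for illustration.
import org.apache.spark.sql.SaveMode

val basePath = "file:///tmp/hudi_demo"                        // hypothetical target location

val df = Seq(
  ("id-1", "rider-A", 1568937244L, "americas"),
  ("id-2", "rider-B", 1568937245L, "asia")
).toDF("uuid", "rider", "ts", "partitionpath")

df.write.format("org.apache.hudi").                           // Hudi Spark datasource
  option("hoodie.datasource.write.recordkey.field", "uuid").  // record key column
  option("hoodie.datasource.write.precombine.field", "ts").   // used to pick the latest record
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.table.name", "hudi_demo").
  mode(SaveMode.Overwrite).
  save(basePath)

// Read Optimized view: query the dataset like a plain columnar table
// (one glob level per partition segment, plus one for the files).
val roDF = spark.read.format("org.apache.hudi").load(basePath + "/*/*")
roDF.show(false)
```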
 
 Review comment:
  No, I just tried this: when I switch to the Chinese version, it adds `cn` automatically.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services