Posted to commits@hudi.apache.org by vi...@apache.org on 2019/09/06 22:15:35 UTC
[incubator-hudi] branch asf-site updated: [HUDI-237] Translate quickstart page
This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a07fc11 [HUDI-237] Translate quickstart page
a07fc11 is described below
commit a07fc111912c184d8673ee5e6a2dd10f266bad05
Author: leesf <49...@qq.com>
AuthorDate: Thu Sep 5 15:37:14 2019 +0800
[HUDI-237] Translate quickstart page
---
docs/quickstart.cn.md | 78 ++++++++++++++++++++++++---------------------------
1 file changed, 36 insertions(+), 42 deletions(-)
diff --git a/docs/quickstart.cn.md b/docs/quickstart.cn.md
index db59664..75bcec0 100644
--- a/docs/quickstart.cn.md
+++ b/docs/quickstart.cn.md
@@ -7,50 +7,47 @@ toc: false
permalink: quickstart.html
---
<br/>
-To get a quick peek at Hudi's capabilities, we have put together a [demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0)
-that showcases this on a docker based setup with all dependent systems running locally. We recommend you replicate the same setup
-and run the demo yourself, by following steps [here](docker_demo.html). Also, if you are looking for ways to migrate your existing data to Hudi,
-refer to [migration guide](migration_guide.html).
+To get a quick peek at Hudi's capabilities, we have put together a [demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0) that showcases this on a Docker-based setup with all dependent systems running locally.
+We recommend you replicate the same setup and run the demo yourself by following the steps [here](docker_demo.html). Also, if you are looking for ways to migrate your existing data to Hudi, refer to the [migration guide](migration_guide.html).
-If you have Hive, Hadoop, Spark installed already & prefer to do it on your own setup, read on.
+If you have Hive, Hadoop, and Spark installed already and prefer to use your own setup, read on.
-## Download Hudi
+## Download Hudi
-Check out [code](https://github.com/apache/incubator-hudi) or download [latest release](https://github.com/apache/incubator-hudi/archive/hudi-0.4.5.zip)
-and normally build the maven project, from command line
+Check out the [code](https://github.com/apache/incubator-hudi) or download the [latest release](https://github.com/apache/incubator-hudi/archive/hudi-0.4.5.zip)
+
+and build the Maven project from the command line:
```
$ mvn clean install -DskipTests -DskipITs
```
-To work with older version of Hive (pre Hive-1.2.1), use
+To work with older versions of Hive (pre Hive-1.2.1), use:
```
$ mvn clean install -DskipTests -DskipITs -Dhive11
```
-For IDE, you can pull in the code into IntelliJ as a normal maven project.
-You might want to add your spark jars folder to project dependencies under 'Module Setttings', to be able to run from IDE.
+> For IDEs, you can import the code into IntelliJ as a normal Maven project.
+You might want to add your Spark jars folder to the project dependencies under 'Module Settings' to be able to run from the IDE.
-### Version Compatibility
+### Version Compatibility
-Hudi requires Java 8 to be installed on a *nix system. Hudi works with Spark-2.x versions.
-Further, we have verified that Hudi works with the following combination of Hadoop/Hive/Spark.
+Hudi requires Java 8 to be installed on a *nix system, and works with Spark-2.x versions. Further, we have verified that Hudi works with the following combinations of Hadoop/Hive/Spark.
-| Hadoop | Hive | Spark | Instructions to Build Hudi |
+| Hadoop | Hive | Spark | Instructions to Build Hudi |
| ---- | ----- | ---- | ---- |
| 2.6.0-cdh5.7.2 | 1.1.0-cdh5.7.2 | spark-2.[1-3].x | Use "mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.7.2 -Dhive.version=1.1.0-cdh5.7.2" |
| Apache hadoop-2.8.4 | Apache hive-2.3.3 | spark-2.[1-3].x | Use "mvn clean install -DskipTests" |
| Apache hadoop-2.7.3 | Apache hive-1.2.1 | spark-2.[1-3].x | Use "mvn clean install -DskipTests" |
-If your environment has other versions of hadoop/hive/spark, please try out Hudi
-and let us know if there are any issues.
+> If your environment has other versions of Hadoop/Hive/Spark, please try out Hudi and let us know if there are any issues.
-## Generate Sample Dataset
+## Generate Sample Dataset
-### Environment Variables
+### Environment Variables
-Please set the following environment variables according to your setup. We have given an example setup with CDH version
+Please set the following environment variables according to your setup. We have given an example setup with a CDH version:
```
cd incubator-hudi
@@ -65,10 +62,9 @@ export SPARK_CONF_DIR=$SPARK_HOME/conf
export PATH=$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$SPARK_INSTALL/bin:$PATH
```
-### Run HoodieJavaApp
+### Run HoodieJavaApp
-Run __hudi-spark/src/test/java/HoodieJavaApp.java__ class, to place a two commits (commit 1 => 100 inserts, commit 2 => 100 updates to previously inserted 100 records) onto your DFS/local filesystem. Use the wrapper script
-to run from command-line
+Run the __hudi-spark/src/test/java/HoodieJavaApp.java__ class to place two commits (commit 1 => 100 inserts, commit 2 => 100 updates to the previously inserted 100 records) onto your DFS/local filesystem. Use the wrapper script to run it from the command line:
```
cd hudi-spark
@@ -88,14 +84,13 @@ Usage: <main class> [options]
Default: COPY_ON_WRITE
```
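The two commits above exercise upsert semantics: the second commit's 100 updates carry the same record keys as the first commit's 100 inserts, so the dataset still contains 100 records afterwards, not 200. A minimal Python sketch of that keyed-upsert behavior (illustrative only, not Hudi code):

```python
# Illustrative sketch of upsert-by-record-key semantics (not actual Hudi code).
def upsert(dataset, records, commit_time):
    """Insert new keys, overwrite existing ones, stamping each with the commit time."""
    for key, value in records:
        dataset[key] = {"value": value, "_commit_time": commit_time}
    return dataset

dataset = {}
# Commit 1: 100 inserts.
upsert(dataset, [(k, "v1") for k in range(100)], commit_time="001")
# Commit 2: 100 updates to the same 100 keys.
upsert(dataset, [(k, "v2") for k in range(100)], commit_time="002")

assert len(dataset) == 100  # still 100 records, not 200
assert all(r["value"] == "v2" for r in dataset.values())
```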
-The class lets you choose table names, output paths and one of the storage types. In your own applications, be sure to include the `hudi-spark` module as dependency
-and follow a similar pattern to write/read datasets via the datasource.
+The class lets you choose table names, output paths, and one of the storage types. In your own applications, be sure to include the `hudi-spark` module as a dependency and follow a similar pattern to write/read datasets via the datasource.
-## Query a Hudi dataset
+## Query a Hudi dataset
-Next, we will register the sample dataset into Hive metastore and try to query using [Hive](#hive), [Spark](#spark) & [Presto](#presto)
+Next, we will register the sample dataset into the Hive metastore and try to query it using [Hive](#hive), [Spark](#spark) & [Presto](#presto).
-### Start Hive Server locally
+### Start Hive Server locally
```
hdfs namenode # start name node
@@ -110,10 +105,10 @@ bin/hiveserver2 \
```
-### Run Hive Sync Tool
-Hive Sync Tool will update/create the necessary metadata(schema and partitions) in hive metastore. This allows for schema evolution and incremental addition of new partitions written to.
-It uses an incremental approach by storing the last commit time synced in the TBLPROPERTIES and only syncing the commits from the last sync commit time stored.
-Both [Spark Datasource](writing_data.html#datasource-writer) & [DeltaStreamer](writing_data.html#deltastreamer) have capability to do this, after each write.
+### Run Hive Sync Tool
+The Hive Sync Tool will update/create the necessary metadata (schema and partitions) in the Hive metastore. This allows for schema evolution and incremental addition of newly written partitions. It uses an incremental approach, storing the last synced commit time in the TBLPROPERTIES and only syncing the commits made after that stored time.
+Both [Spark Datasource](writing_data.html#datasource-writer) and [DeltaStreamer](writing_data.html#deltastreamer) have the capability to do this after each write.
+
```
cd hudi-hive
@@ -127,13 +122,12 @@ cd hudi-hive
--partitioned-by field1,field2
```
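The incremental bookkeeping described above can be pictured as: read the last synced commit time from the table properties, process only newer commits, then store the newest commit time back. A hypothetical Python sketch under those assumptions (the property name and function are illustrative, not the tool's actual API):

```python
# Hypothetical sketch of the sync tool's incremental bookkeeping (not its real API).
def commits_to_sync(all_commit_times, tbl_properties):
    """Return only the commits made after the last synced commit time."""
    last_synced = tbl_properties.get("last_commit_time_sync", "")
    return [t for t in sorted(all_commit_times) if t > last_synced]

props = {"last_commit_time_sync": "20190905120000"}
commits = ["20190905110000", "20190905120000", "20190905130000", "20190905140000"]
assert commits_to_sync(commits, props) == ["20190905130000", "20190905140000"]
# After syncing, the tool would write the newest commit time back into TBLPROPERTIES,
# so the next run starts from there.
props["last_commit_time_sync"] = "20190905140000"
assert commits_to_sync(commits, props) == []
```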
-For some reason, if you want to do this by hand. Please
-follow [this](https://cwiki.apache.org/confluence/display/HUDI/Registering+sample+dataset+to+Hive+via+beeline).
+> If, for some reason, you want to do this by hand, please follow [this](https://cwiki.apache.org/confluence/display/HUDI/Registering+sample+dataset+to+Hive+via+beeline).
### HiveQL {#hive}
-Let's first perform a query on the latest committed snapshot of the table
+Let's first perform a query on the latest committed snapshot of the table:
```
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
@@ -149,7 +143,7 @@ hive>
### SparkSQL {#spark}
-Spark is super easy, once you get Hive working as above. Just spin up a Spark Shell as below
+Once you get Hive working as above, Spark is super easy. Just spin up a Spark shell as below:
```
$ cd $SPARK_INSTALL
@@ -164,19 +158,19 @@ scala> sqlContext.sql("select count(*) from hoodie_test").show(10000)
### Presto {#presto}
-Checkout the 'master' branch on OSS Presto, build it, and place your installation somewhere.
+Check out the 'master' branch of OSS Presto, build it, and place your installation somewhere.
-* Copy the hudi/packaging/hudi-presto-bundle/target/hudi-presto-bundle-*.jar into $PRESTO_INSTALL/plugin/hive-hadoop2/
-* Startup your server and you should be able to query the same Hive table via Presto
+* Copy hudi/packaging/hudi-presto-bundle/target/hudi-presto-bundle-*.jar into $PRESTO_INSTALL/plugin/hive-hadoop2/
+* Start up your server, and you should be able to query the same Hive table via Presto:
```
show columns from hive.default.hoodie_test;
select count(*) from hive.default.hoodie_test
```
-### Incremental HiveQL
+### Incremental HiveQL
-Let's now perform a query, to obtain the __ONLY__ changed rows since a commit in the past.
+Let's now perform a query to obtain __ONLY__ the rows changed since a commit in the past.
```
hive> set hoodie.hoodie_test.consume.mode=INCREMENTAL;
@@ -200,4 +194,4 @@ hive>
hive>
```
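Conceptually, the incremental consume mode filters rows by the commit time recorded on each row, returning only those written after the consume start timestamp. A minimal Python sketch of that filter (illustrative only, not the Hive/Hudi implementation):

```python
# Illustrative sketch of incremental-mode filtering on a row's commit time.
rows = [
    {"_hoodie_commit_time": "001", "key": 1},
    {"_hoodie_commit_time": "001", "key": 2},
    {"_hoodie_commit_time": "002", "key": 2},  # key 2 updated in commit 002
]

def incremental_query(rows, start_commit_time):
    """Return ONLY the rows changed strictly after the given commit time."""
    return [r for r in rows if r["_hoodie_commit_time"] > start_commit_time]

# Only the row written by commit 002 comes back.
changed = incremental_query(rows, "001")
assert changed == [{"_hoodie_commit_time": "002", "key": 2}]
```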
-This is only supported for Read-optimized view for now."
+> Note: this is only supported for the Read-optimized view for now.