Posted to commits@hudi.apache.org by vi...@apache.org on 2019/10/10 12:43:17 UTC

[incubator-hudi] branch asf-site updated: [Docs] Updating site to reflect recent doc changes (#950)

This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new c0a6642  [Docs] Updating site to reflect recent doc changes (#950)
c0a6642 is described below

commit c0a66420f63aa987893d3e202eaa6fcf473ea27b
Author: Bhavani Sudha Saktheeswaran <bh...@uber.com>
AuthorDate: Thu Oct 10 05:43:12 2019 -0700

    [Docs] Updating site to reflect recent doc changes (#950)
---
 content/README.md             |   2 +-
 content/cn/admin_guide.html   | 174 ++++++++++++-----------
 content/cn/comparison.html    |  71 +++++-----
 content/cn/index.html         |  24 ++--
 content/cn/powered_by.html    |  18 +--
 content/cn/querying_data.html |   6 +-
 content/configurations.html   |   2 +-
 content/contributing.html     |   8 ++
 content/feed.xml              |   4 +-
 content/querying_data.html    |   6 +-
 content/quickstart.html       | 312 +++++++++++++++++-------------------------
 content/search.json           |   6 +-
 content/writing_data.html     |   2 +
 13 files changed, 288 insertions(+), 347 deletions(-)

diff --git a/content/README.md b/content/README.md
index 4307a6a..74c78e1 100644
--- a/content/README.md
+++ b/content/README.md
@@ -5,7 +5,7 @@ This folder contains resources that build the [Apache Hudi website](https://hudi
 
 ### Building docs
 
-The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted [here](idratherbewriting.com/documentation-theme-jekyll/) with detailed instructions.
+The site is based on a [Jekyll](https://jekyllrb.com/) theme hosted [here](https://idratherbewriting.com/documentation-theme-jekyll/) with detailed instructions.
 
 #### Docker
 
diff --git a/content/cn/admin_guide.html b/content/cn/admin_guide.html
index fc64ae6..1023472 100644
--- a/content/cn/admin_guide.html
+++ b/content/cn/admin_guide.html
@@ -3,7 +3,7 @@
     <meta charset="utf-8">
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
-<meta name="description" content="This section offers an overview of tools available to operate an ecosystem of Hudi datasets">
+<meta name="description" content="本节概述了可用于操作Hudi数据集生态系统的工具">
 <meta name="keywords" content="hudi, administration, operation, devops">
 <title>Administering Hudi Pipelines | Hudi</title>
 <link rel="stylesheet" href="/css/syntax.css">
@@ -332,7 +332,7 @@
 <div class="post-content">
 
    
-    <div class="summary">This section offers an overview of tools available to operate an ecosystem of Hudi datasets</div>
+    <div class="summary">本节概述了可用于操作Hudi数据集生态系统的工具</div>
    
 
     
@@ -340,23 +340,23 @@
 
     
 
-  <p>Admins/ops can gain visibility into Hudi datasets/pipelines in the following ways</p>
+  <p>管理员/运维人员可以通过以下方式了解Hudi数据集/管道</p>
 
 <ul>
-  <li><a href="#admin-cli">Administering via the Admin CLI</a></li>
-  <li><a href="#metrics">Graphite metrics</a></li>
-  <li><a href="#spark-ui">Spark UI of the Hudi Application</a></li>
+  <li><a href="#admin-cli">通过Admin CLI进行管理</a></li>
+  <li><a href="#metrics">Graphite指标</a></li>
+  <li><a href="#spark-ui">Hudi应用程序的Spark UI</a></li>
 </ul>
 
-<p>This section provides a glimpse into each of these, with some general guidance on <a href="#troubleshooting">troubleshooting</a></p>
+<p>本节简要介绍了每一种方法,并提供了有关<a href="#troubleshooting">故障排除</a>的一些常规指南</p>
 
 <h2 id="admin-cli">Admin CLI</h2>
 
-<p>Once hudi has been built, the shell can be fired by via  <code class="highlighter-rouge">cd hudi-cli &amp;&amp; ./hudi-cli.sh</code>.
-A hudi dataset resides on DFS, in a location referred to as the <strong>basePath</strong> and we would need this location in order to connect to a Hudi dataset.
-Hudi library effectively manages this dataset internally, using .hoodie subfolder to track all metadata</p>
+<p>一旦构建了hudi,就可以通过<code class="highlighter-rouge">cd hudi-cli &amp;&amp; ./hudi-cli.sh</code>启动shell。
+一个hudi数据集位于DFS上的<strong>basePath</strong>位置,我们需要该位置才能连接到Hudi数据集。
+Hudi库使用.hoodie子文件夹跟踪所有元数据,从而有效地在内部管理该数据集。</p>
 
-<p>To initialize a hudi table, use the following command.</p>
+<p>初始化hudi表,可使用如下命令。</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>18/09/06 15:56:52 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
 ============================================
@@ -391,7 +391,7 @@ hudi-&gt;create --path /user/hive/warehouse/table1 --tableName hoodie_table_1 --
     | hoodie.archivelog.folder|                              |
 </code></pre></div></div>
 
-<p>Following is a sample command to connect to a Hudi dataset contains uber trips.</p>
+<p>以下是连接到包含uber trips的Hudi数据集的示例命令。</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;connect --path /app/uber/trips
 
@@ -402,8 +402,7 @@ Metadata for table trips loaded
 hoodie:trips-&gt;
 </code></pre></div></div>
 
-<p>Once connected to the dataset, a lot of other commands become available. The shell has contextual autocomplete help (press TAB) and below is a list of all commands, few of which are reviewed in this section
-are reviewed</p>
+<p>连接到数据集后,便可使用许多其他命令。该shell程序具有上下文自动完成帮助(按TAB键),下面是所有命令的列表,本节中对其中的一些命令进行了详细示例。</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;help
 * ! - Allows execution of operating system (OS) commands
@@ -436,12 +435,12 @@ are reviewed</p>
 hoodie:trips-&gt;
 </code></pre></div></div>
 
-<h4 id="inspecting-commits">Inspecting Commits</h4>
+<h4 id="检查提交">检查提交</h4>
 
-<p>The task of upserting or inserting a batch of incoming records is known as a <strong>commit</strong> in Hudi. A commit provides basic atomicity guarantees such that only commited data is available for querying.
-Each commit has a monotonically increasing string/number called the <strong>commit number</strong>. Typically, this is the time at which we started the commit.</p>
+<p>在Hudi中,更新或插入一批记录的任务被称为<strong>提交</strong>。提交可提供基本的原子性保证,即只有提交的数据可用于查询。
+每个提交都有一个单调递增的字符串/数字,称为<strong>提交编号</strong>。通常,这是我们开始提交的时间。</p>
 
-<p>To view some basic information about the last 10 commits,</p>
+<p>查看有关最近10次提交的一些基本信息,</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;commits show --sortBy "Total Bytes Written" --desc true --limit 10
     ________________________________________________________________________________________________________________________________________________________________________
@@ -453,15 +452,15 @@ Each commit has a monotonically increasing string/number called the <strong>comm
 hoodie:trips-&gt;
 </code></pre></div></div>
 
-<p>At the start of each write, Hudi also writes a .inflight commit to the .hoodie folder. You can use the timestamp there to estimate how long the commit has been inflight</p>
+<p>在每次写入开始时,Hudi还将.inflight提交写入.hoodie文件夹。您可以使用那里的时间戳来估计正在进行的提交已经花费的时间</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
 -rw-r--r--   3 vinoth supergroup     321984 2016-10-05 23:18 /app/uber/trips/.hoodie/20161005225920.inflight
 </code></pre></div></div>
 
-<h4 id="drilling-down-to-a-specific-commit">Drilling Down to a specific Commit</h4>
+<h4 id="深入到特定的提交">深入到特定的提交</h4>
 
-<p>To understand how the writes spread across specific partiions,</p>
+<p>了解写入如何分散到特定分区,</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
     __________________________________________________________________________________________________________________________________________
@@ -471,7 +470,7 @@ hoodie:trips-&gt;
      ....
 </code></pre></div></div>
 
-<p>If you need file level granularity , we can do the following</p>
+<p>如果您需要文件级粒度,我们可以执行以下操作</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;commit showfiles --commit 20161005165855 --sortBy "Partition Path"
     ________________________________________________________________________________________________________________________________________________________
@@ -481,10 +480,9 @@ hoodie:trips-&gt;
     ....
 </code></pre></div></div>
 
-<h4 id="filesystem-view">FileSystem View</h4>
+<h4 id="文件系统视图">文件系统视图</h4>
 
-<p>Hudi views each partition as a collection of file-groups with each file-group containing a list of file-slices in commit
-order (See Concepts). The below commands allow users to view the file-slices for a data-set.</p>
+<p>Hudi将每个分区视为文件组的集合,每个文件组包含按提交顺序排列的文件切片列表(请参阅概念)。以下命令允许用户查看数据集的文件切片。</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> hoodie:stock_ticks_mor-&gt;show fsview all
  ....
@@ -505,9 +503,9 @@ order (See Concepts). The below commands allow users to view the file-slices for
  hoodie:stock_ticks_mor-&gt;
 </code></pre></div></div>
 
-<h4 id="statistics">Statistics</h4>
+<h4 id="统计信息">统计信息</h4>
 
-<p>Since Hudi directly manages file sizes for DFS dataset, it might be good to get an overall picture</p>
+<p>由于Hudi直接管理DFS数据集的文件大小,这些信息会帮助你全面了解Hudi的运行状况</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
     ________________________________________________________________________________________________
@@ -518,7 +516,7 @@ order (See Concepts). The below commands allow users to view the file-slices for
     ....
 </code></pre></div></div>
 
-<p>In case of Hudi write taking much longer, it might be good to see the write amplification for any sudden increases</p>
+<p>如果Hudi写入花费的时间更长,那么可以通过观察写放大指标来发现任何异常</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;stats wa
     __________________________________________________________________________
@@ -528,15 +526,14 @@ order (See Concepts). The below commands allow users to view the file-slices for
     ....
 </code></pre></div></div>
 
-<h4 id="archived-commits">Archived Commits</h4>
+<h4 id="归档的提交">归档的提交</h4>
 
-<p>In order to limit the amount of growth of .commit files on DFS, Hudi archives older .commit files (with due respect to the cleaner policy) into a commits.archived file.
-This is a sequence file that contains a mapping from commitNumber =&gt; json with raw information about the commit (same that is nicely rolled up above).</p>
+<p>为了限制DFS上.commit文件的增长量,Hudi将较旧的.commit文件(适当考虑清理策略)归档到commits.archived文件中。
+这是一个序列文件,其包含commitNumber =&gt; json的映射,及有关提交的原始信息(上面已很好地汇总了相同的信息)。</p>
 
-<h4 id="compactions">Compactions</h4>
+<h4 id="压缩">压缩</h4>
 
-<p>To get an idea of the lag between compaction and writer applications, use the below command to list down all
-pending compactions.</p>
+<p>要了解压缩和写程序之间的时滞,请使用以下命令列出所有待处理的压缩。</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;compactions show all
      ___________________________________________________________________
@@ -546,7 +543,7 @@ pending compactions.</p>
     | &lt;INSTANT_2&gt;            | INFLIGHT | 27                           |
 </code></pre></div></div>
 
-<p>To inspect a specific compaction plan, use</p>
+<p>要检查特定的压缩计划,请使用</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;compaction show --instant &lt;INSTANT_1&gt;
     _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
@@ -556,8 +553,8 @@ pending compactions.</p>
 
 </code></pre></div></div>
 
-<p>To manually schedule or run a compaction, use the below command. This command uses spark launcher to perform compaction
-operations. NOTE : Make sure no other application is scheduling compaction for this dataset concurrently</p>
+<p>要手动调度或运行压缩,请使用以下命令。该命令使用spark启动器执行压缩操作。
+注意:确保没有其他应用程序正在同时调度此数据集的压缩</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;help compaction schedule
 Keyword:                   compaction schedule
@@ -613,9 +610,9 @@ Description:               Run Compaction for given instant time
 * compaction run - Run Compaction for given instant time
 </code></pre></div></div>
 
-<h5 id="validate-compaction">Validate Compaction</h5>
+<h5 id="验证压缩">验证压缩</h5>
 
-<p>Validating a compaction plan : Check if all the files necessary for compactions are present and are valid</p>
+<p>验证压缩计划:检查压缩所需的所有文件是否都存在且有效</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:stock_ticks_mor-&gt;compaction validate --instant 20181005222611
 ...
@@ -639,35 +636,33 @@ hoodie:stock_ticks_mor-&gt;compaction validate --instant 20181005222601
     | 05320e98-9a57-4c38-b809-a6beaaeb36bd| 20181005222445   | hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/05320e98-9a57-4c38-b809-a6beaaeb36bd_0_20181005222445.parquet| 1              | false| All log files specified in compaction operation is not present. Missing ....    |
 </code></pre></div></div>
 
-<h5 id="note">NOTE</h5>
+<h5 id="注意">注意</h5>
 
-<p>The following commands must be executed without any other writer/ingestion application running.</p>
+<p>必须在其他写入/摄取程序没有运行的情况下执行以下命令。</p>
 
-<p>Sometimes, it becomes necessary to remove a fileId from a compaction-plan inorder to speed-up or unblock compaction
-operation. Any new log-files that happened on this file after the compaction got scheduled will be safely renamed
-so that are preserved. Hudi provides the following CLI to support it</p>
+<p>有时,有必要从压缩计划中删除fileId以便加快或取消压缩操作。
+压缩计划之后在此文件上发生的所有新日志文件都将被安全地重命名以便进行保留。Hudi提供以下CLI来支持</p>
 
-<h5 id="unscheduling-compaction">UnScheduling Compaction</h5>
+<h5 id="取消调度压缩">取消调度压缩</h5>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;compaction unscheduleFileId --fileId &lt;FileUUID&gt;
 ....
 No File renames needed to unschedule file from pending compaction. Operation successful.
 </code></pre></div></div>
 
-<p>In other cases, an entire compaction plan needs to be reverted. This is supported by the following CLI</p>
+<p>在其他情况下,需要撤销整个压缩计划。以下CLI支持此功能</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:trips-&gt;compaction unschedule --compactionInstant &lt;compactionInstant&gt;
 .....
 No File renames needed to unschedule pending compaction. Operation successful.
 </code></pre></div></div>
 
-<h5 id="repair-compaction">Repair Compaction</h5>
+<h5 id="修复压缩">修复压缩</h5>
 
-<p>The above compaction unscheduling operations could sometimes fail partially (e:g -&gt; DFS temporarily unavailable). With
-partial failures, the compaction operation could become inconsistent with the state of file-slices. When you run
-<code class="highlighter-rouge">compaction validate</code>, you can notice invalid compaction operations if there is one.  In these cases, the repair
-command comes to the rescue, it will rearrange the file-slices so that there is no loss and the file-slices are
-consistent with the compaction plan</p>
+<p>上面的压缩取消调度操作有时可能会部分失败(例如:DFS暂时不可用)。
+如果发生部分故障,则压缩操作可能与文件切片的状态不一致。
+当您运行<code class="highlighter-rouge">compaction validate</code>时,您会注意到无效的压缩操作(如果有的话)。
+在这种情况下,可以使用修复命令来补救,它将重新排列文件切片,以使文件不丢失,并且文件切片与压缩计划一致</p>
 
 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hoodie:stock_ticks_mor-&gt;compaction repair --instant 20181005222611
 ......
@@ -675,81 +670,80 @@ Compaction successfully repaired
 .....
 </code></pre></div></div>
 
-<h2 id="metrics">Metrics</h2>
+<h2 id="metrics">指标</h2>
 
-<p>Once the Hudi Client is configured with the right datasetname and environment for metrics, it produces the following graphite metrics, that aid in debugging hudi datasets</p>
+<p>为Hudi Client配置正确的数据集名称和指标环境后,它将生成以下graphite指标,以帮助调试hudi数据集</p>
 
 <ul>
-  <li><strong>Commit Duration</strong> - This is amount of time it took to successfully commit a batch of records</li>
-  <li><strong>Rollback Duration</strong> - Similarly, amount of time taken to undo partial data left over by a failed commit (happens everytime automatically after a failing write)</li>
-  <li><strong>File Level metrics</strong> - Shows the amount of new files added, versions, deleted (cleaned) in each commit</li>
-  <li><strong>Record Level Metrics</strong> - Total records inserted/updated etc per commit</li>
-  <li><strong>Partition Level metrics</strong> - number of partitions upserted (super useful to understand sudden spikes in commit duration)</li>
+  <li><strong>提交持续时间</strong> - 这是成功提交一批记录所花费的时间</li>
+  <li><strong>回滚持续时间</strong> - 同样,撤消失败的提交所剩余的部分数据所花费的时间(每次写入失败后都会自动发生)</li>
+  <li><strong>文件级别指标</strong> - 显示每次提交中新增、版本、删除(清除)的文件数量</li>
+  <li><strong>记录级别指标</strong> - 每次提交插入/更新的记录总数</li>
+  <li><strong>分区级别指标</strong> - 更新的分区数量(对于了解提交持续时间的突然峰值非常有用)</li>
 </ul>
 
-<p>These metrics can then be plotted on a standard tool like grafana. Below is a sample commit duration chart.</p>
+<p>然后可以将这些指标绘制在grafana等标准工具上。以下是提交持续时间图表示例。</p>
 
 <figure>
     <img class="docimage" src="/images/hudi_commit_duration.png" alt="hudi_commit_duration.png" style="max-width: 1000px" />
 </figure>
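The snippet below is a minimal sketch of how these metrics might be switched on from a Spark datasource write; the hoodie.metrics.* property names follow the usual Hudi metrics configuration, and the Graphite host/port, table name and path are placeholders to adapt to your environment.

    // Sketch only: enabling Graphite metrics on a Hudi datasource write.
    // df is assumed to be a DataFrame of incoming records; all values below are placeholders.
    df.write.format("org.apache.hudi").
      option("hoodie.table.name", "trips").
      option("hoodie.datasource.write.recordkey.field", "uuid").
      option("hoodie.datasource.write.partitionpath.field", "partitionpath").
      option("hoodie.datasource.write.precombine.field", "ts").
      option("hoodie.metrics.on", "true").
      option("hoodie.metrics.graphite.host", "graphite.example.internal").
      option("hoodie.metrics.graphite.port", "4756").
      mode("append").
      save("/app/uber/trips")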
 
-<h2 id="troubleshooting">Troubleshooting Failures</h2>
+<h2 id="troubleshooting">故障排除</h2>
 
-<p>Section below generally aids in debugging Hudi failures. Off the bat, the following metadata is added to every record to help triage  issues easily using standard Hadoop SQL engines (Hive/Presto/Spark)</p>
+<p>以下部分通常有助于调试Hudi故障。以下元数据已被添加到每条记录中,可以通过标准Hadoop SQL引擎(Hive/Presto/Spark)检索,来更容易地诊断问题的严重性。</p>
 
 <ul>
-  <li><strong>_hoodie_record_key</strong> - Treated as a primary key within each DFS partition, basis of all updates/inserts</li>
-  <li><strong>_hoodie_commit_time</strong> - Last commit that touched this record</li>
-  <li><strong>_hoodie_file_name</strong> - Actual file name containing the record (super useful to triage duplicates)</li>
-  <li><strong>_hoodie_partition_path</strong> - Path from basePath that identifies the partition containing this record</li>
+  <li><strong>_hoodie_record_key</strong> - 作为每个DFS分区内的主键,是所有更新/插入的基础</li>
+  <li><strong>_hoodie_commit_time</strong> - 该记录上次的提交</li>
+  <li><strong>_hoodie_file_name</strong> - 包含记录的实际文件名(对检查重复非常有用)</li>
+  <li><strong>_hoodie_partition_path</strong> - basePath的路径,该路径标识包含此记录的分区</li>
 </ul>
 
-<p>Note that as of now, Hudi assumes the application passes in the same deterministic partitionpath for a given recordKey. i.e the uniqueness of record key is only enforced within each partition</p>
+<p>请注意,到目前为止,Hudi假定应用程序为给定的recordKey传递相同的确定性分区路径。即仅在每个分区内保证recordKey(主键)的唯一性。</p>
 
-<h4 id="missing-records">Missing records</h4>
+<h4 id="缺失记录">缺失记录</h4>
 
-<p>Please check if there were any write errors using the admin commands above, during the window at which the record could have been written.
-If you do find errors, then the record was not actually written by Hudi, but handed back to the application to decide what to do with it.</p>
+<p>请在可能写入记录的窗口中,使用上面的admin命令检查是否存在任何写入错误。
+如果确实发现错误,那么记录实际上不是由Hudi写入的,而是交还给应用程序来决定如何处理。</p>
 
-<h4 id="duplicates">Duplicates</h4>
+<h4 id="重复">重复</h4>
 
-<p>First of all, please confirm if you do indeed have duplicates <strong>AFTER</strong> ensuring the query is accessing the Hudi datasets <a href="sql_queries.html">properly</a> .</p>
+<p>首先,请确保访问Hudi数据集的查询是<a href="sql_queries.html">没有问题的</a>,并之后确认的确有重复。</p>
 
 <ul>
-  <li>If confirmed, please use the metadata fields above, to identify the physical files &amp; partition files containing the records .</li>
-  <li>If duplicates span files across partitionpath, then this means your application is generating different partitionPaths for same recordKey, Please fix your app</li>
-  <li>if duplicates span multiple files within the same partitionpath, please engage with mailing list. This should not happen. You can use the <code class="highlighter-rouge">records deduplicate</code> command to fix your data.</li>
+  <li>如果确认,请使用上面的元数据字段来标识包含记录的物理文件和分区文件。</li>
+  <li>如果重复的记录存在于不同分区路径下的文件,则意味着您的应用程序正在为同一recordKey生成不同的分区路径,请修复您的应用程序.</li>
+  <li>如果重复的记录存在于同一分区路径下的多个文件,请使用邮件列表汇报这个问题。这不应该发生。您可以使用<code class="highlighter-rouge">records deduplicate</code>命令修复数据。</li>
 </ul>
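As a sketch of that triage flow, assuming the dataset is registered as a Hive/Spark table named trips (a hypothetical name), the Hudi metadata columns can be grouped from a Spark shell to locate suspect keys:

    // Illustrative only: find record keys that appear more than once and list the
    // partition paths and physical files that contain them.
    spark.sql("""
      SELECT _hoodie_record_key,
             collect_set(_hoodie_partition_path) AS partitions,
             collect_set(_hoodie_file_name)      AS files,
             count(*)                            AS copies
      FROM trips
      GROUP BY _hoodie_record_key
      HAVING count(*) > 1
    """).show(100, false)

Keys whose partitions column lists more than one path point to the application generating inconsistent partitionPaths; duplicates confined to a single partition path are the case to raise on the mailing list.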
 
-<h4 id="spark-ui">Spark failures</h4>
-
-<p>Typical upsert() DAG looks like below. Note that Hudi client also caches intermediate RDDs to intelligently profile workload and size files and spark parallelism.
-Also Spark UI shows sortByKey twice due to the probe job also being shown, nonetheless its just a single sort.</p>
+<h4 id="spark-ui">Spark故障</h4>
 
+<p>典型的upsert() DAG如下所示。请注意,Hudi客户端会缓存中间的RDD,以便智能地分析工作负载并调整文件大小和Spark并行度。
+另外,由于还显示了探针作业,Spark UI显示了两次sortByKey,但它只是一个排序。</p>
 <figure>
     <img class="docimage" src="/images/hudi_upsert_dag.png" alt="hudi_upsert_dag.png" style="max-width: 1000px" />
 </figure>
 
-<p>At a high level, there are two steps</p>
+<p>概括地说,有两个步骤</p>
 
-<p><strong>Index Lookup to identify files to be changed</strong></p>
+<p><strong>索引查找以标识要更改的文件</strong></p>
 
 <ul>
-  <li>Job 1 : Triggers the input data read, converts to HoodieRecord object and then stops at obtaining a spread of input records to target partition paths</li>
-  <li>Job 2 : Load the set of file names which we need check against</li>
-  <li>Job 3  &amp; 4 : Actual lookup after smart sizing of spark join parallelism, by joining RDDs in 1 &amp; 2 above</li>
-  <li>Job 5 : Have a tagged RDD of recordKeys with locations</li>
+  <li>Job 1 : 触发输入数据读取,转换为HoodieRecord对象,然后根据输入记录拿到目标分区路径。</li>
+  <li>Job 2 : 加载我们需要检查的文件名集。</li>
+  <li>Job 3  &amp; 4 : 通过联合上面1和2中的RDD,智能调整spark join并行度,然后进行实际查找。</li>
+  <li>Job 5 : 生成带有位置的recordKeys作为标记的RDD。</li>
 </ul>
 
-<p><strong>Performing the actual writing of data</strong></p>
+<p><strong>执行数据的实际写入</strong></p>
 
 <ul>
-  <li>Job 6 : Lazy join of incoming records against recordKey, location to provide a final set of HoodieRecord which now contain the information about which file/partitionpath they are found at (or null if insert). Then also profile the workload again to determine sizing of files</li>
-  <li>Job 7 : Actual writing of data (update + insert + insert turned to updates to maintain file size)</li>
+  <li>Job 6 : 将记录与recordKey(位置)进行懒惰连接,以提供最终的HoodieRecord集,现在它包含每条记录的文件/分区路径信息(如果插入,则为null)。然后还要再次分析工作负载以确定文件的大小。</li>
+  <li>Job 7 : 实际写入数据(更新 + 插入 + 插入转为更新以保持文件大小)</li>
 </ul>
 
-<p>Depending on the exception source (Hudi/Spark), the above knowledge of the DAG can be used to pinpoint the actual issue. The most often encountered failures result from YARN/DFS temporary failures.
-In the future, a more sophisticated debug/management UI would be added to the project, that can help automate some of this debugging.</p>
+<p>根据异常源(Hudi/Spark),上述关于DAG的信息可用于查明实际问题。最常遇到的故障是由YARN/DFS临时故障引起的。
+将来,将在项目中添加更复杂的调试/管理UI,以帮助自动进行某些调试。</p>
 
 
     <div class="tags">
diff --git a/content/cn/comparison.html b/content/cn/comparison.html
index a1d3d4a..5dddc97 100644
--- a/content/cn/comparison.html
+++ b/content/cn/comparison.html
@@ -338,54 +338,47 @@
 
     
 
-  <p>Apache Hudi fills a big void for processing data on top of DFS, and thus mostly co-exists nicely with these technologies. However,
-it would be useful to understand how Hudi fits into the current big data ecosystem, contrasting it with a few related systems
-and bring out the different tradeoffs these systems have accepted in their design.</p>
+  <p>Apache Hudi填补了在DFS上处理数据的巨大空白,并可以和这些技术很好地共存。然而,
+通过将Hudi与一些相关系统进行对比,来了解Hudi如何适应当前的大数据生态系统,并知晓这些系统在设计中做的不同权衡仍将非常有用。</p>
 
 <h2 id="kudu">Kudu</h2>
 
-<p><a href="https://kudu.apache.org">Apache Kudu</a> is a storage system that has similar goals as Hudi, which is to bring real-time analytics on petabytes of data via first
-class support for <code class="highlighter-rouge">upserts</code>. A key differentiator is that Kudu also attempts to serve as a datastore for OLTP workloads, something that Hudi does not aspire to be.
-Consequently, Kudu does not support incremental pulling (as of early 2017), something Hudi does to enable incremental processing use cases.</p>
+<p><a href="https://kudu.apache.org">Apache Kudu</a>是一个与Hudi具有相似目标的存储系统,该系统通过对<code class="highlighter-rouge">upserts</code>支持来对PB级数据进行实时分析。
+一个关键的区别是Kudu还试图充当OLTP工作负载的数据存储,而Hudi并不希望这样做。
+因此,Kudu不支持增量拉取(截至2017年初),而Hudi支持以便进行增量处理。</p>
 
-<p>Kudu diverges from a distributed file system abstraction and HDFS altogether, with its own set of storage servers talking to each  other via RAFT.
-Hudi, on the other hand, is designed to work with an underlying Hadoop compatible filesystem (HDFS,S3 or Ceph) and does not have its own fleet of storage servers,
-instead relying on Apache Spark to do the heavy-lifting. Thu, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware
-&amp; operational support, typical to datastores like HBase or Vertica. We have not at this point, done any head to head benchmarks against Kudu (given RTTable is WIP).
-But, if we were to go with results shared by <a href="https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines">CERN</a> ,
-we expect Hudi to positioned at something that ingests parquet with superior performance.</p>
+<p>Kudu与分布式文件系统抽象和HDFS完全不同,它自己的一组存储服务器通过RAFT相互通信。
+与之不同的是,Hudi旨在与底层Hadoop兼容的文件系统(HDFS,S3或Ceph)一起使用,并且没有自己的存储服务器群,而是依靠Apache Spark来完成繁重的工作。
+因此,Hudi可以像其他Spark作业一样轻松扩展,而Kudu则需要硬件和运营支持,特别是HBase或Vertica等数据存储系统。
+到目前为止,我们还没有做任何直接的基准测试来比较Kudu和Hudi(鉴于RTTable正在进行中)。
+但是,如果我们参考<a href="https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines">CERN</a>分享的结果,
+我们预期Hudi在摄取parquet上有更卓越的性能。</p>
 
-<h2 id="hive-transactions">Hive Transactions</h2>
+<h2 id="hive事务">Hive事务</h2>
 
-<p><a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Hive Transactions/ACID</a> is another similar effort, which tries to implement storage like
-<code class="highlighter-rouge">merge-on-read</code>, on top of ORC file format. Understandably, this feature is heavily tied to Hive and other efforts like <a href="https://cwiki.apache.org/confluence/display/Hive/LLAP">LLAP</a>.
-Hive transactions does not offer the read-optimized storage option or the incremental pulling, that Hudi does. In terms of implementation choices, Hudi leverages
-the full power of a processing framework like Spark, while Hive transactions feature is implemented underneath by Hive tasks/queries kicked off by user or the Hive metastore.
-Based on our production experience, embedding Hudi as a library into existing Spark pipelines was much easier and less operationally heavy, compared with the other approach.
-Hudi is also designed to work with non-hive enginers like Presto/Spark and will incorporate file formats other than parquet over time.</p>
+<p><a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Hive事务/ACID</a>是另一项类似的工作,它试图实现在ORC文件格式之上的存储<code class="highlighter-rouge">读取时合并</code>。
+可以理解,此功能与Hive以及<a href="https://cwiki.apache.org/confluence/display/Hive/LLAP">LLAP</a>之类的其他工作紧密相关。
+Hive事务不提供Hudi提供的读取优化存储选项或增量拉取。
+在实现选择方面,Hudi充分利用了类似Spark的处理框架的功能,而Hive事务特性则在用户或Hive Metastore启动的Hive任务/查询的下实现。
+根据我们的生产经验,与其他方法相比,将Hudi作为库嵌入到现有的Spark管道中要容易得多,并且操作不会太繁琐。
+Hudi还设计用于与Presto/Spark等非Hive引擎合作,并计划引入除parquet以外的文件格式。</p>
 
 <h2 id="hbase">HBase</h2>
 
-<p>Even though <a href="https://hbase.apache.org">HBase</a> is ultimately a key-value store for OLTP workloads, users often tend to associate HBase with analytics given the proximity to Hadoop.
-Given HBase is heavily write-optimized, it supports sub-second upserts out-of-box and Hive-on-HBase lets users query that data. However, in terms of actual performance for analytical workloads,
-hybrid columnar storage formats like Parquet/ORC handily beat HBase, since these workloads are predominantly read-heavy. Hudi bridges this gap between faster data and having
-analytical storage formats. From an operational perspective, arming users with a library that provides faster data, is more scalable, than managing a big farm of HBase region servers,
-just for analytics. Finally, HBase does not support incremental processing primitives like <code class="highlighter-rouge">commit times</code>, <code class="highlighter-rouge">incremental pull</code> as first class citizens like Hudi.</p>
-
-<h2 id="stream-processing">Stream Processing</h2>
-
-<p>A popular question, we get is : “How does Hudi relate to stream processing systems?”, which we will try to answer here. Simply put, Hudi can integrate with
-batch (<code class="highlighter-rouge">copy-on-write storage</code>) and streaming (<code class="highlighter-rouge">merge-on-read storage</code>) jobs of today, to store the computed results in Hadoop. For Spark apps, this can happen via direct
-integration of Hudi library with Spark/Spark streaming DAGs. In case of Non-Spark processing systems (eg: Flink, Hive), the processing can be done in the respective systems
-and later sent into a Hudi table via a Kafka topic/DFS intermediate file. In more conceptual level, data processing
-pipelines just consist of three components : <code class="highlighter-rouge">source</code>, <code class="highlighter-rouge">processing</code>, <code class="highlighter-rouge">sink</code>, with users ultimately running queries against the sink to use the results of the pipeline.
-Hudi can act as either a source or sink, that stores data on DFS. Applicability of Hudi to a given stream processing pipeline ultimately boils down to suitability
-of Presto/SparkSQL/Hive for your queries.</p>
-
-<p>More advanced use cases revolve around the concepts of <a href="https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop">incremental processing</a>, which effectively
-uses Hudi even inside the <code class="highlighter-rouge">processing</code> engine to speed up typical batch pipelines. For e.g: Hudi can be used as a state store inside a processing DAG (similar
-to how <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend">rocksDB</a> is used by Flink). This is an item on the roadmap
-and will eventually happen as a <a href="https://issues.apache.org/jira/browse/HUDI-60">Beam Runner</a></p>
+<p>尽管<a href="https://hbase.apache.org">HBase</a>最终是OLTP工作负载的键值存储层,但由于与Hadoop的相似性,用户通常倾向于将HBase与分析相关联。
+鉴于HBase经过严格的写优化,它支持开箱即用的亚秒级更新,Hive-on-HBase允许用户查询该数据。 但是,就分析工作负载的实际性能而言,Parquet/ORC之类的混合列式存储格式可以轻松击败HBase,因为这些工作负载主要是读取繁重的工作。
+Hudi弥补了更快的数据与分析存储格式之间的差距。从运营的角度来看,与管理分析使用的HBase region服务器集群相比,为用户提供可更快给出数据的库更具可扩展性。
+最终,HBase不像Hudi这样重点支持<code class="highlighter-rouge">提交时间</code>、<code class="highlighter-rouge">增量拉取</code>之类的增量处理原语。</p>
+
+<h2 id="流式处理">流式处理</h2>
+
+<p>一个普遍的问题:”Hudi与流处理系统有何关系?”,我们将在这里尝试回答。简而言之,Hudi可以与当今的批处理(<code class="highlighter-rouge">写时复制存储</code>)和流处理(<code class="highlighter-rouge">读时合并存储</code>)作业集成,以将计算结果存储在Hadoop中。
+对于Spark应用程序,这可以通过将Hudi库与Spark/Spark流式DAG直接集成来实现。在非Spark处理系统(例如Flink、Hive)情况下,可以在相应的系统中进行处理,然后通过Kafka主题/DFS中间文件将其发送到Hudi表中。从概念上讲,数据处理
+管道仅由三个部分组成:<code class="highlighter-rouge">输入</code>,<code class="highlighter-rouge">处理</code>,<code class="highlighter-rouge">输出</code>,用户最终针对输出运行查询以便使用管道的结果。Hudi可以充当将数据存储在DFS上的输入或输出。Hudi在给定流处理管道上的适用性最终归结为你的查询在Presto/SparkSQL/Hive的适用性。</p>
+
+<p>更高级的用例围绕<a href="https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop">增量处理</a>的概念展开,
+甚至在<code class="highlighter-rouge">处理</code>引擎内部也使用Hudi来加速典型的批处理管道。例如:Hudi可用作DAG内的状态存储(类似Flink使用的<a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend">rocksDB</a>)。
+这是路线图上的一个项目并将最终以<a href="https://issues.apache.org/jira/browse/HUDI-60">Beam Runner</a>的形式呈现。</p>
 
 
     <div class="tags">
diff --git a/content/cn/index.html b/content/cn/index.html
index 9d2f976..ebd87fa 100644
--- a/content/cn/index.html
+++ b/content/cn/index.html
@@ -3,9 +3,9 @@
     <meta charset="utf-8">
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
-<meta name="description" content="Hudi brings stream processing to big data, providing fresh data while being an order of magnitude efficient over traditional batch processing.">
+<meta name="description" content="Hudi为大数据带来流处理,在提供新数据的同时,比传统的批处理效率高出一个数量级。">
 <meta name="keywords" content="big data, stream processing, cloud, hdfs, storage, upserts, change capture">
-<title>What is Hudi? | Hudi</title>
+<title>什么是Hudi? | Hudi</title>
 <link rel="stylesheet" href="/css/syntax.css">
 
 
@@ -164,7 +164,7 @@
 
 
 
-  <a class="email" title="Submit feedback" href="#" onclick="javascript:window.location='mailto:dev@hudi.apache.org?subject=Hudi Documentation feedback&body=I have some feedback about the What is Hudi? page: ' + window.location.href;"><i class="fa fa-envelope-o"></i> Feedback</a>
+  <a class="email" title="Submit feedback" href="#" onclick="javascript:window.location='mailto:dev@hudi.apache.org?subject=Hudi Documentation feedback&body=I have some feedback about the 什么是Hudi? page: ' + window.location.href;"><i class="fa fa-envelope-o"></i> Feedback</a>
 
 <li>
 
@@ -187,7 +187,7 @@
                                 searchInput: document.getElementById('search-input'),
                                 resultsContainer: document.getElementById('results-container'),
                                 dataSource: '/search.json',
-                                searchResultTemplate: '<li><a href="{url}" title="What is Hudi?">{title}</a></li>',
+                                searchResultTemplate: '<li><a href="{url}" title="什么是Hudi?">{title}</a></li>',
                     noResultsText: 'No results found.',
                             limit: 10,
                             fuzzy: true,
@@ -324,7 +324,7 @@
     <!-- Content Column -->
     <div class="col-md-9">
         <div class="post-header">
-   <h1 class="post-title-main">What is Hudi?</h1>
+   <h1 class="post-title-main">什么是Hudi?</h1>
 </div>
 
 
@@ -332,7 +332,7 @@
 <div class="post-content">
 
    
-    <div class="summary">Hudi brings stream processing to big data, providing fresh data while being an order of magnitude efficient over traditional batch processing.</div>
+    <div class="summary">Hudi为大数据带来流处理,在提供新数据的同时,比传统的批处理效率高出一个数量级。</div>
    
 
     
@@ -363,21 +363,21 @@ $('#toc').on('click', 'a', function() {
 
     
 
-  <p>Hudi (pronounced “Hoodie”) ingests &amp; manages storage of large analytical datasets over DFS (<a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS</a> or cloud stores) and provides three logical views for query access.</p>
+  <p>Hudi(发音为“hoodie”)摄取与管理处于DFS(<a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS</a> 或云存储)之上的大型分析数据集并为查询访问提供三个逻辑视图。</p>
 
 <ul>
-  <li><strong>Read Optimized View</strong> - Provides excellent query performance on pure columnar storage, much like plain <a href="https://parquet.apache.org/">Parquet</a> tables.</li>
-  <li><strong>Incremental View</strong> - Provides a change stream out of the dataset to feed downstream jobs/ETLs.</li>
-  <li><strong>Near-Real time Table</strong> - Provides queries on real-time data, using a combination of columnar &amp; row based storage (e.g Parquet + <a href="http://avro.apache.org/docs/current/mr.html">Avro</a>)</li>
+  <li><strong>读优化视图</strong> - 在纯列式存储上提供出色的查询性能,非常像<a href="https://parquet.apache.org/">parquet</a>表。</li>
+  <li><strong>增量视图</strong> - 在数据集之上提供一个变更流并提供给下游的作业或ETL任务。</li>
+  <li><strong>准实时的表</strong> - 使用基于列存储(例如 Parquet + <a href="http://avro.apache.org/docs/current/mr.html">Avro</a>)和行存储以提供对实时数据的查询</li>
 </ul>
 
 <figure>
     <img class="docimage" src="/images/hudi_intro_1.png" alt="hudi_intro_1.png" />
 </figure>
 
-<p>By carefully managing how data is laid out in storage &amp; how it’s exposed to queries, Hudi is able to power a rich data ecosystem where external sources can be ingested in near real-time and made available for interactive SQL Engines like <a href="https://prestodb.io">Presto</a> &amp; <a href="https://spark.apache.org/sql/">Spark</a>, while at the same time capable of being consumed incrementally from processing/ETL frameworks like <a href="https://hive.apache.org/">Hive</a> &amp;  [...]
+<p>通过仔细地管理数据在存储中的布局和如何将数据暴露给查询,Hudi支持丰富的数据生态系统,在该系统中,外部数据源可被近实时摄取并被用于<a href="https://prestodb.io">presto</a>和<a href="https://spark.apache.org/sql/">spark</a>等交互式SQL引擎,同时能够从处理/ETL框架(如<a href="https://hive.apache.org/">hive</a> &amp; <a href="https://spark.apache.org/docs/latest/">spark</a>)中进行增量消费以构建派生(Hudi)数据集。</p>
 
-<p>Hudi broadly consists of a self contained Spark library to build datasets and integrations with existing query engines for data access. See <a href="quickstart.html">quickstart</a> for a demo.</p>
+<p>Hudi 大体上由一个自包含的Spark库组成,它用于构建数据集并与现有的数据访问查询引擎集成。有关演示,请参见<a href="quickstart.html">快速启动</a>。</p>
 
 
     <div class="tags">
diff --git a/content/cn/powered_by.html b/content/cn/powered_by.html
index 33d086c..d6c80a9 100644
--- a/content/cn/powered_by.html
+++ b/content/cn/powered_by.html
@@ -338,27 +338,27 @@
 
     
 
-  <h2 id="adoption">Adoption</h2>
+  <h2 id="已使用">已使用</h2>
 
 <h4 id="uber">Uber</h4>
 
-<p>Hudi was originally developed at <a href="https://uber.com">Uber</a>, to achieve <a href="http://www.slideshare.net/vinothchandar/hadoop-strata-talk-uber-your-hadoop-has-arrived/32">low latency database ingestion, with high efficiency</a>.
-It has been in production since Aug 2016, powering ~100 highly business critical tables on Hadoop, worth 100s of TBs(including top 10 including trips,riders,partners).
-It also powers several incremental Hive ETL pipelines and being currently integrated into Uber’s data dispersal system.</p>
+<p>Hudi最初由<a href="https://uber.com">Uber</a>开发,用于实现<a href="http://www.slideshare.net/vinothchandar/hadoop-strata-talk-uber-your-hadoop-has-arrived/32">低延迟、高效率的数据库摄取</a>。
+Hudi自2016年8月开始在生产环境上线,在Hadoop上驱动约100个非常关键的业务表,支撑约几百TB的数据规模(前10名包括行程、乘客、司机)。
+Hudi还支持几个增量的Hive ETL管道,并且目前已集成到Uber的数据分发系统中。</p>
 
 <h4 id="emis-health">EMIS Health</h4>
 
-<p>[EMIS Health][https://www.emishealth.com/] is the largest provider of Primary Care IT software in the UK with datasets including more than 500Bn healthcare records. HUDI is used to manage their analytics dataset in production and keeping them up-to-date with their upstream source. Presto is being used to query the data written in HUDI format.</p>
+<p><a href="https://www.emishealth.com/">EMIS Health</a>是英国最大的初级保健IT软件提供商,其数据集包括超过5000亿的医疗保健记录。HUDI用于管理生产中的分析数据集,并使其与上游源保持同步。Presto用于查询以HUDI格式写入的数据。</p>
 
 <h4 id="yieldsio">Yields.io</h4>
 
-<p>Yields.io is the first FinTech platform that uses AI for automated model validation and real-time monitoring on an enterprise-wide scale. Their data lake is managed by Hudi. They are also actively building their infrastructure for incremental, cross language/platform machine learning using Hudi.</p>
+<p>Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。</p>
 
 <h4 id="yotpo">Yotpo</h4>
 
-<p>Using Hudi at Yotpo for several usages. Firstly, integrated Hudi as a writer in their open source ETL framework https://github.com/YotpoLtd/metorikku and using as an output writer for a CDC pipeline, with events that are being generated from a database binlog streams to Kafka and then are written to S3.</p>
+<p>Hudi在Yotpo有不少用途。首先,在他们的<a href="https://github.com/YotpoLtd/metorikku">开源ETL框架</a>中集成了Hudi作为CDC管道的输出写入程序,即从数据库binlog生成的事件流到Kafka然后再写入S3。</p>
 
-<h2 id="talks--presentations">Talks &amp; Presentations</h2>
+<h2 id="演讲--报告">演讲 &amp; 报告</h2>
 
 <ol>
   <li>
@@ -394,7 +394,7 @@ September 2019, ApacheCon NA 19, Las Vegas, NV, USA</p>
   </li>
 </ol>
 
-<h2 id="articles">Articles</h2>
+<h2 id="文章">文章</h2>
 
 <ol>
   <li><a href="https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop">“The Case for incremental processing on Hadoop”</a> - O’reilly Ideas article by Vinoth Chandar</li>
diff --git a/content/cn/querying_data.html b/content/cn/querying_data.html
index 2018b01..1479186 100644
--- a/content/cn/querying_data.html
+++ b/content/cn/querying_data.html
@@ -348,8 +348,8 @@ bundle has been provided, the dataset can be queried by popular query engines li
 For e.g, if <code class="highlighter-rouge">table name = hudi_tbl</code>, then we get</p>
 
 <ul>
-  <li><code class="highlighter-rouge">hudi_tbl</code> realizes the read optimized view of the dataset backed by <code class="highlighter-rouge">HoodieInputFormat</code>, exposing purely columnar data.</li>
-  <li><code class="highlighter-rouge">hudi_tbl_rt</code> realizes the real time view of the dataset  backed by <code class="highlighter-rouge">HoodieRealtimeInputFormat</code>, exposing merged view of base and log data.</li>
+  <li><code class="highlighter-rouge">hudi_tbl</code> realizes the read optimized view of the dataset backed by <code class="highlighter-rouge">HoodieParquetInputFormat</code>, exposing purely columnar data.</li>
+  <li><code class="highlighter-rouge">hudi_tbl_rt</code> realizes the real time view of the dataset  backed by <code class="highlighter-rouge">HoodieParquetRealtimeInputFormat</code>, exposing merged view of base and log data.</li>
 </ul>
 
 <p>As discussed in the concepts section, the one key primitive needed for <a href="https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop">incrementally processing</a>,
@@ -368,7 +368,7 @@ classes with its dependencies are available for query planning &amp; execution.<
 
 <h3 id="hive-ro-view">Read Optimized table</h3>
 <p>In addition to setup above, for beeline cli access, the <code class="highlighter-rouge">hive.input.format</code> variable needs to be set to the  fully qualified path name of the 
-inputformat <code class="highlighter-rouge">org.apache.hudi.hadoop.HoodieInputFormat</code>. For Tez, additionally the <code class="highlighter-rouge">hive.tez.input.format</code> needs to be set 
+inputformat <code class="highlighter-rouge">org.apache.hudi.hadoop.HoodieParquetInputFormat</code>. For Tez, additionally the <code class="highlighter-rouge">hive.tez.input.format</code> needs to be set 
 to <code class="highlighter-rouge">org.apache.hadoop.hive.ql.io.HiveInputFormat</code></p>
 
 <h3 id="hive-rt-view">Real time table</h3>
diff --git a/content/configurations.html b/content/configurations.html
index a0d1bf9..ff6d598 100644
--- a/content/configurations.html
+++ b/content/configurations.html
@@ -692,7 +692,7 @@ HoodieWriteConfig can be built using a builder pattern as below.</p>
 <p>Property: <code class="highlighter-rouge">hoodie.commits.archival.batch</code> <br />
 <span style="color:grey">This controls the number of commit instants read in memory as a batch and archived together.</span></p>
 
-<h5 id="compactionSmallFileSize">compactionSmallFileSize(size = 0)</h5>
+<h5 id="compactionSmallFileSize">compactionSmallFileSize(size = 100MB)</h5>
 <p>Property: <code class="highlighter-rouge">hoodie.parquet.small.file.limit</code> <br />
 <span style="color:grey">This should be less &lt; maxFileSize and setting it to 0, turns off this feature. Small files can always happen because of the number of insert records in a partition in a batch. Hudi has an option to auto-resolve small files by masking inserts into this partition as updates to existing small files. The size here is the minimum file size considered as a “small file size”.</span></p>
 
diff --git a/content/contributing.html b/content/contributing.html
index 5dc2f6e..785d92f 100644
--- a/content/contributing.html
+++ b/content/contributing.html
@@ -370,6 +370,14 @@ These instructions have been tested on IntelliJ. We also recommend setting up th
   <li>[Optional] If you want to get involved, but don’t have a project in mind, please check JIRA for small, quick-starters.</li>
   <li>[Optional] Familiarize yourself with internals of Hudi using content on this page, as well as <a href="https://cwiki.apache.org/confluence/display/HUDI">wiki</a></li>
   <li>Once you finalize on a project/task, please open a new JIRA or assign an existing one to yourself. (If you don’t have perms to do this, please email the dev mailing list with your JIRA id and a small intro for yourself. We’d be happy to add you as a contributor)</li>
+  <li>While raising a new JIRA or updating an existing one, please make sure to do the following
+    <ul>
+      <li>The issue type and versions (when resolving the ticket) are set correctly</li>
+      <li>Summary should be descriptive enough to catch the essence of the problem/feature</li>
+      <li>Capture the version of Hoodie/Spark/Hive/Hadoop/Cloud environments in the ticket</li>
+      <li>Whenever possible, provide steps to reproduce via sample code or on the <a href="https://hudi.apache.org/docker_demo.html">docker setup</a></li>
+    </ul>
+  </li>
   <li>Almost all PRs should be linked to a JIRA. Before you begin work, click “Start Progress” on the JIRA, which tells everyone that you are working on the issue actively.</li>
   <li>Make your code change
     <ul>
diff --git a/content/feed.xml b/content/feed.xml
index ec6ded0..c911613 100644
--- a/content/feed.xml
+++ b/content/feed.xml
@@ -5,8 +5,8 @@
         <description>Apache Hudi (pronounced “Hoodie”) provides upserts and incremental processing capaibilities on Big Data</description>
         <link>http://0.0.0.0:4000/</link>
         <atom:link href="http://0.0.0.0:4000/feed.xml" rel="self" type="application/rss+xml"/>
-        <pubDate>Mon, 16 Sep 2019 19:19:14 +0000</pubDate>
-        <lastBuildDate>Mon, 16 Sep 2019 19:19:14 +0000</lastBuildDate>
+        <pubDate>Thu, 10 Oct 2019 12:20:51 +0000</pubDate>
+        <lastBuildDate>Thu, 10 Oct 2019 12:20:51 +0000</lastBuildDate>
         <generator>Jekyll v3.7.2</generator>
         
     </channel>
diff --git a/content/querying_data.html b/content/querying_data.html
index fc16afd..379bd64 100644
--- a/content/querying_data.html
+++ b/content/querying_data.html
@@ -349,8 +349,8 @@ bundle has been provided, the dataset can be queried by popular query engines li
 For e.g, if <code class="highlighter-rouge">table name = hudi_tbl</code>, then we get</p>
 
 <ul>
-  <li><code class="highlighter-rouge">hudi_tbl</code> realizes the read optimized view of the dataset backed by <code class="highlighter-rouge">HoodieInputFormat</code>, exposing purely columnar data.</li>
-  <li><code class="highlighter-rouge">hudi_tbl_rt</code> realizes the real time view of the dataset  backed by <code class="highlighter-rouge">HoodieRealtimeInputFormat</code>, exposing merged view of base and log data.</li>
+  <li><code class="highlighter-rouge">hudi_tbl</code> realizes the read optimized view of the dataset backed by <code class="highlighter-rouge">HoodieParquetInputFormat</code>, exposing purely columnar data.</li>
+  <li><code class="highlighter-rouge">hudi_tbl_rt</code> realizes the real time view of the dataset  backed by <code class="highlighter-rouge">HoodieParquetRealtimeInputFormat</code>, exposing merged view of base and log data.</li>
 </ul>
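For instance, assuming both tables are registered in the metastore as described and the Spark session has Hive support enabled, the two views can be compared side by side (illustrative only):

    // On a MERGE_ON_READ dataset the _rt table merges base and log files,
    // so it can reflect writes that the read optimized table does not yet show.
    val ro = spark.sql("select count(*) from hudi_tbl").first.getLong(0)
    val rt = spark.sql("select count(*) from hudi_tbl_rt").first.getLong(0)
    println(s"read optimized rows = $ro, real time rows = $rt")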
 
 <p>As discussed in the concepts section, the one key primitive needed for <a href="https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop">incrementally processing</a>,
@@ -369,7 +369,7 @@ classes with its dependencies are available for query planning &amp; execution.<
 
 <h3 id="hive-ro-view">Read Optimized table</h3>
 <p>In addition to setup above, for beeline cli access, the <code class="highlighter-rouge">hive.input.format</code> variable needs to be set to the  fully qualified path name of the 
-inputformat <code class="highlighter-rouge">org.apache.hudi.hadoop.HoodieInputFormat</code>. For Tez, additionally the <code class="highlighter-rouge">hive.tez.input.format</code> needs to be set 
+inputformat <code class="highlighter-rouge">org.apache.hudi.hadoop.HoodieParquetInputFormat</code>. For Tez, additionally the <code class="highlighter-rouge">hive.tez.input.format</code> needs to be set 
 to <code class="highlighter-rouge">org.apache.hadoop.hive.ql.io.HiveInputFormat</code></p>
 
 <h3 id="hive-rt-view">Real time table</h3>
diff --git a/content/quickstart.html b/content/quickstart.html
index 05b50c6..b4d0bf3 100644
--- a/content/quickstart.html
+++ b/content/quickstart.html
@@ -339,215 +339,159 @@
 
     
 
-  <p><br />
-To get a quick peek at Hudi’s capabilities, we have put together a <a href="https://www.youtube.com/watch?v=VhNgUsxdrD0">demo video</a> 
-that showcases this on a docker based setup with all dependent systems running locally. We recommend you replicate the same setup 
-and run the demo yourself, by following steps <a href="docker_demo.html">here</a>. Also, if you are looking for ways to migrate your existing data to Hudi, 
-refer to <a href="migration_guide.html">migration guide</a>.</p>
+  <p><br /></p>
 
-<p>If you have Hive, Hadoop, Spark installed already &amp; prefer to do it on your own setup, read on.</p>
+<p>This guide provides a quick peek at Hudi’s capabilities using spark-shell. Using Spark datasources, we will walk through 
+code snippets that allow you to insert and update a Hudi dataset of the default storage type: 
+<a href="https://hudi.apache.org/concepts.html#copy-on-write-storage">Copy on Write</a>. 
+After each write operation we will also show how to read the data both as a snapshot and incrementally.</p>
 
-<h2 id="download-hudi">Download Hudi</h2>
+<h2 id="build-hudi-spark-bundle-jar">Build Hudi spark bundle jar</h2>
 
-<p>Check out <a href="https://github.com/apache/incubator-hudi">code</a> and normally build the maven project, from command line</p>
+<p>Hudi requires Java 8 to be installed on a *nix system. Check out <a href="https://github.com/apache/incubator-hudi">code</a> and 
+normally build the maven project, from command line:</p>
 
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mvn clean install -DskipTests -DskipITs
-</code></pre></div></div>
-
-<p>To work with older version of Hive (pre Hive-1.2.1), use</p>
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mvn clean install -DskipTests -DskipITs -Dhive11
-</code></pre></div></div>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># checkout and build
+git clone https://github.com/apache/incubator-hudi.git &amp;&amp; cd incubator-hudi
+mvn clean install -DskipTests -DskipITs
 
-<p>For IDE, you can pull in the code into IntelliJ as a normal maven project. 
-You might want to add your spark jars folder to project dependencies under ‘Module Setttings’, to be able to run from IDE.</p>
-
-<h3 id="version-compatibility">Version Compatibility</h3>
-
-<p>Hudi requires Java 8 to be installed on a *nix system. Hudi works with Spark-2.x versions. 
-Further, we have verified that Hudi works with the following combination of Hadoop/Hive/Spark.</p>
-
-<table>
-  <thead>
-    <tr>
-      <th>Hadoop</th>
-      <th>Hive</th>
-      <th>Spark</th>
-      <th>Instructions to Build Hudi</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>2.6.0-cdh5.7.2</td>
-      <td>1.1.0-cdh5.7.2</td>
-      <td>spark-2.[1-3].x</td>
-      <td>Use “mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.7.2 -Dhive.version=1.1.0-cdh5.7.2”</td>
-    </tr>
-    <tr>
-      <td>Apache hadoop-2.8.4</td>
-      <td>Apache hive-2.3.3</td>
-      <td>spark-2.[1-3].x</td>
-      <td>Use “mvn clean install -DskipTests”</td>
-    </tr>
-    <tr>
-      <td>Apache hadoop-2.7.3</td>
-      <td>Apache hive-1.2.1</td>
-      <td>spark-2.[1-3].x</td>
-      <td>Use “mvn clean install -DskipTests”</td>
-    </tr>
-  </tbody>
-</table>
-
-<p>If your environment has other versions of hadoop/hive/spark, please try out Hudi 
-and let us know if there are any issues.</p>
-
-<h2 id="generate-sample-dataset">Generate Sample Dataset</h2>
-
-<h3 id="environment-variables">Environment Variables</h3>
-
-<p>Please set the following environment variables according to your setup. We have given an example setup with CDH version</p>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd incubator-hudi 
-export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
-export HIVE_HOME=/var/hadoop/setup/apache-hive-1.1.0-cdh5.7.2-bin
-export HADOOP_HOME=/var/hadoop/setup/hadoop-2.6.0-cdh5.7.2
-export HADOOP_INSTALL=/var/hadoop/setup/hadoop-2.6.0-cdh5.7.2
-export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
-export SPARK_HOME=/var/hadoop/setup/spark-2.3.1-bin-hadoop2.7
-export SPARK_INSTALL=$SPARK_HOME
-export SPARK_CONF_DIR=$SPARK_HOME/conf
-export PATH=$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$SPARK_INSTALL/bin:$PATH
+# Export the location of hudi-spark-bundle for later 
+mkdir -p /tmp/hudi &amp;&amp; cp packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar  /tmp/hudi/hudi-spark-bundle.jar 
+export HUDI_SPARK_BUNDLE_PATH=/tmp/hudi/hudi-spark-bundle.jar
 </code></pre></div></div>
 
-<h3 id="run-hoodiejavaapp">Run HoodieJavaApp</h3>
-
-<p>Run <strong>hudi-spark/src/test/java/HoodieJavaApp.java</strong> class, to place a two commits (commit 1 =&gt; 100 inserts, commit 2 =&gt; 100 updates to previously inserted 100 records) onto your DFS/local filesystem. Use the wrapper script
-to run from command-line</p>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd hudi-spark
-./run_hoodie_app.sh --help
-Usage: &lt;main class&gt; [options]
-  Options:
-    --help, -h
-       Default: false
-    --table-name, -n
-       table name for Hudi sample table
-       Default: hoodie_rt
-    --table-path, -p
-       path for Hudi sample table
-       Default: file:///tmp/hoodie/sample-table
-    --table-type, -t
-       One of COPY_ON_WRITE or MERGE_ON_READ
-       Default: COPY_ON_WRITE
-</code></pre></div></div>
+<h2 id="setup-spark-shell">Setup spark-shell</h2>
+<p>Hudi works with Spark-2.x versions. You can follow instructions <a href="https://spark.apache.org/downloads.html">here</a> for 
+setting up spark.</p>
 
-<p>The class lets you choose table names, output paths and one of the storage types. In your own applications, be sure to include the <code class="highlighter-rouge">hudi-spark</code> module as dependency
-and follow a similar pattern to write/read datasets via the datasource.</p>
+<p>From the extracted directory run spark-shell with Hudi as:</p>
 
-<h2 id="query-a-hudi-dataset">Query a Hudi dataset</h2>
-
-<p>Next, we will register the sample dataset into Hive metastore and try to query using <a href="#hive">Hive</a>, <a href="#spark">Spark</a> &amp; <a href="#presto">Presto</a></p>
-
-<h3 id="start-hive-server-locally">Start Hive Server locally</h3>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bin/spark-shell --jars $HUDI_SPARK_BUNDLE_PATH --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+</code></pre></div></div>
 
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hdfs namenode # start name node
-hdfs datanode # start data node
+<p>Setup table name, base path and a data generator to generate records for this guide.</p>
 
-bin/hive --service metastore  # start metastore
-bin/hiveserver2 \
-  --hiveconf hive.root.logger=INFO,console \
-  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
-  --hiveconf hive.stats.autogather=false \
-  --hiveconf hive.aux.jars.path=/path/to/packaging/hudi-hive-bundle/target/hudi-hive-bundle-0.4.6-SNAPSHOT.jar
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
 
+val tableName = "hudi_cow_table"
+val basePath = "file:///tmp/hudi_cow_table"
+val dataGen = new DataGenerator
 </code></pre></div></div>
 
-<h3 id="run-hive-sync-tool">Run Hive Sync Tool</h3>
-<p>Hive Sync Tool will update/create the necessary metadata(schema and partitions) in hive metastore. This allows for schema evolution and incremental addition of new partitions written to.
-It uses an incremental approach by storing the last commit time synced in the TBLPROPERTIES and only syncing the commits from the last sync commit time stored.
-Both <a href="writing_data.html#datasource-writer">Spark Datasource</a> &amp; <a href="writing_data.html#deltastreamer">DeltaStreamer</a> have capability to do this, after each write.</p>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd hudi-hive
-./run_sync_tool.sh
-  --user hive
-  --pass hive
-  --database default
-  --jdbc-url "jdbc:hive2://localhost:10010/"
-  --base-path tmp/hoodie/sample-table/
-  --table hoodie_test
-  --partitioned-by field1,field2
+<p>The <a href="https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java">DataGenerator</a> 
+can generate sample inserts and updates based on the sample trip schema 
+<a href="https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L57">here</a>.</p>
+
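+<p>For instance, one quick way to get a feel for the generated trip schema is to peek at a single record, reusing the same <code class="highlighter-rouge">convertToStringList</code> helper used in the next section:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Print one generated record as JSON to inspect the trip schema fields (e.g. uuid, ts, rider, driver, fare)
+println(convertToStringList(dataGen.generateInserts(1)).get(0))
+</code></pre></div></div>
+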
+<h2 id="inserts">Insert data</h2>
+<p>Generate some new trips, load them into a DataFrame and write the DataFrame into the Hudi dataset, as shown below.</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("org.apache.hudi").
+    options(getQuickstartWriteConfigs).
+    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+    option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+    option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+    option(TABLE_NAME, tableName).
+    mode(Overwrite).
+    save(basePath);
+</code></pre></div></div>
 
+<p><code class="highlighter-rouge">mode(Overwrite)</code> overwrites and recreates the dataset if it already exists.
+You can check the data generated under <code class="highlighter-rouge">/tmp/hudi_cow_table/&lt;region&gt;/&lt;country&gt;/&lt;city&gt;/</code>. We provided a record key 
+(<code class="highlighter-rouge">uuid</code> in <a href="#sample-schema">schema</a>), partition field (<code class="highlighter-rouge">region/country/city</code>) and combine logic (<code class="highlighter-rouge">ts</code> in 
+<a href="#sample-schema">schema</a>) to ensure trip records are unique within each partition. For more info, refer to 
+<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowdoImodelthedatastoredinHudi?">Modeling data stored in Hudi</a>
+and for info on ways to ingest data into Hudi, refer to <a href="https://hudi.apache.org/writing_data.html">Writing Hudi Datasets</a>.
+Here we are using the default write operation: <code class="highlighter-rouge">upsert</code>. If you have a workload without updates, you can also issue 
+<code class="highlighter-rouge">insert</code> or <code class="highlighter-rouge">bulk_insert</code> operations, which could be faster (see the example below). To know more, refer to 
+<a href="https://hudi.apache.org/writing_data.html#write-operations">Write operations</a>.</p>
+
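+<p>As an illustration, here is a minimal sketch of the same write issued as a <code class="highlighter-rouge">bulk_insert</code> instead of the default <code class="highlighter-rouge">upsert</code>, assuming the <code class="highlighter-rouge">OPERATION_OPT_KEY</code> option exposed by the <code class="highlighter-rouge">DataSourceWriteOptions</code> import above:</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch: bulk_insert instead of the default upsert (suited to initial loads without updates).
+// Assumes OPERATION_OPT_KEY / BULK_INSERT_OPERATION_OPT_VAL from the DataSourceWriteOptions import above.
+df.write.format("org.apache.hudi").
+    options(getQuickstartWriteConfigs).
+    option(OPERATION_OPT_KEY, BULK_INSERT_OPERATION_OPT_VAL).
+    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+    option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+    option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+    option(TABLE_NAME, tableName).
+    mode(Overwrite).
+    save(basePath);
+</code></pre></div></div>
+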
+<h2 id="query">Query data</h2>
+<p>Load the data files into a DataFrame.</p>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val roViewDF = spark.
+    read.
+    format("org.apache.hudi").
+    load(basePath + "/*/*/*/*")
+roViewDF.registerTempTable("hudi_ro_table")
+spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_ro_table where fare &gt; 20.0").show()
+spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from  hudi_ro_table").show()
 </code></pre></div></div>
-<p>For some reason, if you want to do this by hand. Please 
-follow <a href="https://cwiki.apache.org/confluence/display/HUDI/Registering+sample+dataset+to+Hive+via+beeline">this</a>.</p>
-
-<h3 id="hive">HiveQL</h3>
-
-<p>Let’s first perform a query on the latest committed snapshot of the table</p>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive&gt; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-hive&gt; set hive.stats.autogather=false;
-hive&gt; add jar file:///path/to/hudi-hive-bundle-0.4.6-SNAPSHOT.jar;
-hive&gt; select count(*) from hoodie_test;
-...
-OK
-100
-Time taken: 18.05 seconds, Fetched: 1 row(s)
-hive&gt;
+<p>This query provides a read-optimized view of the ingested data. Since our partition path (<code class="highlighter-rouge">region/country/city</code>) is 3 levels nested 
+from the base path, we have used <code class="highlighter-rouge">load(basePath + "/*/*/*/*")</code>. 
+Refer to <a href="https://hudi.apache.org/concepts.html#storage-types--views">Storage Types and Views</a> for more info on all storage types and views supported.</p>
+
+<h2 id="updates">Update data</h2>
+<p>This is similar to inserting new data. Generate updates to existing trips using the data generator, load them into a DataFrame 
+and write the DataFrame into the Hudi dataset.</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val updates = convertToStringList(dataGen.generateUpdates(10))
+val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
+df.write.format("org.apache.hudi").
+    options(getQuickstartWriteConfigs).
+    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+    option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+    option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+    option(TABLE_NAME, tableName).
+    mode(Append).
+    save(basePath);
 </code></pre></div></div>
 
-<h3 id="spark">SparkSQL</h3>
-
-<p>Spark is super easy, once you get Hive working as above. Just spin up a Spark Shell as below</p>
+<p>Notice that the save mode is now <code class="highlighter-rouge">Append</code>. In general, always use append mode unless you are trying to create the dataset for the first time.
+<a href="#query">Querying</a> the data again will now show the updated trips. Each write operation generates a new <a href="http://hudi.incubator.apache.org/concepts.html">commit</a> 
+denoted by the timestamp. Look for changes in the <code class="highlighter-rouge">_hoodie_commit_time</code>, <code class="highlighter-rouge">rider</code> and <code class="highlighter-rouge">driver</code> fields for the same <code class="highlighter-rouge">_hoodie_record_key</code>s compared to the previous commit (see the example below).</p>
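+
+<p>For example, re-loading the dataset and re-running the read-optimized query from the <a href="#query">Query data</a> section will now surface the new commit (a sketch reusing the earlier snippet):</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Re-load the dataset so the temp table reflects the latest commit
+val updatedViewDF = spark.read.format("org.apache.hudi").load(basePath + "/*/*/*/*")
+updatedViewDF.registerTempTable("hudi_ro_table")
+spark.sql("select _hoodie_commit_time, _hoodie_record_key, rider, driver, fare from hudi_ro_table").show()
+</code></pre></div></div>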
 
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd $SPARK_INSTALL
-$ spark-shell --jars $HUDI_SRC/packaging/hudi-spark-bundle/target/hudi-spark-bundle-0.4.6-SNAPSHOT.jar --driver-class-path $HADOOP_CONF_DIR  --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0
-
-scala&gt; val sqlContext = new org.apache.spark.sql.SQLContext(sc)
-scala&gt; sqlContext.sql("show tables").show(10000)
-scala&gt; sqlContext.sql("describe hoodie_test").show(10000)
-scala&gt; sqlContext.sql("describe hoodie_test_rt").show(10000)
-scala&gt; sqlContext.sql("select count(*) from hoodie_test").show(10000)
-</code></pre></div></div>
+<h2 id="incremental-query">Incremental query</h2>
 
-<h3 id="presto">Presto</h3>
+<p>Hudi also provides the capability to obtain a stream of records that changed since a given commit timestamp. 
+This can be achieved using Hudi’s incremental view and providing a begin time from which changes need to be streamed. 
+We do not need to specify the endTime if we want all changes after the given commit (as is the common case).</p>
 
-<p>Checkout the ‘master’ branch on OSS Presto, build it, and place your installation somewhere.</p>
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_ro_table order by commitTime").map(k =&gt; k.getString(0)).take(50)
+val beginTime = commits(commits.length - 2) // commit time we are interested in
 
-<ul>
-  <li>Copy the hudi/packaging/hudi-presto-bundle/target/hudi-presto-bundle-*.jar into $PRESTO_INSTALL/plugin/hive-hadoop2/</li>
-  <li>Startup your server and you should be able to query the same Hive table via Presto</li>
-</ul>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>show columns from hive.default.hoodie_test;
-select count(*) from hive.default.hoodie_test
+// incrementally query data
+val incViewDF = spark.
+    read.
+    format("org.apache.hudi").
+    option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
+    option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
+    load(basePath);
+incViewDF.registerTempTable("hudi_incr_table")
+spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare &gt; 20.0").show()
 </code></pre></div></div>
-
-<h3 id="incremental-hiveql">Incremental HiveQL</h3>
-
-<p>Let’s now perform a query, to obtain the <strong>ONLY</strong> changed rows since a commit in the past.</p>
-
-<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive&gt; set hoodie.hoodie_test.consume.mode=INCREMENTAL;
-hive&gt; set hoodie.hoodie_test.consume.start.timestamp=001;
-hive&gt; set hoodie.hoodie_test.consume.max.commits=10;
-hive&gt; select `_hoodie_commit_time`, rider, driver from hoodie_test where `_hoodie_commit_time` &gt; '001' limit 10;
-OK
-All commits :[001, 002]
-002	rider-001	driver-001
-002	rider-001	driver-001
-002	rider-002	driver-002
-002	rider-001	driver-001
-002	rider-001	driver-001
-002	rider-002	driver-002
-002	rider-001	driver-001
-002	rider-002	driver-002
-002	rider-002	driver-002
-002	rider-001	driver-001
-Time taken: 0.056 seconds, Fetched: 10 row(s)
-hive&gt;
-hive&gt;
+<p>This will return all changes that happened after the beginTime commit, with the filter of fare &gt; 20.0 applied. The unique thing about this
+feature is that it now lets you author streaming pipelines on batch data.</p>
+
+<h2 id="point-in-time-query">Point in time query</h2>
+<p>Let’s look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
+specific commit time and beginTime to “000” (denoting the earliest possible commit time).</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val beginTime = "000" // Represents all commits &gt; this time.
+val endTime = commits(commits.length - 2) // commit time we are interested in
+
+// incrementally query data
+val incViewDF = spark.read.format("org.apache.hudi").
+    option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
+    option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
+    option(END_INSTANTTIME_OPT_KEY, endTime).
+    load(basePath);
+incViewDF.registerTempTable("hudi_incr_table")
+spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare &gt; 20.0").show()
 </code></pre></div></div>
 
-<p>This is only supported for Read-optimized view for now.”</p>
+<h2 id="where-to-go-from-here">Where to go from here?</h2>
+<p>Here, we used Spark to showcase the capabilities of Hudi. However, Hudi can support multiple storage types/views and 
+Hudi datasets can be queried from query engines like Hive, Spark, Presto and much more. We have put together a 
+<a href="https://www.youtube.com/watch?v=VhNgUsxdrD0">demo video</a> that showcases all of this on a Docker-based setup with all 
+dependent systems running locally. We recommend you replicate the same setup and run the demo yourself, by following 
+the steps <a href="docker_demo.html">here</a> to get a taste for it. Also, if you are looking for ways to migrate your existing data 
+to Hudi, refer to the <a href="migration_guide.html">migration guide</a>.</p>
 
 
     <div class="tags">
diff --git a/content/search.json b/content/search.json
index 100d57b..8d219dc 100644
--- a/content/search.json
+++ b/content/search.json
@@ -54,7 +54,7 @@
 "tags": "",
 "keywords": "hudi, administration, operation, devops",
 "url": "cnadmin_guide.html",
-"summary": "This section offers an overview of tools available to operate an ecosystem of Hudi datasets"
+"summary": "本节概述了可用于操作Hudi数据集生态系统的工具"
 }
 ,
 
@@ -228,11 +228,11 @@
 
 
 {
-"title": "What is Hudi?",
+"title": "什么是Hudi?",
 "tags": "getting_started",
 "keywords": "big data, stream processing, cloud, hdfs, storage, upserts, change capture",
 "url": "cnindex.html",
-"summary": "Hudi brings stream processing to big data, providing fresh data while being an order of magnitude efficient over traditional batch processing."
+"summary": "Hudi为大数据带来流处理,在提供新数据的同时,比传统的批处理效率高出一个数量级。"
 }
 ,
 
diff --git a/content/writing_data.html b/content/writing_data.html
index 4326c32..39e0d16 100644
--- a/content/writing_data.html
+++ b/content/writing_data.html
@@ -503,6 +503,8 @@ Usage: &lt;main class&gt; [options]
        Default: false
   * --jdbc-url
        Hive jdbc connect url
+  * --use-jdbc
+       Whether to use jdbc connection or hive metastore (via thrift)
   * --pass
        Hive password
   * --table