Posted to commits@hudi.apache.org by bh...@apache.org on 2019/11/14 14:49:09 UTC

[incubator-hudi] 01/02: [MINOR] Cosmetic improvements to site

This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit b57bbd0228a17b3571708f17c88ceed192d18b21
Author: vinothchandar <vi...@apache.org>
AuthorDate: Thu Nov 14 06:01:34 2019 -0800

    [MINOR] Cosmetic improvements to site
    
     - Clearer code highlighting, using black font, light blue background
     - Fix version number
     - Fix nav header scroll, scrambled characters issue
---
 docs/_data/sidebars/mydoc_sidebar.yml |  4 +--
 docs/admin_guide.cn.md                | 38 ++++++++++----------
 docs/admin_guide.md                   | 38 ++++++++++----------
 docs/configurations.cn.md             |  4 +--
 docs/configurations.md                |  4 +--
 docs/css/lavish-bootstrap.css         |  7 ++--
 docs/docker_demo.cn.md                | 44 +++++++++++------------
 docs/docker_demo.md                   | 66 +++++++++++++++--------------------
 docs/migration_guide.cn.md            |  9 +++--
 docs/migration_guide.md               | 15 ++++----
 docs/querying_data.cn.md              |  8 ++---
 docs/querying_data.md                 |  8 ++---
 docs/quickstart.cn.md                 | 24 ++++++-------
 docs/quickstart.md                    | 25 +++++++------
 docs/s3_filesystem.cn.md              |  4 +--
 docs/s3_filesystem.md                 |  4 +--
 docs/writing_data.cn.md               | 12 +++----
 docs/writing_data.md                  | 12 +++----
 18 files changed, 161 insertions(+), 165 deletions(-)
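
For the code-highlighting change called out in the commit message, the relevant rules live in docs/css/lavish-bootstrap.css. Below is a rough sketch of how the affected `code` and `pre` rules read after this commit, pieced together from the hunks further down in this diff (only the properties visible in the diff are shown, not the full rule bodies):

```css
/* Sketch reconstructed from the lavish-bootstrap.css hunks in this commit:
   inline code and pre blocks switch to black text on a light blue background. */
code {
  padding: 2px 4px;
  font-size: 90%;
  color: #444;
  background-color: #04b3f90d;  /* was #f0f0f0 */
  white-space: nowrap;
  border-radius: 4px;
}

pre {
  line-height: 1.428571429;
  word-break: break-all;
  word-wrap: break-word;
  color: #000000;               /* was #77777a */
  background-color: #04b3f90d;  /* was #f5f5f5 */
  border: 1px solid #cccccc;
  border-radius: 4px;
}
```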

diff --git a/docs/_data/sidebars/mydoc_sidebar.yml b/docs/_data/sidebars/mydoc_sidebar.yml
index 9e4ec1e..040c4c4 100644
--- a/docs/_data/sidebars/mydoc_sidebar.yml
+++ b/docs/_data/sidebars/mydoc_sidebar.yml
@@ -2,8 +2,8 @@
 
 entries:
 - title: sidebar
-  product: Latest Version
-  version: 0.5.0-incubating
+  product: Version
+  version: (0.5.0-incubating)
   folders:
 
   - title: Getting Started
diff --git a/docs/admin_guide.cn.md b/docs/admin_guide.cn.md
index 2ba04a5..9e4f542 100644
--- a/docs/admin_guide.cn.md
+++ b/docs/admin_guide.cn.md
@@ -23,7 +23,7 @@ Hudi库使用.hoodie子文件夹跟踪所有元数据,从而有效地在内部
 
 初始化hudi表,可使用如下命令。
 
-```
+```Java
 18/09/06 15:56:52 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
 ============================================
 *                                          *
@@ -44,7 +44,7 @@ hudi->create --path /user/hive/warehouse/table1 --tableName hoodie_table_1 --tab
 
 To see the description of hudi table, use the command:
 
-```
+```Java
 hoodie:hoodie_table_1->desc
 18/09/06 15:57:19 INFO timeline.HoodieActiveTimeline: Loaded instants []
     _________________________________________________________
@@ -60,7 +60,7 @@ hoodie:hoodie_table_1->desc
 
 以下是连接到包含uber trips的Hudi数据集的示例命令。
 
-```
+```Java
 hoodie:trips->connect --path /app/uber/trips
 
 16/10/05 23:20:37 INFO model.HoodieTableMetadata: Attempting to load the commits under /app/uber/trips/.hoodie with suffix .commit
@@ -73,7 +73,7 @@ hoodie:trips->
 连接到数据集后,便可使用许多其他命令。该shell程序具有上下文自动完成帮助(按TAB键),下面是所有命令的列表,本节中对其中的一些命令进行了详细示例。
 
 
-```
+```Java
 hoodie:trips->help
 * ! - Allows execution of operating system (OS) commands
 * // - Inline comment markers (start of line only)
@@ -114,7 +114,7 @@ hoodie:trips->
 查看有关最近10次提交的一些基本信息,
 
 
-```
+```Java
 hoodie:trips->commits show --sortBy "Total Bytes Written" --desc true --limit 10
     ________________________________________________________________________________________________________________________________________________________________________
     | CommitTime    | Total Bytes Written| Total Files Added| Total Files Updated| Total Partitions Written| Total Records Written| Total Update Records Written| Total Errors|
@@ -127,7 +127,7 @@ hoodie:trips->
 
 在每次写入开始时,Hudi还将.inflight提交写入.hoodie文件夹。您可以使用那里的时间戳来估计正在进行的提交已经花费的时间
 
-```
+```Java
 $ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
 -rw-r--r--   3 vinoth supergroup     321984 2016-10-05 23:18 /app/uber/trips/.hoodie/20161005225920.inflight
 ```
@@ -138,7 +138,7 @@ $ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
 了解写入如何分散到特定分区,
 
 
-```
+```Java
 hoodie:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
     __________________________________________________________________________________________________________________________________________
     | Partition Path| Total Files Added| Total Files Updated| Total Records Inserted| Total Records Updated| Total Bytes Written| Total Errors|
@@ -149,7 +149,7 @@ hoodie:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Byte
 
 如果您需要文件级粒度,我们可以执行以下操作
 
-```
+```Java
 hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
     ________________________________________________________________________________________________________________________________________________________
     | Partition Path| File ID                             | Previous Commit| Total Records Updated| Total Records Written| Total Bytes Written| Total Errors|
@@ -163,7 +163,7 @@ hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
 
 Hudi将每个分区视为文件组的集合,每个文件组包含按提交顺序排列的文件切片列表(请参阅概念)。以下命令允许用户查看数据集的文件切片。
 
-```
+```Java
  hoodie:stock_ticks_mor->show fsview all
  ....
   _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
@@ -189,7 +189,7 @@ Hudi将每个分区视为文件组的集合,每个文件组包含按提交顺
 由于Hudi直接管理DFS数据集的文件大小,这些信息会帮助你全面了解Hudi的运行状况
 
 
-```
+```Java
 hoodie:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
     ________________________________________________________________________________________________
     | CommitTime    | Min     | 10th    | 50th    | avg     | 95th    | Max     | NumFiles| StdDev  |
@@ -201,7 +201,7 @@ hoodie:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc
 
 如果Hudi写入花费的时间更长,那么可以通过观察写放大指标来发现任何异常
 
-```
+```Java
 hoodie:trips->stats wa
     __________________________________________________________________________
     | CommitTime    | Total Upserted| Total Written| Write Amplifiation Factor|
@@ -220,7 +220,7 @@ hoodie:trips->stats wa
 
 要了解压缩和写程序之间的时滞,请使用以下命令列出所有待处理的压缩。
 
-```
+```Java
 hoodie:trips->compactions show all
      ___________________________________________________________________
     | Compaction Instant Time| State    | Total FileIds to be Compacted|
@@ -231,7 +231,7 @@ hoodie:trips->compactions show all
 
 要检查特定的压缩计划,请使用
 
-```
+```Java
 hoodie:trips->compaction show --instant <INSTANT_1>
     _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
     | Partition Path| File Id | Base Instant  | Data File Path                                    | Total Delta Files| getMetrics                                                                                                                    |
@@ -243,7 +243,7 @@ hoodie:trips->compaction show --instant <INSTANT_1>
 要手动调度或运行压缩,请使用以下命令。该命令使用spark启动器执行压缩操作。
 注意:确保没有其他应用程序正在同时调度此数据集的压缩
 
-```
+```Java
 hoodie:trips->help compaction schedule
 Keyword:                   compaction schedule
 Description:               Schedule Compaction
@@ -256,7 +256,7 @@ Description:               Schedule Compaction
 * compaction schedule - Schedule Compaction
 ```
 
-```
+```Java
 hoodie:trips->help compaction run
 Keyword:                   compaction run
 Description:               Run Compaction for given instant time
@@ -303,7 +303,7 @@ Description:               Run Compaction for given instant time
 
 验证压缩计划:检查压缩所需的所有文件是否都存在且有效
 
-```
+```Java
 hoodie:stock_ticks_mor->compaction validate --instant 20181005222611
 ...
 
@@ -336,7 +336,7 @@ hoodie:stock_ticks_mor->compaction validate --instant 20181005222601
 
 ##### 取消调度压缩
 
-```
+```Java
 hoodie:trips->compaction unscheduleFileId --fileId <FileUUID>
 ....
 No File renames needed to unschedule file from pending compaction. Operation successful.
@@ -344,7 +344,7 @@ No File renames needed to unschedule file from pending compaction. Operation suc
 
 在其他情况下,需要撤销整个压缩计划。以下CLI支持此功能
 
-```
+```Java
 hoodie:trips->compaction unschedule --compactionInstant <compactionInstant>
 .....
 No File renames needed to unschedule pending compaction. Operation successful.
@@ -357,7 +357,7 @@ No File renames needed to unschedule pending compaction. Operation successful.
 当您运行`压缩验证`时,您会注意到无效的压缩操作(如果有的话)。
 在这种情况下,修复命令将立即执行,它将重新排列文件切片,以使文件不丢失,并且文件切片与压缩计划一致
 
-```
+```Java
 hoodie:stock_ticks_mor->compaction repair --instant 20181005222611
 ......
 Compaction successfully repaired
diff --git a/docs/admin_guide.md b/docs/admin_guide.md
index 96ff639..4d267cb 100644
--- a/docs/admin_guide.md
+++ b/docs/admin_guide.md
@@ -23,7 +23,7 @@ Hudi library effectively manages this dataset internally, using .hoodie subfolde
 
 To initialize a hudi table, use the following command.
 
-```
+```Java
 18/09/06 15:56:52 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
 ============================================
 *                                          *
@@ -44,7 +44,7 @@ hudi->create --path /user/hive/warehouse/table1 --tableName hoodie_table_1 --tab
 
 To see the description of hudi table, use the command:
 
-```
+```Java
 hoodie:hoodie_table_1->desc
 18/09/06 15:57:19 INFO timeline.HoodieActiveTimeline: Loaded instants []
     _________________________________________________________
@@ -60,7 +60,7 @@ hoodie:hoodie_table_1->desc
 
 Following is a sample command to connect to a Hudi dataset contains uber trips.
 
-```
+```Java
 hoodie:trips->connect --path /app/uber/trips
 
 16/10/05 23:20:37 INFO model.HoodieTableMetadata: Attempting to load the commits under /app/uber/trips/.hoodie with suffix .commit
@@ -74,7 +74,7 @@ Once connected to the dataset, a lot of other commands become available. The she
 are reviewed
 
 
-```
+```Java
 hoodie:trips->help
 * ! - Allows execution of operating system (OS) commands
 * // - Inline comment markers (start of line only)
@@ -115,7 +115,7 @@ Each commit has a monotonically increasing string/number called the **commit num
 To view some basic information about the last 10 commits,
 
 
-```
+```Java
 hoodie:trips->commits show --sortBy "Total Bytes Written" --desc true --limit 10
     ________________________________________________________________________________________________________________________________________________________________________
     | CommitTime    | Total Bytes Written| Total Files Added| Total Files Updated| Total Partitions Written| Total Records Written| Total Update Records Written| Total Errors|
@@ -129,7 +129,7 @@ hoodie:trips->
 At the start of each write, Hudi also writes a .inflight commit to the .hoodie folder. You can use the timestamp there to estimate how long the commit has been inflight
 
 
-```
+```Java
 $ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
 -rw-r--r--   3 vinoth supergroup     321984 2016-10-05 23:18 /app/uber/trips/.hoodie/20161005225920.inflight
 ```
@@ -140,7 +140,7 @@ $ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
 To understand how the writes spread across specific partiions,
 
 
-```
+```Java
 hoodie:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
     __________________________________________________________________________________________________________________________________________
     | Partition Path| Total Files Added| Total Files Updated| Total Records Inserted| Total Records Updated| Total Bytes Written| Total Errors|
@@ -152,7 +152,7 @@ hoodie:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Byte
 If you need file level granularity , we can do the following
 
 
-```
+```Java
 hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
     ________________________________________________________________________________________________________________________________________________________
     | Partition Path| File ID                             | Previous Commit| Total Records Updated| Total Records Written| Total Bytes Written| Total Errors|
@@ -167,7 +167,7 @@ hoodie:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
 Hudi views each partition as a collection of file-groups with each file-group containing a list of file-slices in commit
 order (See Concepts). The below commands allow users to view the file-slices for a data-set.
 
-```
+```Java
  hoodie:stock_ticks_mor->show fsview all
  ....
   _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
@@ -193,7 +193,7 @@ order (See Concepts). The below commands allow users to view the file-slices for
 Since Hudi directly manages file sizes for DFS dataset, it might be good to get an overall picture
 
 
-```
+```Java
 hoodie:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
     ________________________________________________________________________________________________
     | CommitTime    | Min     | 10th    | 50th    | avg     | 95th    | Max     | NumFiles| StdDev  |
@@ -206,7 +206,7 @@ hoodie:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc
 In case of Hudi write taking much longer, it might be good to see the write amplification for any sudden increases
 
 
-```
+```Java
 hoodie:trips->stats wa
     __________________________________________________________________________
     | CommitTime    | Total Upserted| Total Written| Write Amplifiation Factor|
@@ -227,7 +227,7 @@ This is a sequence file that contains a mapping from commitNumber => json with r
 To get an idea of the lag between compaction and writer applications, use the below command to list down all
 pending compactions.
 
-```
+```Java
 hoodie:trips->compactions show all
      ___________________________________________________________________
     | Compaction Instant Time| State    | Total FileIds to be Compacted|
@@ -238,7 +238,7 @@ hoodie:trips->compactions show all
 
 To inspect a specific compaction plan, use
 
-```
+```Java
 hoodie:trips->compaction show --instant <INSTANT_1>
     _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
     | Partition Path| File Id | Base Instant  | Data File Path                                    | Total Delta Files| getMetrics                                                                                                                    |
@@ -250,7 +250,7 @@ hoodie:trips->compaction show --instant <INSTANT_1>
 To manually schedule or run a compaction, use the below command. This command uses spark launcher to perform compaction
 operations. NOTE : Make sure no other application is scheduling compaction for this dataset concurrently
 
-```
+```Java
 hoodie:trips->help compaction schedule
 Keyword:                   compaction schedule
 Description:               Schedule Compaction
@@ -263,7 +263,7 @@ Description:               Schedule Compaction
 * compaction schedule - Schedule Compaction
 ```
 
-```
+```Java
 hoodie:trips->help compaction run
 Keyword:                   compaction run
 Description:               Run Compaction for given instant time
@@ -310,7 +310,7 @@ Description:               Run Compaction for given instant time
 
 Validating a compaction plan : Check if all the files necessary for compactions are present and are valid
 
-```
+```Java
 hoodie:stock_ticks_mor->compaction validate --instant 20181005222611
 ...
 
@@ -344,7 +344,7 @@ so that are preserved. Hudi provides the following CLI to support it
 
 ##### UnScheduling Compaction
 
-```
+```Java
 hoodie:trips->compaction unscheduleFileId --fileId <FileUUID>
 ....
 No File renames needed to unschedule file from pending compaction. Operation successful.
@@ -352,7 +352,7 @@ No File renames needed to unschedule file from pending compaction. Operation suc
 
 In other cases, an entire compaction plan needs to be reverted. This is supported by the following CLI
 
-```
+```Java
 hoodie:trips->compaction unschedule --compactionInstant <compactionInstant>
 .....
 No File renames needed to unschedule pending compaction. Operation successful.
@@ -366,7 +366,7 @@ partial failures, the compaction operation could become inconsistent with the st
 command comes to the rescue, it will rearrange the file-slices so that there is no loss and the file-slices are
 consistent with the compaction plan
 
-```
+```Java
 hoodie:stock_ticks_mor->compaction repair --instant 20181005222611
 ......
 Compaction successfully repaired
diff --git a/docs/configurations.cn.md b/docs/configurations.cn.md
index 8dcb34a..7b7397d 100644
--- a/docs/configurations.cn.md
+++ b/docs/configurations.cn.md
@@ -37,7 +37,7 @@ summary: 在这里,我们列出了所有可能的配置及其含义。
 
 另外,您可以使用`options()`或`option(k,v)`方法直接传递任何WriteClient级别的配置。
 
-```
+```Java
 inputDF.write()
 .format("org.apache.hudi")
 .options(clientOpts) // 任何Hudi客户端选项都可以传入
@@ -159,7 +159,7 @@ inputDF.write()
 直接使用RDD级别api进行编程的Jobs可以构建一个`HoodieWriteConfig`对象,并将其传递给`HoodieWriteClient`构造函数。
 HoodieWriteConfig可以使用以下构建器模式构建。
 
-```
+```Java
 HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
         .withPath(basePath)
         .forTable(tableName)
diff --git a/docs/configurations.md b/docs/configurations.md
index 3f16e3b..3e303c1 100644
--- a/docs/configurations.md
+++ b/docs/configurations.md
@@ -39,7 +39,7 @@ The actual datasource level configs are listed below.
 
 Additionally, you can pass down any of the WriteClient level configs directly using `options()` or `option(k,v)` methods.
 
-```
+```Java
 inputDF.write()
 .format("org.apache.hudi")
 .options(clientOpts) // any of the Hudi client opts can be passed in as well
@@ -164,7 +164,7 @@ Property: `hoodie.datasource.read.end.instanttime`, Default: latest instant (i.e
 Jobs programming directly against the RDD level apis can build a `HoodieWriteConfig` object and pass it in to the `HoodieWriteClient` constructor. 
 HoodieWriteConfig can be built using a builder pattern as below. 
 
-```
+```Java
 HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
         .withPath(basePath)
         .forTable(tableName)
diff --git a/docs/css/lavish-bootstrap.css b/docs/css/lavish-bootstrap.css
index a050c9a..6a0f52f 100644
--- a/docs/css/lavish-bootstrap.css
+++ b/docs/css/lavish-bootstrap.css
@@ -600,7 +600,7 @@ code {
   padding: 2px 4px;
   font-size: 90%;
   color: #444;
-  background-color: #f0f0f0;
+  background-color: #04b3f90d;
   white-space: nowrap;
   border-radius: 4px;
 }
@@ -613,8 +613,8 @@ pre {
   line-height: 1.428571429;
   word-break: break-all;
   word-wrap: break-word;
-  color: #77777a;
-  background-color: #f5f5f5;
+  color: #000000;
+  background-color: #04b3f90d;
   border: 1px solid #cccccc;
   border-radius: 4px;
 }
@@ -3730,6 +3730,7 @@ textarea.input-group-sm > .input-group-btn > .btn {
   }
   .navbar-right {
     float: right !important;
+    background-color: white;
   }
 }
 .navbar-form {
diff --git a/docs/docker_demo.cn.md b/docs/docker_demo.cn.md
index 6f3d72b..83868fb 100644
--- a/docs/docker_demo.cn.md
+++ b/docs/docker_demo.cn.md
@@ -23,7 +23,7 @@ The steps have been tested on a Mac laptop
   * /etc/hosts : The demo references many services running in container by the hostname. Add the following settings to /etc/hosts
 
 
-```
+```Java
    127.0.0.1 adhoc-1
    127.0.0.1 adhoc-2
    127.0.0.1 namenode
@@ -44,7 +44,7 @@ Also, this has not been tested on some environments like Docker on Windows.
 #### Build Hudi
 
 The first step is to build hudi
-```
+```Java
 cd <HUDI_WORKSPACE>
 mvn package -DskipTests
 ```
@@ -54,7 +54,7 @@ mvn package -DskipTests
 The next step is to run the docker compose script and setup configs for bringing up the cluster.
 This should pull the docker images from docker hub and setup docker cluster.
 
-```
+```Java
 cd docker
 ./setup_demo.sh
 ....
@@ -107,7 +107,7 @@ The batches are windowed intentionally so that the second batch contains updates
 
 Upload the first batch to Kafka topic 'stock ticks'
 
-```
+```Java
 cat docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks -P
 
 To check if the new topic shows up, use
@@ -158,7 +158,7 @@ pull changes and apply to Hudi dataset using upsert/insert primitives. Here, we
 json data from kafka topic and ingest to both COW and MOR tables we initialized in the previous step. This tool
 automatically initializes the datasets in the file-system if they do not exist yet.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow dataset in HDFS
@@ -198,7 +198,7 @@ There will be a similar setup when you browse the MOR dataset
 At this step, the datasets are available in HDFS. We need to sync with Hive to create new Hive tables and add partitions
 inorder to run Hive queries against those datasets.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 
 # THis command takes in HIveServer URL and COW Hudi Dataset location in HDFS and sync the HDFS state to Hive
@@ -229,7 +229,7 @@ Run a hive query to find the latest timestamp ingested for stock symbol 'GOOG'.
 (for both COW and MOR dataset)and realtime views (for MOR dataset)give the same value "10:29 a.m" as Hudi create a
 parquet file for the first batch of data.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 # List Tables
@@ -332,7 +332,7 @@ exit
 Hudi support Spark as query processor just like Hive. Here are the same hive queries
 running in spark-sql
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --master local[2] --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 ...
@@ -432,7 +432,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 Upload the second batch of data and ingest this batch using delta-streamer. As this batch does not bring in any new
 partitions, there is no need to run hive-sync
 
-```
+```Java
 cat docker/demo/data/batch_2.json | kafkacat -b kafkabroker -t stock_ticks -P
 
 # Within Docker container, run the ingestion command
@@ -464,7 +464,7 @@ This is the time, when ReadOptimized and Realtime views will provide different r
 return "10:29 am" as it will only read from the Parquet file. Realtime View will do on-the-fly merge and return
 latest committed data which is "10:59 a.m".
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -535,7 +535,7 @@ exit
 
 Running the same queries in Spark-SQL:
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 
@@ -605,7 +605,7 @@ With 2 batches of data ingested, lets showcase the support for incremental queri
 
 Lets take the same projection query example
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -629,7 +629,7 @@ the commit time of the first batch (20180924064621) and run incremental query
 Hudi incremental mode provides efficient scanning for incremental queries by filtering out files that do not have any
 candidate rows using hudi-managed metadata.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.mode=INCREMENTAL;
@@ -642,7 +642,7 @@ No rows affected (0.009 seconds)
 With the above setting, file-ids that do not have any updates from the commit 20180924065039 is filtered out without scanning.
 Here is the incremental query :
 
-```
+```Java
 0: jdbc:hive2://hiveserver:10000>
 0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064621';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -655,7 +655,7 @@ Here is the incremental query :
 ```
 
 ##### Incremental Query with Spark SQL:
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 Welcome to
@@ -697,7 +697,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 Lets schedule and run a compaction to create a new version of columnar  file so that read-optimized readers will see fresher data.
 Again, You can use Hudi CLI to manually schedule and run compaction
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 root@adhoc-1:/opt#   /var/hoodie/ws/hudi-cli/hudi-cli.sh
 ============================================
@@ -790,7 +790,7 @@ Lets also run the incremental query for MOR table.
 From looking at the below query output, it will be clear that the fist commit time for the MOR table is 20180924064636
 and the second commit time is 20180924070031
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -851,7 +851,7 @@ exit
 
 ##### Read Optimized and Realtime Views for MOR with Spark-SQL after compaction
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 
@@ -895,7 +895,7 @@ This brings the demo to an end.
 ## Testing Hudi in Local Docker environment
 
 You can bring up a hadoop docker environment containing Hadoop, Hive and Spark services with support for hudi.
-```
+```Java
 $ mvn pre-integration-test -DskipTests
 ```
 The above command builds docker images for all the services with
@@ -903,13 +903,13 @@ current Hudi source installed at /var/hoodie/ws and also brings up the services
 currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.3.1) in docker images.
 
 To bring down the containers
-```
+```Java
 $ cd hudi-integ-test
 $ mvn docker-compose:down
 ```
 
 If you want to bring up the docker containers, use
-```
+```Java
 $ cd hudi-integ-test
 $  mvn docker-compose:up -DdetachedMode=true
 ```
@@ -937,7 +937,7 @@ run the script
 
 Here are the commands:
 
-```
+```Java
 cd docker
 ./build_local_docker_images.sh
 .....
diff --git a/docs/docker_demo.md b/docs/docker_demo.md
index 5628e5b..ef80794 100644
--- a/docs/docker_demo.md
+++ b/docs/docker_demo.md
@@ -23,7 +23,7 @@ The steps have been tested on a Mac laptop
   * /etc/hosts : The demo references many services running in container by the hostname. Add the following settings to /etc/hosts
 
 
-```
+```Java
    127.0.0.1 adhoc-1
    127.0.0.1 adhoc-2
    127.0.0.1 namenode
@@ -44,7 +44,7 @@ Also, this has not been tested on some environments like Docker on Windows.
 #### Build Hudi
 
 The first step is to build hudi
-```
+```Java
 cd <HUDI_WORKSPACE>
 mvn package -DskipTests
 ```
@@ -54,7 +54,7 @@ mvn package -DskipTests
 The next step is to run the docker compose script and setup configs for bringing up the cluster.
 This should pull the docker images from docker hub and setup docker cluster.
 
-```
+```Java
 cd docker
 ./setup_demo.sh
 ....
@@ -84,7 +84,7 @@ Creating spark-worker-1            ... done
 Copying spark default config and setting up configs
 Copying spark default config and setting up configs
 Copying spark default config and setting up configs
-varadarb-C02SG7Q3G8WP:docker varadarb$ docker ps
+$ docker ps
 ```
 
 At this point, the docker cluster will be up and running. The demo cluster brings up the following services
@@ -107,12 +107,10 @@ The batches are windowed intentionally so that the second batch contains updates
 
 #### Step 1 : Publish the first batch to Kafka
 
-Upload the first batch to Kafka topic 'stock ticks'
-
-```
-cat docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks -P
+Upload the first batch to Kafka topic 'stock ticks' `cat docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks -P`
 
 To check if the new topic shows up, use
+```Java
 kafkacat -b kafkabroker -L -J | jq .
 {
   "originating_broker": {
@@ -160,24 +158,16 @@ pull changes and apply to Hudi dataset using upsert/insert primitives. Here, we
 json data from kafka topic and ingest to both COW and MOR tables we initialized in the previous step. This tool
 automatically initializes the datasets in the file-system if they do not exist yet.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow dataset in HDFS
 spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
-....
-....
-2018-09-24 22:20:00 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
-2018-09-24 22:20:00 INFO  SparkContext:54 - Successfully stopped SparkContext
-
 
 
 # Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor dataset in HDFS
 spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
-....
-2018-09-24 22:22:01 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
-2018-09-24 22:22:01 INFO  SparkContext:54 - Successfully stopped SparkContext
-....
+
 
 # As part of the setup (Look at setup_demo.sh), the configs needed for DeltaStreamer is uploaded to HDFS. The configs
 # contain mostly Kafa connectivity settings, the avro-schema to be used for ingesting along with key and partitioning fields.
@@ -200,7 +190,7 @@ There will be a similar setup when you browse the MOR dataset
 At this step, the datasets are available in HDFS. We need to sync with Hive to create new Hive tables and add partitions
 inorder to run Hive queries against those datasets.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 
 # THis command takes in HIveServer URL and COW Hudi Dataset location in HDFS and sync the HDFS state to Hive
@@ -231,7 +221,7 @@ Run a hive query to find the latest timestamp ingested for stock symbol 'GOOG'.
 (for both COW and MOR dataset)and realtime views (for MOR dataset)give the same value "10:29 a.m" as Hudi create a
 parquet file for the first batch of data.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 # List Tables
@@ -334,7 +324,7 @@ exit
 Hudi support Spark as query processor just like Hive. Here are the same hive queries
 running in spark-sql
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --master local[2] --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 ...
@@ -432,7 +422,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 
 Here are the Presto queries for similar Hive and Spark queries. Currently, Hudi does not support Presto queries on realtime views.
 
-```
+```Java
 docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
 presto> show catalogs;
   Catalog
@@ -524,7 +514,7 @@ presto:default> exit
 Upload the second batch of data and ingest this batch using delta-streamer. As this batch does not bring in any new
 partitions, there is no need to run hive-sync
 
-```
+```Java
 cat docker/demo/data/batch_2.json | kafkacat -b kafkabroker -t stock_ticks -P
 
 # Within Docker container, run the ingestion command
@@ -556,7 +546,7 @@ This is the time, when ReadOptimized and Realtime views will provide different r
 return "10:29 am" as it will only read from the Parquet file. Realtime View will do on-the-fly merge and return
 latest committed data which is "10:59 a.m".
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -627,7 +617,7 @@ exit
 
 Running the same queries in Spark-SQL:
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 
@@ -696,7 +686,7 @@ exit
 Running the same queries on Presto for ReadOptimized views. 
 
 
-```
+```Java
 docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
 presto> use hive.default;
 USE
@@ -761,7 +751,7 @@ With 2 batches of data ingested, lets showcase the support for incremental queri
 
 Lets take the same projection query example
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -785,7 +775,7 @@ the commit time of the first batch (20180924064621) and run incremental query
 Hudi incremental mode provides efficient scanning for incremental queries by filtering out files that do not have any
 candidate rows using hudi-managed metadata.
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.mode=INCREMENTAL;
@@ -798,7 +788,7 @@ No rows affected (0.009 seconds)
 With the above setting, file-ids that do not have any updates from the commit 20180924065039 is filtered out without scanning.
 Here is the incremental query :
 
-```
+```Java
 0: jdbc:hive2://hiveserver:10000>
 0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064621';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -811,7 +801,7 @@ Here is the incremental query :
 ```
 
 ##### Incremental Query with Spark SQL:
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 Welcome to
@@ -853,7 +843,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 Lets schedule and run a compaction to create a new version of columnar  file so that read-optimized readers will see fresher data.
 Again, You can use Hudi CLI to manually schedule and run compaction
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 root@adhoc-1:/opt#   /var/hoodie/ws/hudi-cli/hudi-cli.sh
 ============================================
@@ -946,7 +936,7 @@ Lets also run the incremental query for MOR table.
 From looking at the below query output, it will be clear that the fist commit time for the MOR table is 20180924064636
 and the second commit time is 20180924070031
 
-```
+```Java
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
@@ -1007,7 +997,7 @@ exit
 
 ##### Step 10: Read Optimized and Realtime Views for MOR with Spark-SQL after compaction
 
-```
+```Java
 docker exec -it adhoc-1 /bin/bash
 bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
 
@@ -1047,7 +1037,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 
 ##### Step 11:  Presto queries over Read Optimized View on MOR dataset after compaction
 
-```
+```Java
 docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
 presto> use hive.default;
 USE
@@ -1084,7 +1074,7 @@ This brings the demo to an end.
 ## Testing Hudi in Local Docker environment
 
 You can bring up a hadoop docker environment containing Hadoop, Hive and Spark services with support for hudi.
-```
+```Java
 $ mvn pre-integration-test -DskipTests
 ```
 The above command builds docker images for all the services with
@@ -1092,13 +1082,13 @@ current Hudi source installed at /var/hoodie/ws and also brings up the services
 currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.3.1) in docker images.
 
 To bring down the containers
-```
+```Java
 $ cd hudi-integ-test
 $ mvn docker-compose:down
 ```
 
 If you want to bring up the docker containers, use
-```
+```Java
 $ cd hudi-integ-test
 $  mvn docker-compose:up -DdetachedMode=true
 ```
@@ -1126,7 +1116,7 @@ run the script
 
 Here are the commands:
 
-```
+```Java
 cd docker
 ./build_local_docker_images.sh
 .....
diff --git a/docs/migration_guide.cn.md b/docs/migration_guide.cn.md
index 6f3ed59..ba46781 100644
--- a/docs/migration_guide.cn.md
+++ b/docs/migration_guide.cn.md
@@ -42,16 +42,19 @@ Use the HDFSParquetImporter tool. As the name suggests, this only works if your
 This tool essentially starts a Spark Job to read the existing parquet dataset and converts it into a HUDI managed dataset by re-writing all the data.
 
 #### Option 2
-For huge datasets, this could be as simple as : for partition in [list of partitions in source dataset] {
+For huge datasets, this could be as simple as : 
+```java
+for partition in [list of partitions in source dataset] {
         val inputDF = spark.read.format("any_input_format").load("partition_path")
         inputDF.write.format("org.apache.hudi").option()....save("basePath")
-        }      
+}
+```      
 
 #### Option 3
 Write your own custom logic of how to load an existing dataset into a Hudi managed one. Please read about the RDD API
  [here](quickstart.html).
 
-```
+```Java
 Using the HDFSParquetImporter Tool. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
 fired by via `cd hudi-cli && ./hudi-cli.sh`.
 
diff --git a/docs/migration_guide.md b/docs/migration_guide.md
index 6f3ed59..75b65ae 100644
--- a/docs/migration_guide.md
+++ b/docs/migration_guide.md
@@ -42,19 +42,22 @@ Use the HDFSParquetImporter tool. As the name suggests, this only works if your
 This tool essentially starts a Spark Job to read the existing parquet dataset and converts it into a HUDI managed dataset by re-writing all the data.
 
 #### Option 2
-For huge datasets, this could be as simple as : for partition in [list of partitions in source dataset] {
+For huge datasets, this could be as simple as : 
+```java
+for partition in [list of partitions in source dataset] {
         val inputDF = spark.read.format("any_input_format").load("partition_path")
         inputDF.write.format("org.apache.hudi").option()....save("basePath")
-        }      
+}
+```  
 
 #### Option 3
 Write your own custom logic of how to load an existing dataset into a Hudi managed one. Please read about the RDD API
- [here](quickstart.html).
-
-```
-Using the HDFSParquetImporter Tool. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
+ [here](quickstart.html). Using the HDFSParquetImporter Tool. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
 fired by via `cd hudi-cli && ./hudi-cli.sh`.
 
+```Java
+
+
 hudi->hdfsparquetimport
         --upsert false
         --srcPath /user/parquet/dataset/basepath
diff --git a/docs/querying_data.cn.md b/docs/querying_data.cn.md
index c690385..6d12f3a 100644
--- a/docs/querying_data.cn.md
+++ b/docs/querying_data.cn.md
@@ -92,13 +92,13 @@ Spark可将Hudi jars和捆绑包轻松部署和管理到作业/笔记本中。
 要使用SparkSQL将RO表读取为Hive表,只需按如下所示将路径过滤器推入sparkContext。
 对于Hudi表,该方法保留了Spark内置的读取Parquet文件的优化功能,例如进行矢量化读取。
 
-```
+```Scala
 spark.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class", classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter], classOf[org.apache.hadoop.fs.PathFilter]);
 ```
 
 如果您希望通过数据源在DFS上使用全局路径,则只需执行以下类似操作即可得到Spark数据帧。
 
-```
+```Scala
 Dataset<Row> hoodieROViewDF = spark.read().format("org.apache.hudi")
 // pass any path glob, can include hudi & non-hudi datasets
 .load("/glob/path/pattern");
@@ -108,7 +108,7 @@ Dataset<Row> hoodieROViewDF = spark.read().format("org.apache.hudi")
 当前,实时表只能在Spark中作为Hive表进行查询。为了做到这一点,设置`spark.sql.hive.convertMetastoreParquet = false`,
 迫使Spark回退到使用Hive Serde读取数据(计划/执行仍然是Spark)。
 
-```
+```Scala
 $ spark-shell --jars hudi-spark-bundle-x.y.z-SNAPSHOT.jar --driver-class-path /etc/hive/conf  --packages com.databricks:spark-avro_2.11:4.0.0 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g  --master yarn-client
 
 scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'").show()
@@ -118,7 +118,7 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
 `hudi-spark`模块提供了DataSource API,这是一种从Hudi数据集中提取数据并通过Spark处理数据的更优雅的方法。
 如下所示是一个示例增量拉取,它将获取自`beginInstantTime`以来写入的所有记录。
 
-```
+```Java
  Dataset<Row> hoodieIncViewDF = spark.read()
      .format("org.apache.hudi")
      .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(),
diff --git a/docs/querying_data.md b/docs/querying_data.md
index 1653b08..91836ac 100644
--- a/docs/querying_data.md
+++ b/docs/querying_data.md
@@ -92,13 +92,13 @@ Spark provides much easier deployment & management of Hudi jars and bundles into
 To read RO table as a Hive table using SparkSQL, simply push a path filter into sparkContext as follows. 
 This method retains Spark built-in optimizations for reading Parquet files like vectorized reading on Hudi tables.
 
-```
+```Scala
 spark.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class", classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter], classOf[org.apache.hadoop.fs.PathFilter]);
 ```
 
 If you prefer to glob paths on DFS via the datasource, you can simply do something like below to get a Spark dataframe to work with. 
 
-```
+```Java
 Dataset<Row> hoodieROViewDF = spark.read().format("org.apache.hudi")
 // pass any path glob, can include hudi & non-hudi datasets
 .load("/glob/path/pattern");
@@ -108,7 +108,7 @@ Dataset<Row> hoodieROViewDF = spark.read().format("org.apache.hudi")
 Currently, real time table can only be queried as a Hive table in Spark. In order to do this, set `spark.sql.hive.convertMetastoreParquet=false`, forcing Spark to fallback 
 to using the Hive Serde to read the data (planning/executions is still Spark). 
 
-```
+```Java
 $ spark-shell --jars hudi-spark-bundle-x.y.z-SNAPSHOT.jar --driver-class-path /etc/hive/conf  --packages com.databricks:spark-avro_2.11:4.0.0 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g  --master yarn-client
 
 scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'").show()
@@ -118,7 +118,7 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
 The `hudi-spark` module offers the DataSource API, a more elegant way to pull data from Hudi dataset and process it via Spark.
 A sample incremental pull, that will obtain all records written since `beginInstantTime`, looks like below.
 
-```
+```Java
  Dataset<Row> hoodieIncViewDF = spark.read()
      .format("org.apache.hudi")
      .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(),
diff --git a/docs/quickstart.cn.md b/docs/quickstart.cn.md
index 410dd24..e614d57 100644
--- a/docs/quickstart.cn.md
+++ b/docs/quickstart.cn.md
@@ -10,22 +10,17 @@ permalink: quickstart.html
 本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源,我们将通过代码段展示如何插入和更新的Hudi默认存储类型数据集:
 [写时复制](https://hudi.apache.org/concepts.html#copy-on-write-storage)。每次写操作之后,我们还将展示如何读取快照和增量读取数据。
 
-**注意:**
-您也可以通过[自己构建hudi](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source-building-hudi)来快速入门,
-并在spark-shell命令中使用`--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar`,
-而不是`--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
-
 ## 设置spark-shell
 Hudi适用于Spark-2.x版本。您可以按照[此处](https://spark.apache.org/downloads.html)的说明设置spark。
 在提取的目录中,使用spark-shell运行Hudi:
 
-```
+```Scala
 bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
 设置表名、基本路径和数据生成器来为本指南生成记录。
 
-```
+```Java
 import org.apache.hudi.QuickstartUtils._
 import scala.collection.JavaConversions._
 import org.apache.spark.sql.SaveMode._
@@ -45,7 +40,7 @@ val dataGen = new DataGenerator
 ## 插入数据 {#inserts}
 生成一些新的行程样本,将其加载到DataFrame中,然后将DataFrame写入Hudi数据集中,如下所示。
 
-```
+```Java
 val inserts = convertToStringList(dataGen.generateInserts(10))
 val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
 df.write.format("org.apache.hudi").
@@ -71,7 +66,7 @@ df.write.format("org.apache.hudi").
 
 将数据文件加载到数据帧中。
 
-```
+```Java
 val roViewDF = spark.
     read.
     format("org.apache.hudi").
@@ -89,7 +84,7 @@ spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_pat
 
 这类似于插入新数据。使用数据生成器生成对现有行程的更新,加载到数据帧并将数据帧写入hudi数据集。
 
-```
+```Java
 val updates = convertToStringList(dataGen.generateUpdates(10))
 val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
 df.write.format("org.apache.hudi").
@@ -112,7 +107,7 @@ Hudi还提供了获取给定提交时间戳以来已更改的记录流的功能
 这可以通过使用Hudi的增量视图并提供所需更改的开始时间来实现。
 如果我们需要给定提交之后的所有更改(这是常见的情况),则无需指定结束时间。
 
-```
+```Java
 val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50)
 val beginTime = commits(commits.length - 2) // commit time we are interested in
 
@@ -133,7 +128,7 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hu
 
 让我们看一下如何查询特定时间的数据。可以通过将结束时间指向特定的提交时间,将开始时间指向"000"(表示最早的提交时间)来表示特定时间。
 
-```
+```Java
 val beginTime = "000" // Represents all commits > this time.
 val endTime = commits(commits.length - 2) // commit time we are interested in
 
@@ -149,6 +144,11 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hu
 
 ## 从这开始下一步?
 
+您也可以通过[自己构建hudi](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source-building-hudi)来快速入门,
+并在spark-shell命令中使用`--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar`,
+而不是`--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
+
+
 这里我们使用Spark演示了Hudi的功能。但是,Hudi可以支持多种存储类型/视图,并且可以从Hive,Spark,Presto等查询引擎中查询Hudi数据集。
 我们制作了一个基于Docker设置、所有依赖系统都在本地运行的[演示视频](https://www.youtube.com/watch?v=VhNgUsxdrD0),
 我们建议您复制相同的设置然后按照[这里](docker_demo.html)的步骤自己运行这个演示。
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 121009e..3a17b83 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -12,24 +12,19 @@ code snippets that allows you to insert and update a Hudi dataset of default sto
 [Copy on Write](https://hudi.apache.org/concepts.html#copy-on-write-storage). 
 After each write operation we will also show how to read the data both snapshot and incrementally.
 
-**NOTE:**
-You can also do the quickstart by [building hudi yourself](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source-building-hudi), 
-and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` in the spark-shell command
-instead of `--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
-
 ## Setup spark-shell
 Hudi works with Spark-2.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for 
 setting up spark. 
 
 From the extracted directory run spark-shell with Hudi as:
 
-```
+```Scala
 bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
 Setup table name, base path and a data generator to generate records for this guide.
 
-```
+```Scala
 import org.apache.hudi.QuickstartUtils._
 import scala.collection.JavaConversions._
 import org.apache.spark.sql.SaveMode._
@@ -50,7 +45,7 @@ can generate sample inserts and updates based on the the sample trip schema
 ## Insert data {#inserts}
 Generate some new trips, load them into a DataFrame and write the DataFrame into the Hudi dataset as below.
 
-```
+```Scala
 val inserts = convertToStringList(dataGen.generateInserts(10))
 val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
 df.write.format("org.apache.hudi").
@@ -75,7 +70,7 @@ Here we are using the default write operation : `upsert`. If you have a workload
  
 ## Query data {#query}
 Load the data files into a DataFrame.
-```
+```Scala
 val roViewDF = spark.
     read.
     format("org.apache.hudi").
@@ -92,7 +87,7 @@ Refer to [Storage Types and Views](https://hudi.apache.org/concepts.html#storage
 This is similar to inserting new data. Generate updates to existing trips using the data generator, load into a DataFrame 
 and write DataFrame into the hudi dataset.
 
-```
+```Scala
 val updates = convertToStringList(dataGen.generateUpdates(10))
 val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
 df.write.format("org.apache.hudi").
@@ -115,7 +110,7 @@ Hudi also provides capability to obtain a stream of records that changed since g
 This can be achieved using Hudi's incremental view and providing a begin time from which changes need to be streamed. 
 We do not need to specify endTime, if we want all changes after the given commit (as is the common case). 
 
-```
+```Scala
 val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50)
 val beginTime = commits(commits.length - 2) // commit time we are interested in
 
@@ -136,7 +131,7 @@ feature is that it now lets you author streaming pipelines on batch data.
 Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a 
 specific commit time and beginTime to "000" (denoting earliest possible commit time). 
 
-```
+```Scala
 val beginTime = "000" // Represents all commits > this time.
 val endTime = commits(commits.length - 2) // commit time we are interested in
 
@@ -151,7 +146,11 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hu
 ``` 
 
 ## Where to go from here?
-Here, we used Spark to show case the capabilities of Hudi. However, Hudi can support multiple storage types/views and 
+You can also do the quickstart by [building hudi yourself](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source-building-hudi), 
+and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` in the spark-shell command above
+instead of `--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
+
+Also, we used Spark here to show case the capabilities of Hudi. However, Hudi can support multiple storage types/views and 
 Hudi datasets can be queried from query engines like Hive, Spark, Presto and much more. We have put together a 
 [demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0) that showcases all of this on a docker based setup with all 
 dependent systems running locally. We recommend you replicate the same setup and run the demo yourself, by following 
diff --git a/docs/s3_filesystem.cn.md b/docs/s3_filesystem.cn.md
index fe9a442..f662bda 100644
--- a/docs/s3_filesystem.cn.md
+++ b/docs/s3_filesystem.cn.md
@@ -21,7 +21,7 @@ Simplest way to use Hudi with S3, is to configure your `SparkSession` or `SparkC
 
 Alternatively, add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your S3 bucket name and Hudi should be able to read/write from the bucket.
 
-```
+```xml
   <property>
       <name>fs.defaultFS</name>
       <value>s3://ysharma</value>
@@ -57,7 +57,7 @@ Alternatively, add the required configs in your core-site.xml from where Hudi ca
 Utilities such as hudi-cli or deltastreamer tool, can pick up s3 creds via environmental variable prefixed with `HOODIE_ENV_`. For e.g below is a bash snippet to setup
 such variables and then have cli be able to work on datasets stored in s3
 
-```
+```Java
 export HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key=$accessKey
 export HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key=$secretKey
 export HOODIE_ENV_fs_DOT_s3_DOT_awsAccessKeyId=$accessKey
diff --git a/docs/s3_filesystem.md b/docs/s3_filesystem.md
index fe9a442..f662bda 100644
--- a/docs/s3_filesystem.md
+++ b/docs/s3_filesystem.md
@@ -21,7 +21,7 @@ Simplest way to use Hudi with S3, is to configure your `SparkSession` or `SparkC
 
 Alternatively, add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your S3 bucket name and Hudi should be able to read/write from the bucket.
 
-```
+```xml
   <property>
       <name>fs.defaultFS</name>
       <value>s3://ysharma</value>
@@ -57,7 +57,7 @@ Alternatively, add the required configs in your core-site.xml from where Hudi ca
 Utilities such as hudi-cli or deltastreamer tool, can pick up s3 creds via environmental variable prefixed with `HOODIE_ENV_`. For e.g below is a bash snippet to setup
 such variables and then have cli be able to work on datasets stored in s3
 
-```
+```Java
 export HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key=$accessKey
 export HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key=$secretKey
 export HOODIE_ENV_fs_DOT_s3_DOT_awsAccessKeyId=$accessKey
diff --git a/docs/writing_data.cn.md b/docs/writing_data.cn.md
index 58b6c99..bd7f646 100644
--- a/docs/writing_data.cn.md
+++ b/docs/writing_data.cn.md
@@ -39,7 +39,7 @@ summary: 这一页里,我们将讨论一些可用的工具,这些工具可
 
 命令行选项更详细地描述了这些功能:
 
-```
+```Java
 [hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
 Usage: <main class> [options]
   Options:
@@ -118,13 +118,13 @@ Usage: <main class> [options]
 ([impressions.avro](https://docs.confluent.io/current/ksql/docs/tutorials/generate-custom-test-data.html),
 由schema-registry代码库提供)
 
-```
+```Java
 [confluent-5.0.0]$ bin/ksql-datagen schema=../impressions.avro format=avro topic=impressions key=impressionid
 ```
 
 然后用如下命令摄取这些数据。
 
-```
+```Java
 [hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
   --props file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
@@ -142,7 +142,7 @@ Usage: <main class> [options]
 以下是在指定需要使用的字段名称的之后,如何插入更新数据帧的方法,这些字段包括
 `recordKey => _row_key`、`partitionPath => partition`和`precombineKey => timestamp`
 
-```
+```Java
 inputDF.write()
        .format("org.apache.hudi")
        .options(clientOpts) // 可以传入任何Hudi客户端参数
@@ -160,7 +160,7 @@ inputDF.write()
 如果需要从命令行或在独立的JVM中运行它,Hudi提供了一个`HiveSyncTool`,
 在构建了hudi-hive模块之后,可以按以下方式调用它。
 
-```
+```Java
 cd hudi-hive
 ./run_sync_tool.sh
  [hudi-hive]$ ./run_sync_tool.sh --help
@@ -192,7 +192,7 @@ Usage: <main class> [options]
  这可以通过触发一个带有自定义负载实现的插入更新来实现,这种实现可以使用总是返回Optional.Empty作为组合值的DataSource或DeltaStreamer。 
  Hudi附带了一个内置的`org.apache.hudi.EmptyHoodieRecordPayload`类,它就是实现了这一功能。
  
-```
+```Java
  deleteDF // 仅包含要删除的记录的数据帧
    .write().format("org.apache.hudi")
    .option(...) // 根据设置需要添加HUDI参数,例如记录键、分区路径和其他参数
diff --git a/docs/writing_data.md b/docs/writing_data.md
index 37bc0c9..5199382 100644
--- a/docs/writing_data.md
+++ b/docs/writing_data.md
@@ -40,7 +40,7 @@ The `HoodieDeltaStreamer` utility (part of hudi-utilities-bundle) provides the w
 
 Command line options describe capabilities in more detail
 
-```
+```Java
 [hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
 Usage: <main class> [options]
   Options:
@@ -117,13 +117,13 @@ provided under `hudi-utilities/src/test/resources/delta-streamer-config`.
 
 For e.g: once you have Confluent Kafka, Schema registry up & running, produce some test data using ([impressions.avro](https://docs.confluent.io/current/ksql/docs/tutorials/generate-custom-test-data.html) provided by schema-registry repo)
 
-```
+```Java
 [confluent-5.0.0]$ bin/ksql-datagen schema=../impressions.avro format=avro topic=impressions key=impressionid
 ```
 
 and then ingest it as follows.
 
-```
+```Java
 [hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
   --props file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
@@ -142,7 +142,7 @@ Following is how we can upsert a dataframe, while specifying the field names tha
 for `recordKey => _row_key`, `partitionPath => partition` and `precombineKey => timestamp`
 
 
-```
+```Java
 inputDF.write()
        .format("org.apache.hudi")
        .options(clientOpts) // any of the Hudi client opts can be passed in as well
@@ -160,7 +160,7 @@ Both tools above support syncing of the dataset's latest schema to Hive metastor
 In case, its preferable to run this from commandline or in an independent jvm, Hudi provides a `HiveSyncTool`, which can be invoked as below, 
 once you have built the hudi-hive module.
 
-```
+```Java
 cd hudi-hive
 ./run_sync_tool.sh
  [hudi-hive]$ ./run_sync_tool.sh --help
@@ -193,7 +193,7 @@ Hudi supports implementing two types of deletes on data stored in Hudi datasets,
  - **Hard Deletes** : A stronger form of delete is to physically remove any trace of the record from the dataset. This can be achieved by issuing an upsert with a custom payload implementation
  via either DataSource or DeltaStreamer which always returns Optional.Empty as the combined value. Hudi ships with a built-in `org.apache.hudi.EmptyHoodieRecordPayload` class that does exactly this.
  
-```
+```Java
  deleteDF // dataframe containing just records to be deleted
    .write().format("org.apache.hudi")
    .option(...) // Add HUDI options like record-key, partition-path and others as needed for your setup