Posted to commits@hudi.apache.org by bh...@apache.org on 2020/01/31 05:46:37 UTC

[incubator-hudi] branch asf-site updated: [HUDI-577] update docker demo page and quick start pages (#1279)

This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new ace36af  [HUDI-577] update docker demo page and quick start pages (#1279)
ace36af is described below

commit ace36afff30e940f764e2ad8ddbfd194086a3bde
Author: Bhavani Sudha Saktheeswaran <bh...@uber.com>
AuthorDate: Thu Jan 30 21:46:25 2020 -0800

    [HUDI-577] update docker demo page and quick start pages (#1279)
    
    Summary:
    - contains changes that reflect renaming of terminology to be in sync with CWiki
    - contains doc changes pertaining to support of multiple scala versions
---
 docs/_docs/0_4_docker_demo.md       | 236 +++++++++++++++++++-----------------
 docs/_docs/1_1_quick_start_guide.md |  23 +++-
 docs/_docs/1_2_structure.md         |   2 +-
 docs/_docs/2_1_concepts.md          |   6 +-
 docs/_docs/2_2_writing_data.md      |  84 +++++++++----
 docs/_docs/2_3_querying_data.md     |  20 +--
 docs/_docs/2_5_performance.md       |   4 +-
 7 files changed, 217 insertions(+), 158 deletions(-)

diff --git a/docs/_docs/0_4_docker_demo.md b/docs/_docs/0_4_docker_demo.md
index 87c0716..3033371 100644
--- a/docs/_docs/0_4_docker_demo.md
+++ b/docs/_docs/0_4_docker_demo.md
@@ -40,7 +40,7 @@ Also, this has not been tested on some environments like Docker on Windows.
 
 ### Build Hudi
 
-The first step is to build hudi
+The first step is to build Hudi. **Note:** This step builds Hudi against the default supported Scala version, 2.11.
 ```java
 cd <HUDI_WORKSPACE>
 mvn package -DskipTests
@@ -63,7 +63,10 @@ Stopping hivemetastore             ... done
 Stopping historyserver             ... done
 .......
 ......
-Creating network "hudi_demo" with the default driver
+Creating network "compose_default" with the default driver
+Creating volume "compose_namenode" with default driver
+Creating volume "compose_historyserver" with default driver
+Creating volume "compose_hive-metastore-postgresql" with default driver
 Creating hive-metastore-postgresql ... done
 Creating namenode                  ... done
 Creating zookeeper                 ... done
@@ -94,12 +97,12 @@ At this point, the docker cluster will be up and running. The demo cluster bring
 
 ## Demo
 
-Stock Tracker data will be used to showcase both different Hudi Views and the effects of Compaction.
+Stock Tracker data will be used to showcase different Hudi query types and the effects of Compaction.
 
 Take a look at the directory `docker/demo/data`. There are 2 batches of stock data - each at 1 minute granularity.
 The first batch contains stock tracker data for some stock symbols during the first hour of the trading window
 (9:30 a.m to 10:30 a.m). The second batch contains tracker data for the next 30 mins (10:30 - 11 a.m). Hudi will
-be used to ingest these batches to a dataset which will contain the latest stock tracker data at hour level granularity.
+be used to ingest these batches into a table which will contain the latest stock tracker data at hour-level granularity.
 The batches are windowed intentionally so that the second batch contains updates to some of the rows in the first batch.
 
 ### Step 1 : Publish the first batch to Kafka
@@ -151,19 +154,19 @@ kafkacat -b kafkabroker -L -J | jq .
 ### Step 2: Incrementally ingest data from Kafka topic
 
 Hudi comes with a tool named DeltaStreamer. This tool can connect to a variety of data sources (including Kafka) to
-pull changes and apply to Hudi dataset using upsert/insert primitives. Here, we will use the tool to download
+pull changes and apply them to a Hudi table using upsert/insert primitives. Here, we will use the tool to download
 json data from the kafka topic and ingest it into both the COW and MOR tables we initialized in the previous step. This tool
-automatically initializes the datasets in the file-system if they do not exist yet.
+automatically initializes the tables in the file-system if they do not exist yet.
 
 ```java
 docker exec -it adhoc-2 /bin/bash
 
-# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow dataset in HDFS
-spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
+spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
 
 
-# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor dataset in HDFS
-spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor table in HDFS
+spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --table-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
 
 
 # As part of the setup (Look at setup_demo.sh), the configs needed for DeltaStreamer are uploaded to HDFS. The configs
@@ -172,50 +175,50 @@ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
 exit
 ```
 
-You can use HDFS web-browser to look at the datasets
+You can use the HDFS web UI to browse the tables at
 `http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_cow`.
 
-You can explore the new partition folder created in the dataset along with a "deltacommit"
+You can explore the new partition folder created in the table along with a "deltacommit"
 file under .hoodie which signals a successful commit.
 
-There will be a similar setup when you browse the MOR dataset
+There will be a similar setup when you browse the MOR table
 `http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_mor`
 
 
 ### Step 3: Sync with Hive
 
-At this step, the datasets are available in HDFS. We need to sync with Hive to create new Hive tables and add partitions
-inorder to run Hive queries against those datasets.
+At this point, the tables are available in HDFS. We need to sync with Hive to create new Hive tables and add partitions
+in order to run Hive queries against those tables.
 
 ```java
 docker exec -it adhoc-2 /bin/bash
 
-# THis command takes in HIveServer URL and COW Hudi Dataset location in HDFS and sync the HDFS state to Hive
+# This command takes in the HiveServer URL and the COW Hudi table location in HDFS and syncs the HDFS state to Hive
 /var/hoodie/ws/hudi-hive/run_sync_tool.sh  --jdbc-url jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by dt --base-path /user/hive/warehouse/stock_ticks_cow --database default --table stock_ticks_cow
 .....
-2018-09-24 22:22:45,568 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(112)) - Sync complete for stock_ticks_cow
+2020-01-25 19:51:28,953 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_cow
 .....
 
-# Now run hive-sync for the second data-set in HDFS using Merge-On-Read (MOR storage)
+# Now run hive-sync for the second table in HDFS using Merge-On-Read (MOR table type)
 /var/hoodie/ws/hudi-hive/run_sync_tool.sh  --jdbc-url jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by dt --base-path /user/hive/warehouse/stock_ticks_mor --database default --table stock_ticks_mor
 ...
-2018-09-24 22:23:09,171 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(112)) - Sync complete for stock_ticks_mor
+2020-01-25 19:51:51,066 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_ro
 ...
-2018-09-24 22:23:09,559 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(112)) - Sync complete for stock_ticks_mor_rt
+2020-01-25 19:51:51,569 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_rt
 ....
 exit
 ```
 After executing the above command, you will notice
 
-1. A hive table named `stock_ticks_cow` created which provides Read-Optimized view for the Copy On Write dataset.
-2. Two new tables `stock_ticks_mor` and `stock_ticks_mor_rt` created for the Merge On Read dataset. The former
-provides the ReadOptimized view for the Hudi dataset and the later provides the realtime-view for the dataset.
+1. A Hive table named `stock_ticks_cow` created, which supports Snapshot and Incremental queries on the Copy On Write table.
+2. Two new tables `stock_ticks_mor_rt` and `stock_ticks_mor_ro` created for the Merge On Read table. The former
+supports Snapshot and Incremental queries (providing near real-time data) while the latter supports ReadOptimized queries.
 
 
 ### Step 4 (a): Run Hive Queries
 
-Run a hive query to find the latest timestamp ingested for stock symbol 'GOOG'. You will notice that both read-optimized
-(for both COW and MOR dataset)and realtime views (for MOR dataset)give the same value "10:29 a.m" as Hudi create a
+Run a hive query to find the latest timestamp ingested for stock symbol 'GOOG'. You will notice that both snapshot
+(for both COW and MOR _rt table) and read-optimized queries (for MOR _ro table) give the same value "10:29 a.m" as Hudi creates a
 parquet file for the first batch of data.
 
 ```java
@@ -227,10 +230,10 @@ beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache
 |      tab_name       |
 +---------------------+--+
 | stock_ticks_cow     |
-| stock_ticks_mor     |
+| stock_ticks_mor_ro  |
 | stock_ticks_mor_rt  |
 +---------------------+--+
-2 rows selected (0.801 seconds)
+3 rows selected (1.199 seconds)
 0: jdbc:hive2://hiveserver:10000>
 
 
@@ -269,11 +272,11 @@ Now, run a projection query:
 # Merge-On-Read Queries:
 ==========================
 
-Lets run similar queries against M-O-R dataset. Lets look at both
-ReadOptimized and Realtime views supported by M-O-R dataset
+Let's run similar queries against the M-O-R table. Let's look at both
+ReadOptimized and Snapshot (realtime data) queries supported by the M-O-R table
 
-# Run against ReadOptimized View. Notice that the latest timestamp is 10:29
-0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
 +---------+----------------------+--+
 | symbol  |         _c1          |
@@ -283,7 +286,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 1 row selected (6.326 seconds)
 
 
-# Run against Realtime View. Notice that the latest timestamp is again 10:29
+# Run Snapshot Query. Notice that the latest timestamp is again 10:29
 
 0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
@@ -295,9 +298,9 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 1 row selected (1.606 seconds)
 
 
-# Run projection query against Read Optimized and Realtime tables
+# Run Read Optimized and Snapshot projection queries
 
-0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -323,17 +326,17 @@ running in spark-sql
 
 ```java
 docker exec -it adhoc-1 /bin/bash
-$SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --master local[2] --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
+$SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --master local[2] --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --executor-memory 3G --num-executors 1  --packages org.apache.spark:spark-avro_2.11:2.4.4
 ...
 
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
-   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
+   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
       /_/
 
-Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
+Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
 Type in expressions to have them evaluated.
 Type :help for more information.
 
@@ -343,7 +346,7 @@ scala> spark.sql("show tables").show(100, false)
 |database|tableName         |isTemporary|
 +--------+------------------+-----------+
 |default |stock_ticks_cow   |false      |
-|default |stock_ticks_mor   |false      |
+|default |stock_ticks_mor_ro|false      |
 |default |stock_ticks_mor_rt|false      |
 +--------+------------------+-----------+
 
@@ -374,11 +377,11 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 # Merge-On-Read Queries:
 ==========================
 
-Lets run similar queries against M-O-R dataset. Lets look at both
-ReadOptimized and Realtime views supported by M-O-R dataset
+Let's run similar queries against the M-O-R table. Let's look at both
+ReadOptimized and Snapshot queries supported by the M-O-R table
 
-# Run against ReadOptimized View. Notice that the latest timestamp is 10:29
-scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG'").show(100, false)
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +------+-------------------+
 |symbol|max(ts)            |
 +------+-------------------+
@@ -386,7 +389,7 @@ scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HA
 +------+-------------------+
 
 
-# Run against Realtime View. Notice that the latest timestamp is again 10:29
+# Run Snapshot Query. Notice that the latest timestamp is again 10:29
 
 scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +------+-------------------+
@@ -395,9 +398,9 @@ scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol
 |GOOG  |2018-08-31 10:29:00|
 +------+-------------------+
 
-# Run projection query against Read Optimized and Realtime tables
+# Run Read Optimized and Snapshot projection queries
 
-scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG'").show(100, false)
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
 +-------------------+------+-------------------+------+---------+--------+
 |_hoodie_commit_time|symbol|ts                 |volume|open     |close   |
 +-------------------+------+-------------------+------+---------+--------+
@@ -417,7 +420,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 
 ### Step 4 (c): Run Presto Queries
 
-Here are the Presto queries for similar Hive and Spark queries. Currently, Hudi does not support Presto queries on realtime views.
+Here are the Presto equivalents of the Hive and Spark queries above. Currently, Presto does not support snapshot or incremental queries on Hudi tables.
 
 ```java
 docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
@@ -440,7 +443,7 @@ presto:default> show tables;
        Table
 --------------------
  stock_ticks_cow
- stock_ticks_mor
+ stock_ticks_mor_ro
  stock_ticks_mor_rt
 (3 rows)
 
@@ -478,10 +481,10 @@ Splits: 17 total, 17 done (100.00%)
 # Merge-On-Read Queries:
 ==========================
 
-Lets run similar queries against M-O-R dataset. 
+Let's run similar queries against the M-O-R table.
 
-# Run against ReadOptimized View. Notice that the latest timestamp is 10:29
-presto:default> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
  symbol |        _col1
 --------+---------------------
  GOOG   | 2018-08-31 10:29:00
@@ -492,7 +495,7 @@ Splits: 49 total, 49 done (100.00%)
 0:02 [197 rows, 613B] [110 rows/s, 343B/s]
 
 
-presto:default>  select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+presto:default>  select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
  _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
 ---------------------+--------+---------------------+--------+-----------+----------
  20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
@@ -517,12 +520,12 @@ cat docker/demo/data/batch_2.json | kafkacat -b kafkabroker -t stock_ticks -P
 # Within Docker container, run the ingestion command
 docker exec -it adhoc-2 /bin/bash
 
-# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow dataset in HDFS
-spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
+spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
 
 
-# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor dataset in HDFS
-spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor table in HDFS
+spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --table-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
 
 exit
 ```
@@ -535,12 +538,12 @@ Take a look at the HDFS filesystem to get an idea: `http://namenode:50070/explor
 
 ### Step 6(a): Run Hive Queries
 
-With Copy-On-Write table, the read-optimized view immediately sees the changes as part of second batch once the batch
+With the Copy-On-Write table, the Snapshot query immediately sees the changes from the second batch once the batch
 got committed as each ingestion creates newer versions of parquet files.
 
 With Merge-On-Read table, the second ingestion merely appended the batch to an unmerged delta (log) file.
-This is the time, when ReadOptimized and Realtime views will provide different results. ReadOptimized view will still
-return "10:29 am" as it will only read from the Parquet file. Realtime View will do on-the-fly merge and return
+This is when ReadOptimized and Snapshot queries will provide different results. The ReadOptimized query will still
+return "10:29 am" as it will only read from the Parquet file. The Snapshot query will do an on-the-fly merge and return
 latest committed data which is "10:59 a.m".
 
 ```java
@@ -571,8 +574,8 @@ As you can notice, the above queries now reflect the changes that came as part o
 
 # Merge On Read Table:
 
-# Read Optimized View
-0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Read Optimized Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
 +---------+----------------------+--+
 | symbol  |         _c1          |
@@ -581,7 +584,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 +---------+----------------------+--+
 1 row selected (1.6 seconds)
 
-0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -589,7 +592,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 | 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 
-# Realtime View
+# Snapshot Query
 0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
 +---------+----------------------+--+
@@ -616,7 +619,7 @@ Running the same queries in Spark-SQL:
 
 ```java
 docker exec -it adhoc-1 /bin/bash
-bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
+bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages org.apache.spark:spark-avro_2.11:2.4.4
 
 # Copy On Write Table:
 
@@ -641,8 +644,8 @@ As you can notice, the above queries now reflect the changes that came as part o
 
 # Merge On Read Table:
 
-# Read Optimized View
-scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG'").show(100, false)
+# Read Optimized Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +---------+----------------------+--+
 | symbol  |         _c1          |
 +---------+----------------------+--+
@@ -650,7 +653,7 @@ scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HA
 +---------+----------------------+--+
 1 row selected (1.6 seconds)
 
-scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG'").show(100, false)
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -658,7 +661,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 | 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 
-# Realtime View
+# Snapshot Query
 scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +---------+----------------------+--+
 | symbol  |         _c1          |
@@ -680,7 +683,7 @@ exit
 
 ### Step 6(c): Run Presto Queries
 
-Running the same queries on Presto for ReadOptimized views. 
+Running the same ReadOptimized queries on Presto.
 
 
 ```java
@@ -716,8 +719,8 @@ As you can notice, the above queries now reflect the changes that came as part o
 
 # Merge On Read Table:
 
-# Read Optimized View
-presto:default> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Read Optimized Query
+presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
  symbol |        _col1
 --------+---------------------
  GOOG   | 2018-08-31 10:29:00
@@ -727,7 +730,7 @@ Query 20190822_181602_00009_segyw, FINISHED, 1 node
 Splits: 49 total, 49 done (100.00%)
 0:01 [197 rows, 613B] [139 rows/s, 435B/s]
 
-presto:default>select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+presto:default>select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
  _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
 ---------------------+--------+---------------------+--------+-----------+----------
  20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
@@ -744,7 +747,7 @@ presto:default> exit
 
 ### Step 7 : Incremental Query for COPY-ON-WRITE Table
 
-With 2 batches of data ingested, lets showcase the support for incremental queries in Hudi Copy-On-Write datasets
+With 2 batches of data ingested, let's showcase the support for incremental queries in Hudi Copy-On-Write tables.
 
 Let's take the same projection query example
 
@@ -800,15 +803,15 @@ Here is the incremental query :
 ### Incremental Query with Spark SQL:
 ```java
 docker exec -it adhoc-1 /bin/bash
-bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
+bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages org.apache.spark:spark-avro_2.11:2.4.4
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
-   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
+   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
       /_/
 
-Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
+Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
 Type in expressions to have them evaluated.
 Type :help for more information.
 
@@ -816,7 +819,7 @@ scala> import org.apache.hudi.DataSourceReadOptions
 import org.apache.hudi.DataSourceReadOptions
 
 # In the below query, 20180925045257 is the first commit's timestamp
-scala> val hoodieIncViewDF =  spark.read.format("org.apache.hudi").option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL).option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "20180924064621").load("/user/hive/warehouse/stock_ticks_cow")
+scala> val hoodieIncViewDF =  spark.read.format("org.apache.hudi").option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "20180924064621").load("/user/hive/warehouse/stock_ticks_cow")
 SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
@@ -835,7 +838,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 ```
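To make the incremental results above easy to query, a minimal sketch (the temp view name is illustrative, not from the demo):

```scala
// Sketch: register the incremental DataFrame created above and query only the changed records.
hoodieIncViewDF.createOrReplaceTempView("stock_ticks_cow_incr")
spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_cow_incr where symbol = 'GOOG'").show(100, false)
```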
 
 
-### Step 8: Schedule and Run Compaction for Merge-On-Read dataset
+### Step 8: Schedule and Run Compaction for Merge-On-Read table
 
 Let's schedule and run a compaction to create a new version of the columnar file so that read-optimized readers will see fresher data.
 Again, you can use Hudi CLI to manually schedule and run compaction
@@ -843,24 +846,31 @@ Again, You can use Hudi CLI to manually schedule and run compaction
 ```java
 docker exec -it adhoc-1 /bin/bash
 root@adhoc-1:/opt#   /var/hoodie/ws/hudi-cli/hudi-cli.sh
-============================================
-*                                          *
-*     _    _           _   _               *
-*    | |  | |         | | (_)              *
-*    | |__| |       __| |  -               *
-*    |  __  ||   | / _` | ||               *
-*    | |  | ||   || (_| | ||               *
-*    |_|  |_|\___/ \____/ ||               *
-*                                          *
-============================================
-
-Welcome to Hoodie CLI. Please type help if you are looking for help.
+...
+Table command getting loaded
+HoodieSplashScreen loaded
+===================================================================
+*         ___                          ___                        *
+*        /\__\          ___           /\  \           ___         *
+*       / /  /         /\__\         /  \  \         /\  \        *
+*      / /__/         / /  /        / /\ \  \        \ \  \       *
+*     /  \  \ ___    / /  /        / /  \ \__\       /  \__\      *
+*    / /\ \  /\__\  / /__/  ___   / /__/ \ |__|     / /\/__/      *
+*    \/  \ \/ /  /  \ \  \ /\__\  \ \  \ / /  /  /\/ /  /         *
+*         \  /  /    \ \  / /  /   \ \  / /  /   \  /__/          *
+*         / /  /      \ \/ /  /     \ \/ /  /     \ \__\          *
+*        / /  /        \  /  /       \  /  /       \/__/          *
+*        \/__/          \/__/         \/__/    Apache Hudi CLI    *
+*                                                                 *
+===================================================================
+
+Welcome to Apache Hudi CLI. Please type help if you are looking for help.
 hudi->connect --path /user/hive/warehouse/stock_ticks_mor
 18/09/24 06:59:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 18/09/24 06:59:35 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
 18/09/24 06:59:35 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1261652683_11, ugi=root (auth:SIMPLE)]]]
-18/09/24 06:59:35 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
-18/09/24 06:59:36 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
+18/09/24 06:59:35 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 06:59:36 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
 Metadata for table stock_ticks_mor loaded
 
 # Ensure no compactions are present
@@ -884,8 +894,8 @@ Compaction successfully completed for 20180924070031
 hoodie:stock_ticks->connect --path /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:01:16 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1261652683_11, ugi=root (auth:SIMPLE)]]]
-18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
-18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
+18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
 Metadata for table stock_ticks_mor loaded
 
 
@@ -911,8 +921,8 @@ Compaction successfully completed for 20180924070031
 hoodie:stock_ticks_mor->connect --path /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:03:00 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1261652683_11, ugi=root (auth:SIMPLE)]]]
-18/09/24 07:03:00 INFO table.HoodieTableConfig: Loading dataset properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
-18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from /user/hive/warehouse/stock_ticks_mor
+18/09/24 07:03:00 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
 Metadata for table stock_ticks_mor loaded
 
 
@@ -928,7 +938,7 @@ hoodie:stock_ticks->compactions show all
 
 ### Step 9: Run Hive Queries including incremental queries
 
-You will see that both ReadOptimized and Realtime Views will show the latest committed data.
+You will see that both ReadOptimized and Snapshot queries will show the latest committed data.
 Let's also run the incremental query for the MOR table.
 From looking at the below query output, it will be clear that the first commit time for the MOR table is 20180924064636
 and the second commit time is 20180924070031
@@ -937,8 +947,8 @@ and the second commit time is 20180924070031
 docker exec -it adhoc-2 /bin/bash
 beeline -u jdbc:hive2://hiveserver:10000 --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
 
-# Read Optimized View
-0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Read Optimized Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
 +---------+----------------------+--+
 | symbol  |         _c1          |
@@ -947,7 +957,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 +---------+----------------------+--+
 1 row selected (1.6 seconds)
 
-0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -955,7 +965,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 | 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 
-# Realtime View
+# Snapshot Query
 0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
 WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
 +---------+----------------------+--+
@@ -972,7 +982,7 @@ WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the futu
 | 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 
-# Incremental View:
+# Incremental Query:
 
 0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_mor.consume.mode=INCREMENTAL;
 No rows affected (0.008 seconds)
@@ -982,7 +992,7 @@ No rows affected (0.007 seconds)
 0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_mor.consume.start.timestamp=20180924064636;
 No rows affected (0.013 seconds)
 # Query:
-0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064636';
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064636';
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -992,14 +1002,14 @@ exit
 exit
 ```
 
-### Step 10: Read Optimized and Realtime Views for MOR with Spark-SQL after compaction
+### Step 10: Read Optimized and Snapshot queries for MOR with Spark-SQL after compaction
 
 ```java
 docker exec -it adhoc-1 /bin/bash
-bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages com.databricks:spark-avro_2.11:4.0.0
+bash-4.4# $SPARK_INSTALL/bin/spark-shell --jars $HUDI_SPARK_BUNDLE --driver-class-path $HADOOP_CONF_DIR --conf spark.sql.hive.convertMetastoreParquet=false --deploy-mode client  --driver-memory 1G --master local[2] --executor-memory 3G --num-executors 1  --packages org.apache.spark:spark-avro_2.11:2.4.4
 
-# Read Optimized View
-scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG'").show(100, false)
+# Read Optimized Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +---------+----------------------+--+
 | symbol  |         _c1          |
 +---------+----------------------+--+
@@ -1007,7 +1017,7 @@ scala> spark.sql("select symbol, max(ts) from stock_ticks_mor group by symbol HA
 +---------+----------------------+--+
 1 row selected (1.6 seconds)
 
-scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG'").show(100, false)
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 | _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
@@ -1015,7 +1025,7 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 | 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 
-# Realtime View
+# Snapshot Query
 scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
 +---------+----------------------+--+
 | symbol  |         _c1          |
@@ -1032,15 +1042,15 @@ scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close
 +----------------------+---------+----------------------+---------+------------+-----------+--+
 ```
 
-### Step 11:  Presto queries over Read Optimized View on MOR dataset after compaction
+### Step 11: Presto Read Optimized queries on MOR table after compaction
 
 ```java
 docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
 presto> use hive.default;
 USE
 
-# Read Optimized View
-resto:default> select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
+# Read Optimized Query
+presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
   symbol |        _col1
 --------+---------------------
  GOOG   | 2018-08-31 10:59:00
@@ -1050,7 +1060,7 @@ Query 20190822_182319_00011_segyw, FINISHED, 1 node
 Splits: 49 total, 49 done (100.00%)
 0:01 [197 rows, 613B] [133 rows/s, 414B/s]
 
-presto:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
+presto:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
  _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
 ---------------------+--------+---------------------+--------+-----------+----------
  20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
@@ -1076,7 +1086,7 @@ $ mvn pre-integration-test -DskipTests
 ```
 The above command builds docker images for all the services with
 current Hudi source installed at /var/hoodie/ws and also brings up the services using a compose file. We
-currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.3.1) in docker images.
+currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.4.4) in docker images.
 
 To bring down the containers
 ```java
diff --git a/docs/_docs/1_1_quick_start_guide.md b/docs/_docs/1_1_quick_start_guide.md
index 708bd2c..d7d645e 100644
--- a/docs/_docs/1_1_quick_start_guide.md
+++ b/docs/_docs/1_1_quick_start_guide.md
@@ -16,10 +16,20 @@ Hudi works with Spark-2.x versions. You can follow instructions [here](https://s
 From the extracted directory run spark-shell with Hudi as:
 
 ```scala
-bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating \
+spark-2.4.4-bin-hadoop2.7/bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
+<div class="notice--info">
+  <h4>Please note the following: </h4>
+<ul>
+  <li>spark-avro module needs to be specified in --packages as it is not included with spark-shell by default</li>
+  <li>spark-avro and spark versions must match (we have used 2.4.4 for both above)</li>
+  <li>we have used hudi-spark-bundle built for Scala 2.11 since the spark-avro module used above also depends on Scala 2.11.
+         If spark-avro_2.12 is used, hudi-spark-bundle_2.12 needs to be used correspondingly. </li>
+</ul>
+</div>
+
 Setup table name, base path and a data generator to generate records for this guide.
 
 ```scala
@@ -83,7 +93,7 @@ spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_pat
 
 This query provides snapshot querying of the ingested data. Since our partition path (`region/country/city`) is 3 levels nested 
 from base path we've used `load(basePath + "/*/*/*/*")`. 
-Refer to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and querying types supported.
+Refer to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported.
 {: .notice--info}
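For reference, a minimal sketch of the snapshot read described above (variable and view names are illustrative):

```scala
// Sketch: snapshot read of the table written earlier in this guide.
// Partition path is region/country/city, hence three wildcard levels plus one for the files.
val tripsSnapshotDF = spark.
  read.
  format("org.apache.hudi").
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
spark.sql("select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0").show()
```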
 
 ## Update data
@@ -133,7 +143,7 @@ val tripsIncrementalDF = spark.
     option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
     load(basePath);
-tripsIncrementalDF.registerTempTable("hudi_trips_incremental")
+tripsIncrementalDF.createOrReplaceTempView("hudi_trips_incremental")
 spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_trips_incremental where fare > 20.0").show()
 ``` 
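For context on where `beginTime` comes from, a minimal sketch that derives it from earlier commit times (it reuses the illustrative `hudi_trips_snapshot` view from the sketch above; the guide's actual variable names may differ):

```scala
// Sketch: pick the second most recent commit time as the incremental begin instant.
val commits = spark.
  sql("select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime").
  collect().
  map(_.getString(0))
val beginTime = commits(commits.length - 2) // incrementally pull everything committed after this instant
```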
 
@@ -156,7 +166,7 @@ val tripsPointInTimeDF = spark.read.format("org.apache.hudi").
     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
     option(END_INSTANTTIME_OPT_KEY, endTime).
     load(basePath);
-tripsPointInTimeDF.registerTempTable("hudi_trips_point_in_time")
+tripsPointInTimeDF.createOrReplaceTempView("hudi_trips_point_in_time")
 spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_trips_point_in_time where fare > 20.0").show()
 ```
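For the point-in-time variant, a minimal sketch of how the window bounds might be chosen (the `"000"` begin-instant convention and the reuse of `commits` from the sketch above are assumptions):

```scala
// Sketch: query the table as of a specific commit.
val beginTime = "000"                     // assumed to sort before any real commit time, i.e. "from the beginning"
val endTime = commits(commits.length - 2) // an earlier commit time, used as the point in time
```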
 
@@ -196,8 +206,9 @@ Note: Only `Append` mode is supported for delete operation.
 ## Where to go from here?
 
 You can also do the quickstart by [building hudi yourself](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source), 
-and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` in the spark-shell command above
-instead of `--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating`
+and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` in the spark-shell command above
+instead of `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating`. Hudi also supports Scala 2.12. Refer to [build with scala 2.12](https://github.com/apache/incubator-hudi#build-with-scala-212)
+for more info.
 
 Also, we used Spark here to showcase the capabilities of Hudi. However, Hudi can support multiple table types/query types and 
 Hudi tables can be queried from query engines like Hive, Spark, Presto and much more. We have put together a 
diff --git a/docs/_docs/1_2_structure.md b/docs/_docs/1_2_structure.md
index e080fcd..ddcdb1a 100644
--- a/docs/_docs/1_2_structure.md
+++ b/docs/_docs/1_2_structure.md
@@ -6,7 +6,7 @@ summary: "Hudi brings stream processing to big data, providing fresh data while
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tables over DFS ([HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) or cloud stores) and provides three types of querying.
+Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tables over DFS ([HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) or cloud stores) and provides three types of queries.
 
  * **Read Optimized query** - Provides excellent query performance on pure columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed downstream jobs/ETLs.
diff --git a/docs/_docs/2_1_concepts.md b/docs/_docs/2_1_concepts.md
index c99aa41..cf61811 100644
--- a/docs/_docs/2_1_concepts.md
+++ b/docs/_docs/2_1_concepts.md
@@ -141,11 +141,11 @@ The intention of copy on write table, is to fundamentally improve how tables are
 
 Merge on read table is a superset of copy on write, in the sense it still supports read optimized queries of the table by exposing only the base/columnar files in latest file slices.
 Additionally, it stores incoming upserts for each file group, onto a row based delta log, to support snapshot queries by applying the delta log, 
-onto the latest version of each file id on-the-fly during query time. Thus, this table type attempts to balance read and write amplication intelligently, to provide near real-time data.
+onto the latest version of each file id on-the-fly during query time. Thus, this table type attempts to balance read and write amplification intelligently, to provide near real-time data.
 The most significant change here, would be to the compactor, which now carefully chooses which delta log files need to be compacted onto
 their columnar base file, to keep the query performance in check (larger delta log files would incur longer merge times with merge data on query side)
 
-Following illustrates how the table works, and shows two types of querying - snapshot querying and read optimized querying.
+The following illustrates how the table works, and shows two types of queries - snapshot query and read optimized query.
 
 <figure>
     <img class="docimage" src="/assets/images/hudi_mor.png" alt="hudi_mor.png" style="max-width: 100%" />
@@ -158,7 +158,7 @@ There are lot of interesting things happening in this example, which bring out t
  all the data from 10:05 to 10:10. The base columnar files are still versioned with the commit, as before.
  Thus, if one were to simply look at base files alone, then the table layout looks exactly like a copy on write table.
  - A periodic compaction process reconciles these changes from the delta log and produces a new version of base file, just like what happened at 10:05 in the example.
- - There are two ways of querying the same underlying table: Read Optimized querying and Snapshot querying, depending on whether we chose query performance or freshness of data.
+ - There are two ways of querying the same underlying table: Read Optimized query and Snapshot query, depending on whether we choose query performance or freshness of data.
 - The semantics around when data from a commit is available to a query changes in a subtle way for a read optimized query. Note that such a query
 running at 10:10, won't see data after 10:05 above, while a snapshot query always sees the freshest data.
  - When we trigger compaction & what it decides to compact hold all the key to solving these hard problems. By implementing a compacting
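As a concrete illustration of the two query types described above, here is a minimal Spark datasource sketch. The `QUERY_TYPE_OPT_KEY` constant appears elsewhere in these docs; the snapshot and read-optimized value constants and the `basePath` variable are assumptions patterned on the incremental example:

```scala
import org.apache.hudi.DataSourceReadOptions

// Snapshot query: merges base files with delta logs on the fly, returning the freshest data.
val snapshotDF = spark.read.format("org.apache.hudi").
  option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL).
  load(basePath + "/*/*/*/*")

// Read Optimized query: reads only the columnar base files, trading data freshness for query speed.
val readOptimizedDF = spark.read.format("org.apache.hudi").
  option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_READ_OPTIMIZED_OPT_VAL).
  load(basePath + "/*/*/*/*")
```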
diff --git a/docs/_docs/2_2_writing_data.md b/docs/_docs/2_2_writing_data.md
index b407111..52ba503 100644
--- a/docs/_docs/2_2_writing_data.md
+++ b/docs/_docs/2_2_writing_data.md
@@ -43,23 +43,56 @@ Command line options describe capabilities in more detail
 ```java
 [hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
 Usage: <main class> [options]
-  Options:
+Options:
+    --checkpoint
+      Resume Delta Streamer from this checkpoint.
     --commit-on-errors
-        Commit even when some records failed to be written
+      Commit even when some records failed to be written
+      Default: false
+    --compact-scheduling-minshare
+      Minshare for compaction as defined in
+      https://spark.apache.org/docs/latest/job-scheduling.html
+      Default: 0
+    --compact-scheduling-weight
+      Scheduling weight for compaction as defined in
+      https://spark.apache.org/docs/latest/job-scheduling.html
+      Default: 1
+    --continuous
+      Delta Streamer runs in continuous mode running source-fetch -> Transform
+      -> Hudi Write in loop
+      Default: false
+    --delta-sync-scheduling-minshare
+      Minshare for delta sync as defined in
+      https://spark.apache.org/docs/latest/job-scheduling.html
+      Default: 0
+    --delta-sync-scheduling-weight
+      Scheduling weight for delta sync as defined in
+      https://spark.apache.org/docs/latest/job-scheduling.html
+      Default: 1
+    --disable-compaction
+      Compaction is enabled for MoR table by default. This flag disables it
       Default: false
     --enable-hive-sync
-          Enable syncing to hive
-       Default: false
+      Enable syncing to hive
+      Default: false
     --filter-dupes
-          Should duplicate records from source be dropped/filtered outbefore 
-          insert/bulk-insert 
+      Should duplicate records from source be dropped/filtered out before
+      insert/bulk-insert
       Default: false
     --help, -h
-    --hudi-conf
-          Any configuration that can be set in the properties file (using the CLI 
-          parameter "--propsFilePath") can also be passed command line using this 
-          parameter 
-          Default: []
+
+    --hoodie-conf
+      Any configuration that can be set in the properties file (using the CLI
+      parameter "--propsFilePath") can also be passed command line using this
+      parameter
+      Default: []
+    --max-pending-compactions
+      Maximum number of outstanding inflight/requested compactions. Delta Sync
+      will not happen unlessoutstanding compactions is less than this number
+      Default: 5
+    --min-sync-interval-seconds
+      the min sync interval of each sync in continuous mode
+      Default: 0
     --op
       Takes one of these values : UPSERT (default), INSERT (use when input is
       purely new data/inserts to gain speed)
@@ -69,19 +102,22 @@ Usage: <main class> [options]
       subclass of HoodieRecordPayload, that works off a GenericRecord.
       Implement your own, if you want to do something other than overwriting
       existing value
-      Default: org.apache.hudi.OverwriteWithLatestAvroPayload
+      Default: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
     --props
       path to properties file on localfs or dfs, with configurations for
-      Hudi client, schema provider, key generator and data source. For
-      Hudi client props, sane defaults are used, but recommend use to
+      hoodie client, schema provider, key generator and data source. For
+      hoodie client props, sane defaults are used, but recommend use to
       provide basic things like metrics endpoints, hive configs etc. For
      sources, refer to individual classes for supported properties.
       Default: file:///Users/vinoth/bin/hoodie/src/test/resources/delta-streamer-config/dfs-source.properties
     --schemaprovider-class
       subclass of org.apache.hudi.utilities.schema.SchemaProvider to attach
       schemas to input & target table data, built in options:
-      FilebasedSchemaProvider
-      Default: org.apache.hudi.utilities.schema.FilebasedSchemaProvider
+      org.apache.hudi.utilities.schema.FilebasedSchemaProvider.Source (See
+      org.apache.hudi.utilities.sources.Source) implementation can implement
+      their own SchemaProvider. For Sources that return Dataset<Row>, the
+      schema is obtained implicitly. However, this CLI option allows
+      overriding the schemaprovider returned by Source.
     --source-class
       Subclass of org.apache.hudi.utilities.sources to read data. Built-in
       options: org.apache.hudi.utilities.sources.{JsonDFSSource (default),
@@ -89,7 +125,7 @@ Usage: <main class> [options]
       Default: org.apache.hudi.utilities.sources.JsonDFSSource
     --source-limit
       Maximum amount of data to read from source. Default: No limit For e.g:
-      DFSSource => max bytes to read, KafkaSource => max events to read
+      DFS-Source => max bytes to read, Kafka-Source => max events to read
       Default: 9223372036854775807
     --source-ordering-field
       Field within source record to decide how to break ties between records
@@ -99,17 +135,19 @@ Usage: <main class> [options]
     --spark-master
       spark master to use.
       Default: local[2]
+  * --table-type
+      Type of table. COPY_ON_WRITE (or) MERGE_ON_READ
   * --target-base-path
-      base path for the target Hudi table. (Will be created if did not
-      exist first time around. If exists, expected to be a Hudi table)
+      base path for the target hoodie table. (Will be created if did not exist
+      first time around. If exists, expected to be a hoodie table)
   * --target-table
       name of the target table in Hive
     --transformer-class
-      subclass of org.apache.hudi.utilities.transform.Transformer. UDF to
-      transform raw source dataset to a target dataset (conforming to target
-      schema) before writing. Default : Not set. E:g -
+      subclass of org.apache.hudi.utilities.transform.Transformer. Allows
+      transforming raw source Dataset to a target Dataset (conforming to
+      target schema) before writing. Default : Not set. E:g -
       org.apache.hudi.utilities.transform.SqlQueryBasedTransformer (which
-      allows a SQL query template to be passed as a transformation function)
+      allows a SQL query templated to be passed as a transformation function)
 ```
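
As a minimal sketch, wiring a few of these flags together yields an upsert pipeline from Kafka into a COPY_ON_WRITE table (the properties file path, source class and table names below are illustrative placeholders):

```java
[hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path /user/hive/warehouse/stock_ticks_cow \
  --target-table stock_ticks_cow \
  --props /var/demo/config/kafka-source.properties
```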
 
 The tool takes a hierarchically composed property file and has pluggable interfaces for extracting data, key generation and providing schema. Sample configs for ingesting from kafka and dfs are
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 8c6d357..2d97e2b 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -15,12 +15,12 @@ Specifically, following Hive tables are registered based off [table name](/docs/
 and [table type](/docs/configurations.html#TABLE_TYPE_OPT_KEY) passed during write.   
 
 If `table name = hudi_trips` and `table type = COPY_ON_WRITE`, then we get: 
- - `hudi_trips` supports snapshot querying and incremental querying of the table backed by `HoodieParquetInputFormat`, exposing purely columnar data.
+ - `hudi_trips` supports snapshot query and incremental query on the table backed by `HoodieParquetInputFormat`, exposing purely columnar data.
 
 
 If `table name = hudi_trips` and `table type = MERGE_ON_READ`, then we get:
- - `hudi_trips_rt` supports snapshot querying and incremental querying (providing near-real time data) of the table  backed by `HoodieParquetRealtimeInputFormat`, exposing merged view of base and log data.
- - `hudi_trips_ro` supports read optimized querying of the table backed by `HoodieParquetInputFormat`, exposing purely columnar data.
+ - `hudi_trips_rt` supports snapshot query and incremental query (providing near real-time data) on the table backed by `HoodieParquetRealtimeInputFormat`, exposing a merged view of base and log data.
+ - `hudi_trips_ro` supports read optimized query on the table backed by `HoodieParquetInputFormat`, exposing purely columnar data.
  
 
 As discussed in the concepts section, the one key primitive needed for [incrementally processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
@@ -89,11 +89,11 @@ separated) and calls InputFormat.listStatus() only once with all those partition
 Spark provides much easier deployment & management of Hudi jars and bundles into jobs/notebooks. At a high level, there are two ways to access Hudi tables in Spark.
 
  - **Hudi DataSource** : Supports Read Optimized, Incremental Pulls similar to how standard datasources (e.g: `spark.read.parquet`) work.
- - **Read as Hive tables** : Supports all three query types, including the snapshot querying, relying on the custom Hudi input formats again like Hive.
+ - **Read as Hive tables** : Supports all three query types, including snapshot queries, relying on the custom Hudi input formats, just like Hive.
  
- In general, your spark job needs a dependency to `hudi-spark` or `hudi-spark-bundle-x.y.z.jar` needs to be on the class path of driver & executors (hint: use `--jars` argument)
+ In general, your spark job needs a dependency to `hudi-spark`, or the `hudi-spark-bundle_2.*-x.y.z.jar` needs to be on the class path of the driver & executors (hint: use the `--jars` argument)
  
-### Read optimized querying
+### Read optimized query
 
 Pushing a path filter into sparkContext as follows allows for read optimized querying of a Hudi Hive table using SparkSQL.
 This method retains Spark's built-in optimizations for reading Parquet files, like vectorized reading, on Hudi tables.
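
A minimal sketch of that setup, assuming an existing `SparkSession` named `spark` (the path filter class ships with Hudi's hadoop-mr module):

```java
// Register Hudi's read optimized path filter so that SparkSQL only picks up
// the latest base file of each file group when listing the table's paths.
spark.sparkContext().hadoopConfiguration().setClass(
    "mapreduce.input.pathFilter.class",
    org.apache.hudi.hadoop.HoodieROTablePathFilter.class,
    org.apache.hadoop.fs.PathFilter.class);
```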
@@ -110,12 +110,12 @@ Dataset<Row> hoodieROViewDF = spark.read().format("org.apache.hudi")
 .load("/glob/path/pattern");
 ```
  
-### Snapshot querying {#spark-snapshot-querying}
-Currently, near-real time data can only be queried as a Hive table in Spark using snapshot querying mode. In order to do this, set `spark.sql.hive.convertMetastoreParquet=false`, forcing Spark to fallback 
+### Snapshot query {#spark-snapshot-query}
+Currently, near real-time data can only be queried as a Hive table in Spark using snapshot query mode. In order to do this, set `spark.sql.hive.convertMetastoreParquet=false`, forcing Spark to fall back
 to using the Hive Serde to read the data (planning/execution is still handled by Spark).
 
 ```java
-$ spark-shell --jars hudi-spark-bundle-x.y.z-SNAPSHOT.jar --driver-class-path /etc/hive/conf  --packages org.apache.spark:spark-avro_2.11:2.4.4 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g  --master yarn-client
+$ spark-shell --jars hudi-spark-bundle_2.11-x.y.z-SNAPSHOT.jar --driver-class-path /etc/hive/conf  --packages org.apache.spark:spark-avro_2.11:2.4.4 --conf spark.sql.hive.convertMetastoreParquet=false --num-executors 10 --driver-memory 7g --executor-memory 2g  --master yarn-client
 
 scala> sqlContext.sql("select count(*) from hudi_trips_rt where datestr = '2016-10-02'").show()
 scala> sqlContext.sql("select count(*) from hudi_trips_rt where datestr = '2016-10-02'").show()
@@ -148,5 +148,5 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 ## Presto
 
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized querying on Hudi tables. 
+Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 This requires the `hudi-presto-bundle` jar to be placed into `<presto_install>/plugin/hive-hadoop2/`, across the installation.
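
For example, once the table is registered in the Hive metastore, a read optimized query can be issued from `presto-cli` (the coordinator address and table/partition values below are illustrative):

```java
$ presto-cli --server localhost:8090 --catalog hive --schema default
presto:default> select count(*) from hudi_trips_ro where datestr = '2016-10-02';
```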
diff --git a/docs/_docs/2_5_performance.md b/docs/_docs/2_5_performance.md
index 6f489fc..580d180 100644
--- a/docs/_docs/2_5_performance.md
+++ b/docs/_docs/2_5_performance.md
@@ -41,9 +41,9 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event
 **~7X (2880 secs vs 440 secs) speed up** over vanilla spark join. Even for a challenging workload like a '100% update' database ingestion workload spanning 
 3.25B UUID keys/30 partitions/6180 files using 300 cores, Hudi indexing offers an **80-100% speedup**.
 
-## Read Optimized Queries
+## Snapshot Queries
 
-The major design goal for read optimized querying is to achieve the latency reduction & efficiency gains in previous section,
+The major design goal for snapshot queries is to achieve the latency reduction & efficiency gains in the previous section,
 with no impact on queries. The following charts compare Hudi vs non-Hudi tables across Hive/Presto/Spark queries and demonstrate this.
 
 **Hive**