Posted to commits@hudi.apache.org by yi...@apache.org on 2022/06/21 05:39:23 UTC

[hudi] branch asf-site updated: [HUDI-4288] Cut docs for 0.11.1 release (#5914)

This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 4796ce0115 [HUDI-4288] Cut docs for 0.11.1 release (#5914)
4796ce0115 is described below

commit 4796ce0115568468ab2a046646f945166985c838
Author: Y Ethan Guo <et...@gmail.com>
AuthorDate: Mon Jun 20 22:39:15 2022 -0700

    [HUDI-4288] Cut docs for 0.11.1 release (#5914)
---
 website/docs/deployment.md                         |    8 +-
 website/docs/gcp_bigquery.md                       |    4 +-
 website/docs/hoodie_deltastreamer.md               |    2 +-
 website/docs/metadata.md                           |    2 +-
 website/docs/quick-start-guide.md                  |   24 +-
 website/docs/syncing_datahub.md                    |    6 +-
 website/docusaurus.config.js                       |   10 +-
 .../versioned_docs/version-0.11.1/azure_hoodie.md  |   50 +
 .../version-0.11.1/basic_configurations.md         |  744 ++++
 .../versioned_docs/version-0.11.1/bos_hoodie.md    |   57 +
 website/versioned_docs/version-0.11.1/cli.md       |  422 ++
 website/versioned_docs/version-0.11.1/cloud.md     |   29 +
 .../versioned_docs/version-0.11.1/clustering.md    |  267 ++
 .../versioned_docs/version-0.11.1/compaction.md    |  140 +
 .../versioned_docs/version-0.11.1/comparison.md    |   56 +
 website/versioned_docs/version-0.11.1/concepts.md  |  172 +
 .../version-0.11.1/concurrency_control.md          |  167 +
 .../version-0.11.1/configurations.md               | 4271 ++++++++++++++++++++
 .../versioned_docs/version-0.11.1/cos_hoodie.md    |   71 +
 .../version-0.11.1}/deployment.md                  |    8 +-
 .../version-0.11.1/disaster_recovery.md            |  296 ++
 .../versioned_docs/version-0.11.1/docker_demo.md   | 1429 +++++++
 .../versioned_docs/version-0.11.1/encryption.md    |   73 +
 website/versioned_docs/version-0.11.1/faq.md       |  520 +++
 .../versioned_docs/version-0.11.1/file_layouts.md  |   16 +
 .../versioned_docs/version-0.11.1/file_sizing.md   |   53 +
 .../version-0.11.1/flink-quick-start-guide.md      |  191 +
 .../version-0.11.1/flink_configuration.md          |  117 +
 .../version-0.11.1}/gcp_bigquery.md                |    4 +-
 .../versioned_docs/version-0.11.1/gcs_hoodie.md    |   60 +
 .../version-0.11.1/hoodie_cleaner.md               |   57 +
 .../version-0.11.1}/hoodie_deltastreamer.md        |    2 +-
 .../version-0.11.1/ibm_cos_hoodie.md               |   77 +
 website/versioned_docs/version-0.11.1/indexing.md  |   95 +
 .../versioned_docs/version-0.11.1/jfs_hoodie.md    |   96 +
 .../version-0.11.1/key_generation.md               |  209 +
 website/versioned_docs/version-0.11.1/markers.md   |   90 +
 .../version-0.11.1}/metadata.md                    |    2 +-
 .../version-0.11.1/metadata_indexing.md            |  193 +
 website/versioned_docs/version-0.11.1/metrics.md   |  203 +
 .../version-0.11.1/migration_guide.md              |   70 +
 .../versioned_docs/version-0.11.1/oss_hoodie.md    |   70 +
 website/versioned_docs/version-0.11.1/overview.md  |   69 +
 .../versioned_docs/version-0.11.1/performance.md   |  104 +
 .../version-0.11.1/precommit_validator.md          |   74 +
 website/versioned_docs/version-0.11.1/privacy.md   |   22 +
 .../versioned_docs/version-0.11.1/procedures.md    |  452 +++
 .../version-0.11.1/query_engine_setup.md           |   73 +
 .../versioned_docs/version-0.11.1/querying_data.md |  238 ++
 .../version-0.11.1}/quick-start-guide.md           |   24 +-
 website/versioned_docs/version-0.11.1/s3_hoodie.md |   87 +
 .../version-0.11.1/schema_evolution.md             |  365 ++
 .../version-0.11.1/snapshot_exporter.md            |  115 +
 website/versioned_docs/version-0.11.1/structure.md |   20 +
 .../syncing_aws_glue_data_catalog.md               |   18 +
 .../version-0.11.1}/syncing_datahub.md             |    6 +-
 .../version-0.11.1/syncing_metastore.md            |  263 ++
 .../version-0.11.1/table_management.md             |  262 ++
 .../versioned_docs/version-0.11.1/table_types.md   |  110 +
 website/versioned_docs/version-0.11.1/timeline.md  |   42 +
 .../versioned_docs/version-0.11.1/transforms.md    |   66 +
 .../version-0.11.1/troubleshooting.md              |  161 +
 .../versioned_docs/version-0.11.1/tuning-guide.md  |   57 +
 website/versioned_docs/version-0.11.1/use_cases.md |  138 +
 .../version-0.11.1/write_operations.md             |   60 +
 .../versioned_docs/version-0.11.1/writing_data.md  |  505 +++
 .../version-0.11.1-sidebars.json                   |  127 +
 website/versions.json                              |    1 +
 68 files changed, 13841 insertions(+), 51 deletions(-)

diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index a4a57fb6b0..3236fd5657 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -29,11 +29,11 @@ With Merge_On_Read Table, Hudi ingestion needs to also take care of compacting d
 from varied sources such as DFS, Kafka and DB Changelogs and ingest them to hudi tables.  It runs as a spark application in two modes.
 
 To use DeltaStreamer in Spark, the `hudi-utilities-bundle` is required, by adding
-`--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0` to the `spark-submit` command. From 0.11.0 release, we start
+`--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1` to the `spark-submit` command. From 0.11.0 release, we start
 to provide a new `hudi-utilities-slim-bundle` which aims to exclude dependencies that can cause conflicts and compatibility
 issues with different versions of Spark.  The `hudi-utilities-slim-bundle` should be used along with a Hudi Spark bundle 
 corresponding to the Spark version used, e.g., 
-`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0`,
+`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1`,
 if using `hudi-utilities-bundle` solely in Spark encounters compatibility issues.
 
  - **Run Once Mode** : In this mode, Deltastreamer performs one ingestion round which includes incrementally pulling events from upstream sources and ingesting them to hudi table. Background operations like cleaning old file versions and archiving hoodie timeline are automatically executed as part of the run. For Merge-On-Read tables, Compaction is also run inline as part of ingestion unless disabled by passing the flag "--disable-compaction". By default, Compaction is run inline for eve [...]
@@ -41,7 +41,7 @@ if using `hudi-utilities-bundle` solely in Spark encounters compatibility issues
 Here is an example invocation for reading from kafka topic in a single-run mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
@@ -89,7 +89,7 @@ Here is an example invocation for reading from kafka topic in a single-run mode
 Here is an example invocation for reading from kafka topic in a continuous mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
diff --git a/website/docs/gcp_bigquery.md b/website/docs/gcp_bigquery.md
index 8583182042..3651f01d15 100644
--- a/website/docs/gcp_bigquery.md
+++ b/website/docs/gcp_bigquery.md
@@ -38,9 +38,9 @@ Below shows an example for running `BigQuerySyncTool` with `HoodieDeltaStreamer`
 ```shell
 spark-submit --master yarn \
 --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
---jars /opt/hudi-gcp-bundle-0.11.0.jar \
+--jars /opt/hudi-gcp-bundle-0.11.1.jar \
 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
-/opt/hudi-utilities-bundle_2.12-0.11.0.jar \
+/opt/hudi-utilities-bundle_2.12-0.11.1.jar \
 --target-base-path gs://my-hoodie-table/path \
 --target-table mytable \
 --table-type COPY_ON_WRITE \
diff --git a/website/docs/hoodie_deltastreamer.md b/website/docs/hoodie_deltastreamer.md
index 531c412860..938127da31 100644
--- a/website/docs/hoodie_deltastreamer.md
+++ b/website/docs/hoodie_deltastreamer.md
@@ -161,7 +161,7 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.
 From 0.11.0 release, we start to provide a new `hudi-utilities-slim-bundle` which aims to exclude dependencies that can
 cause conflicts and compatibility issues with different versions of Spark.  The `hudi-utilities-slim-bundle` should be
 used along with a Hudi Spark bundle corresponding the Spark version used to make utilities work with Spark, e.g.,
-`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0`,
+`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1`,
 if using `hudi-utilities-bundle` solely to run `HoodieDeltaStreamer` in Spark encounters compatibility issues.
 
 ### MultiTableDeltaStreamer
diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 662f24c2a5..75cf3b0e0c 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -32,7 +32,7 @@ writer and the reader, in query planning in Spark for example.  Multi-modal inde
 containing the indices in the metadata table.
 
 ## Enable Hudi Metadata Table and Multi-Modal Index
-In 0.11.0, the metadata table with synchronous updates and metadata-table-based file listing are enabled by default.
+Since 0.11.0, the metadata table with synchronous updates and metadata-table-based file listing are enabled by default.
 There are prerequisite configurations and steps in [Deployment considerations](#deployment-considerations) to
 safely use this feature.  The metadata table and related file listing functionality can still be turned off by setting
 [`hoodie.metadata.enable`](/docs/configurations#hoodiemetadataenable) to `false`.  For 0.10.1 and prior releases, metadata
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 529b1a0e53..49f133c6f0 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -20,8 +20,8 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [
 
 | Hudi            | Supported Spark 3 version                       |
 |:----------------|:------------------------------------------------|
-| 0.11.0          | 3.2.x (default build, Spark bundle only), 3.1.x |
-| 0.10.0          | 3.1.x (default build), 3.0.x                    |
+| 0.11.x          | 3.2.x (default build, Spark bundle only), 3.1.x |
+| 0.10.x          | 3.1.x (default build), 3.0.x                    |
 | 0.7.0 - 0.9.0   | 3.0.x                                           |
 | 0.6.0 and prior | not supported                                   |
 
@@ -48,7 +48,7 @@ From the extracted directory run spark-shell with Hudi:
 ```shell
 # Spark 3.2
 spark-shell \
-  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
@@ -56,13 +56,13 @@ spark-shell \
 ```shell
 # Spark 3.1
 spark-shell \
-  --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
+  --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 ```shell
 # Spark 2.4
 spark-shell \
-  --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
+  --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 </TabItem>
@@ -75,7 +75,7 @@ From the extracted directory run pyspark with Hudi:
 # Spark 3.2
 export PYSPARK_PYTHON=$(which python3)
 pyspark \
---packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
 --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
 --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
@@ -84,14 +84,14 @@ pyspark \
 # Spark 3.1
 export PYSPARK_PYTHON=$(which python3)
 pyspark \
---packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
+--packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 ```shell
 # Spark 2.4
 export PYSPARK_PYTHON=$(which python3)
 pyspark \
---packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
+--packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 </TabItem>
@@ -103,20 +103,20 @@ From the extracted directory run Spark SQL with Hudi:
 
 ```shell
 # Spark 3.2
-spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
 --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
 --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
 ```
 ```shell
 # Spark 3.1
-spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0 \
+spark-sql --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
 --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 ```shell
 # Spark 2.4
-spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.0 \
+spark-sql --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.11.1 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
 --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
@@ -1199,7 +1199,7 @@ more details please refer to [procedures](procedures).
 
 You can also do the quickstart by [building hudi yourself](https://github.com/apache/hudi#building-apache-hudi-from-source), 
 and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.1?-*.*.*-SNAPSHOT.jar` in the spark-shell command above
-instead of `--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-different-spark-versions)
+instead of `--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-different-spark-versions)
 for more info.
 
 Also, we used Spark here to show case the capabilities of Hudi. However, Hudi can support multiple table types/query types and 
diff --git a/website/docs/syncing_datahub.md b/website/docs/syncing_datahub.md
index 75f4ba10bc..eacc8a7e80 100644
--- a/website/docs/syncing_datahub.md
+++ b/website/docs/syncing_datahub.md
@@ -6,7 +6,7 @@ keywords: [hudi, datahub, sync]
 [DataHub](https://datahubproject.io/) is a rich metadata platform that supports features like data discovery, data
 observability, federated governance, etc.
 
-In Hudi 0.11.0, you can now sync to a DataHub instance by setting `DataHubSyncTool` as one of the sync tool classes
+Since Hudi 0.11.0, you can now sync to a DataHub instance by setting `DataHubSyncTool` as one of the sync tool classes
 for `HoodieDeltaStreamer`.
 
 The target Hudi table will be sync'ed to DataHub as a `Dataset`. The Hudi table's avro schema will be sync'ed, along
@@ -36,9 +36,9 @@ the classpath.
 
 ```shell
 spark-submit --master yarn \
---jars /opt/hudi-datahub-sync-bundle-0.11.0.jar \
+--jars /opt/hudi-datahub-sync-bundle-0.11.1.jar \
 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
-/opt/hudi-utilities-bundle_2.12-0.11.0.jar \
+/opt/hudi-utilities-bundle_2.12-0.11.1.jar \
 --target-table mytable \
 # ... other HoodieDeltaStreamer's configs
 --enable-sync \
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 0311a6ade2..5556a21a24 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -98,11 +98,11 @@ module.exports = {
           },
           {
             from: ['/docs/releases', '/docs/next/releases'],
-            to: '/releases/release-0.11.0',
+            to: '/releases/release-0.11.1',
           },
           {
             from: ['/releases'],
-            to: '/releases/release-0.11.0',
+            to: '/releases/release-0.11.1',
           },
           {
             from: ['/docs/learn'],
@@ -252,7 +252,7 @@ module.exports = {
             },
             {
               label: 'Releases',
-              to: '/releases/release-0.11.0',
+              to: '/releases/release-0.11.1',
             },
             {
               label: 'Download',
@@ -420,8 +420,8 @@ module.exports = {
               path: 'next',
               banner: 'unreleased',
             },
-            '0.11.0': {
-              label: '0.11.0',
+            '0.11.1': {
+              label: '0.11.1',
               path: '',
             }
           },
diff --git a/website/versioned_docs/version-0.11.1/azure_hoodie.md b/website/versioned_docs/version-0.11.1/azure_hoodie.md
new file mode 100644
index 0000000000..f28ec609c7
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/azure_hoodie.md
@@ -0,0 +1,50 @@
+---
+title: Microsoft Azure
+keywords: [ hudi, hive, azure, spark, presto]
+summary: In this page, we go over how to configure Hudi with Azure filesystem.
+last_modified_at: 2020-05-25T19:00:57-04:00
+---
+In this page, we explain how to use Hudi on Microsoft Azure.
+
+## Disclaimer
+
+This page is maintained by the Hudi community.
+If the information is inaccurate or you have additional information to add,
+please feel free to create a JIRA ticket. Contributions are highly appreciated.
+
+## Supported Storage System
+
+There are two Azure storage systems that support Hudi:
+
+- Azure Blob Storage
+- Azure Data Lake Gen 2
+
+## Verified Combinations of Spark and Storage Systems
+
+#### HDInsight Spark2.4 on Azure Data Lake Storage Gen 2
+This combination works out of the box. No extra config needed.
+
+#### Databricks Spark2.4 on Azure Data Lake Storage Gen 2
+- Import the Hudi jar into the Databricks workspace
+
+- Mount the file system to dbutils.
+  ```scala
+  dbutils.fs.mount(
+    source = "abfss://xxx@xxx.dfs.core.windows.net",
+    mountPoint = "/mountpoint",
+    extraConfigs = configs)
+  ```
+- When writing Hudi dataset, use abfss URL
+  ```scala
+  inputDF.write
+    .format("org.apache.hudi")
+    .options(opts)
+    .mode(SaveMode.Append)
+    .save("abfss://<<storage-account>>.dfs.core.windows.net/hudi-tables/customer")
+  ```
+- When reading Hudi dataset, use the mounting point
+  ```scala
+  spark.read
+    .format("org.apache.hudi")
+    .load("/mountpoint/hudi-tables/customer")
+  ```
diff --git a/website/versioned_docs/version-0.11.1/basic_configurations.md b/website/versioned_docs/version-0.11.1/basic_configurations.md
new file mode 100644
index 0000000000..09c6259616
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/basic_configurations.md
@@ -0,0 +1,744 @@
+---
+title: Basic Configurations
+toc: true
+---
+
+This page covers the basic configurations you may use to write/read Hudi tables. This page only features a subset of the
+most frequently used configurations. For a full list of all configs, please visit the [All Configurations](/docs/configurations) page.
+
+- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick the write operation, specify how to merge records, or choose the query type to read.
+- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick the write operation, specify how to merge records, enable/disable asynchronous compaction, or choose the query type to read.
+- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses an RDD-based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower-level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning, etc. Although Hudi provides sane defaults, these configs may need to be tweaked from time to time to optimize for specific workloads.
+- [**Metrics Configs**](#METRICS): This set of configs is used to enable monitoring and reporting of key Hudi stats and metrics.
+- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on the incoming new record and the stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload, which simply updates the table with the latest/last-written record. This can be overridden with a custom class extending the HoodieRecordPayload class, at both the datasource and WriteClient levels.
+
+## Spark Datasource Configs {#SPARK_DATASOURCE}
+These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick the write operation, specify how to merge records, or choose the query type to read.
+
+### Read Options {#Read-Options}
+
+Options useful for reading tables via `read.format.option(...)`
+
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+> #### hoodie.datasource.query.type
+> Whether data needs to be read, in incremental mode (new data since an instantTime) (or) Read Optimized mode (obtain latest view, based on base files) (or) Snapshot mode (obtain latest view, by merging base and (if any) log files)<br></br>
+> **Default Value**: snapshot (Optional)<br></br>
+> `Config Param: QUERY_TYPE`<br></br>
+
+---
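+
+For illustration, here is a minimal Scala sketch of switching the query type when reading through the Spark datasource; `basePath` and the begin-instant value are placeholders, and the incremental read additionally uses the begin-instant option (`hoodie.datasource.read.begin.instanttime`) from the full configuration reference:
+
+```scala
+// Snapshot query: the default query type, merging base and (if any) log files
+val snapshotDF = spark.read.format("hudi")
+  .option("hoodie.datasource.query.type", "snapshot")
+  .load(basePath)
+
+// Incremental query: only records written after the given instant (placeholder value)
+val incrementalDF = spark.read.format("hudi")
+  .option("hoodie.datasource.query.type", "incremental")
+  .option("hoodie.datasource.read.begin.instanttime", "20220101000000")
+  .load(basePath)
+```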
+
+### Write Options {#Write-Options}
+
+You can pass down any of the WriteClient level configs directly using `options()` or `option(k,v)` methods.
+
+```java
+inputDF.write()
+.format("org.apache.hudi")
+.options(clientOpts) // any of the Hudi client opts can be passed in as well
+.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
+.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
+.option(HoodieWriteConfig.TABLE_NAME, tableName)
+.mode(SaveMode.Append)
+.save(basePath);
+```
+
+Options useful for writing tables via `write.format.option(...)`
+
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+
+> #### hoodie.datasource.write.operation
+> Whether to do upsert, insert or bulkinsert for the write operation. Use bulkinsert to load new data into a table, and thereafter use upsert/insert. Bulk insert uses a disk-based write path to scale to large inputs without the need to cache them.<br></br>
+> **Default Value**: upsert (Optional)<br></br>
+> `Config Param: OPERATION`<br></br>
+
+---
+
+> #### hoodie.datasource.write.table.type
+> The table type for the underlying data, for this write. This can’t change between writes.<br></br>
+> **Default Value**: COPY_ON_WRITE (Optional)<br></br>
+> `Config Param: TABLE_TYPE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.table.name
+> Table name for the datasource write. Also used to register the table into meta stores.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.recordkey.field
+> Record key field. Value to be used as the `recordKey` component of `HoodieKey`.
+Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using
+the dot notation eg: `a.b.c`<br></br>
+> **Default Value**: uuid (Optional)<br></br>
+> `Config Param: RECORDKEY_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.field
+> Partition path field. Value to be used as the partitionPath component of HoodieKey. Actual value obtained by invoking .toString()<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITIONPATH_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.class
+> Key generator class, that implements `org.apache.hudi.keygen.KeyGenerator`<br></br>
+> **Default Value**: org.apache.hudi.keygen.SimpleKeyGenerator (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.precombine.field
+> Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: PRECOMBINE_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.write.payload.class
+> Payload class used. Override this if you would like to roll your own merge logic when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Should we url encode the partition path value, before creating the folder structure.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.enable
+> When set to true, register/sync the table to Apache Hive metastore<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.mode
+> Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_SYNC_MODE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Flag to indicate whether to use Hive style partitioning.
+If set to true, the names of partition folders follow &lt;partition_column_name&gt;=&lt;partition_value&gt; format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.partition_fields
+> Field in the table to use for determining hive partition columns.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: HIVE_PARTITION_FIELDS`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.partition_extractor_class
+> Class which implements PartitionValueExtractor to extract the partition values, default 'SlashEncodedDayPartitionValueExtractor'.<br></br>
+> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> `Config Param: HIVE_PARTITION_EXTRACTOR_CLASS`<br></br>
+
+---
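+
+As an illustration, the write and Hive sync options above can be combined on a single Spark datasource write, roughly as sketched below; the field names, table name, sync mode, and `basePath` are placeholders rather than recommendations:
+
+```scala
+// Sketch only: upsert with Hive-style partitioning and HMS-based Hive sync enabled
+inputDF.write.format("hudi")
+  .option("hoodie.datasource.write.operation", "upsert")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.partitionpath.field", "partition")
+  .option("hoodie.datasource.write.precombine.field", "ts")
+  .option("hoodie.datasource.write.hive_style_partitioning", "true")
+  .option("hoodie.datasource.write.table.name", "my_hudi_table")
+  .option("hoodie.datasource.hive_sync.enable", "true")
+  .option("hoodie.datasource.hive_sync.mode", "hms")
+  .option("hoodie.datasource.hive_sync.partition_fields", "partition")
+  .mode("append")
+  .save(basePath)
+```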
+
+## Flink Sql Configs {#FLINK_SQL}
+These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick the write operation, specify how to merge records, enable/disable asynchronous compaction, or choose the query type to read.
+
+### Flink Options {#Flink-Options}
+
+> #### path
+> Base path for the target Hudi table.
+The path will be created if it does not exist;
+otherwise, a Hudi table is expected to have been initialized there successfully<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PATH`<br></br>
+
+---
+
+> #### hoodie.table.name
+> Table name to register to Hive metastore<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_NAME`<br></br>
+
+---
+
+
+> #### table.type
+> Type of table to write. COPY_ON_WRITE (or) MERGE_ON_READ<br></br>
+> **Default Value**: COPY_ON_WRITE (Optional)<br></br>
+> `Config Param: TABLE_TYPE`<br></br>
+
+---
+
+> #### write.operation
+> The write operation, that this write should do<br></br>
+> **Default Value**: upsert (Optional)<br></br>
+> `Config Param: OPERATION`<br></br>
+
+---
+
+> #### write.tasks
+> Parallelism of tasks that do actual write, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: WRITE_TASKS`<br></br>
+
+---
+
+> #### write.bucket_assign.tasks
+> Parallelism of tasks that do bucket assign, default is the parallelism of the execution environment<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BUCKET_ASSIGN_TASKS`<br></br>
+
+---
+
+> #### write.precombine
+> Flag to indicate whether to drop duplicates before insert/upsert.
+By default these cases will accept duplicates, to gain extra performance:
+1) insert operation;
+2) upsert for MOR table, since the MOR table deduplicates on reading<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PRE_COMBINE`<br></br>
+
+---
+
+> #### read.tasks
+> Parallelism of tasks that do actual read, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: READ_TASKS`<br></br>
+
+---
+
+> #### read.start-commit
+> Start commit instant for reading, the commit time format should be 'yyyyMMddHHmmss', by default reading from the latest instant for streaming read<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_START_COMMIT`<br></br>
+
+---
+
+> #### read.streaming.enabled
+> Whether to read as streaming source, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: READ_AS_STREAMING`<br></br>
+
+---
+
+> #### compaction.tasks
+> Parallelism of tasks that do actual compaction, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: COMPACTION_TASKS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Whether to use Hive style partitioning.
+If set to true, the names of partition folders follow &lt;partition_column_name&gt;=&lt;partition_value&gt; format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING`<br></br>
+
+---
+
+> #### hive_sync.enable
+> Asynchronously sync Hive meta to HMS, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ENABLED`<br></br>
+
+---
+
+> #### hive_sync.mode
+> Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql, default 'jdbc'<br></br>
+> **Default Value**: jdbc (Optional)<br></br>
+> `Config Param: HIVE_SYNC_MODE`<br></br>
+
+---
+
+> #### hive_sync.table
+> Table name for hive sync, default 'unknown'<br></br>
+> **Default Value**: unknown (Optional)<br></br>
+> `Config Param: HIVE_SYNC_TABLE`<br></br>
+
+---
+
+> #### hive_sync.db
+> Database name for hive sync, default 'default'<br></br>
+> **Default Value**: default (Optional)<br></br>
+> `Config Param: HIVE_SYNC_DB`<br></br>
+
+---
+
+> #### hive_sync.partition_extractor_class
+> Tool to extract the partition value from HDFS path, default 'SlashEncodedDayPartitionValueExtractor'<br></br>
+> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> `Config Param: HIVE_SYNC_PARTITION_EXTRACTOR_CLASS_NAME`<br></br>
+
+---
+> #### hive_sync.metastore.uris
+> Metastore uris for hive sync, default ''<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: HIVE_SYNC_METASTORE_URIS`<br></br>
+
+---
+
+
+## Write Client Configs {#WRITE_CLIENT}
+Internally, the Hudi datasource uses an RDD-based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower-level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning, etc. Although Hudi provides sane defaults, these configs may need to be tweaked from time to time to optimize for specific workloads.
+
+### Storage Configs
+
+Configurations that control aspects around writing, sizing, reading base and log files.
+
+`Config Class`: org.apache.hudi.config.HoodieStorageConfig<br></br>
+
+> #### write.parquet.block.size
+> Parquet RowGroup size. It's recommended to make this large enough that scan costs can be amortized by packing enough column values into a single row group.<br></br>
+> **Default Value**: 120 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_BLOCK_SIZE`<br></br>
+
+---
+
+> #### write.parquet.max.file.size
+> Target size for parquet files produced by Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.<br></br>
+> **Default Value**: 120 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_MAX_FILE_SIZE`<br></br>
+
+---
+
+### Metadata Configs
+
+Configurations used by the Hudi Metadata Table. This table maintains the metadata about a given Hudi table (e.g file listings) to avoid overhead of accessing cloud storage, during queries.
+
+`Config Class`: org.apache.hudi.common.config.HoodieMetadataConfig<br></br>
+
+> #### hoodie.metadata.enable
+> Enable the internal metadata table which serves table metadata like level file listings<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+### Write Configurations
+
+Configurations that control write behavior on Hudi tables. These can be directly passed down from even higher level frameworks (e.g Spark datasources, Flink sink) and utilities (e.g DeltaStreamer).
+
+`Config Class`: org.apache.hudi.config.HoodieWriteConfig<br></br>
+
+> #### hoodie.combine.before.upsert
+> When upserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage. This should be turned off only if you are absolutely certain that there are no duplicates incoming, otherwise it can lead to duplicate keys and violate the uniqueness guarantees.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_UPSERT`<br></br>
+
+---
+
+> #### hoodie.write.markers.type
+> Marker type to use. Two modes are supported: - DIRECT: individual marker file corresponding to each data file is directly created by the writer. - TIMELINE_SERVER_BASED: marker operations are all handled at the timeline service which serves as a proxy. New marker entries are batch processed and stored in a limited number of underlying files for efficiency. If HDFS is used or timeline server is disabled, DIRECT markers are used as fallback even if this is configured. For Spark structure [...]
+> **Default Value**: TIMELINE_SERVER_BASED (Optional)<br></br>
+> `Config Param: MARKERS_TYPE`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.insert.shuffle.parallelism
+> Parallelism for inserting records into the table. Inserts can shuffle data before writing to tune file sizes and optimize the storage layout.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: INSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.rollback.parallelism
+> Parallelism for rollback of commits. Rollbacks perform delete of files or logging delete blocks to file groups on storage in parallel.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: ROLLBACK_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.combine.before.delete
+> During delete operations, controls whether we should combine deletes (and potentially also upserts) before writing to storage.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_DELETE`<br></br>
+
+---
+
+> #### hoodie.combine.before.insert
+> When inserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage. When set to true the 
+> precombine field value is used to reduce all records that share the same key. <br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_INSERT`<br></br>
+
+---
+
+> #### hoodie.bulkinsert.shuffle.parallelism
+> For large initial imports using bulk_insert operation, controls the parallelism to use for sort modes or custom partitioning done before writing records to the table.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: BULKINSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.delete.shuffle.parallelism
+> Parallelism used for “delete” operation. Delete operations also perform shuffles, similar to upsert operation.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: DELETE_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.bulkinsert.sort.mode
+> Sorting modes to use for sorting records for bulk insert. This is used when user hoodie.bulkinsert.user.defined.partitioner.class is not configured. Available values are - GLOBAL_SORT: this ensures best file sizes, with lowest memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance by only sorting within a partition, still keeping the memory overhead of writing lowest and best effort file sizing. NONE: No sorting. Fastest and matches `spark.write.parquet()` in terms of nu [...]
+> **Default Value**: GLOBAL_SORT (Optional)<br></br>
+> `Config Param: BULK_INSERT_SORT_MODE`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server
+> When true, spins up an instance of the timeline server (meta server that serves cached file listings, statistics), running on each writer's driver process, accepting requests during the write from executors.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_ENABLE`<br></br>
+
+---
+
+> #### hoodie.upsert.shuffle.parallelism
+> Parallelism to use for upsert operation on the table. Upserts can shuffle data to perform index lookups, file sizing, bin packing records optimally into file groups.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: UPSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.rollback.using.markers
+> Enables a more efficient mechanism for rollbacks based on the marker files generated during the writes. Turned on by default.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ROLLBACK_USING_MARKERS_ENABLE`<br></br>
+
+---
+
+> #### hoodie.finalize.write.parallelism
+> Parallelism for the write finalization internal operation, which involves removing any partially written files from lake storage, before committing the write. Reduce this value, if the high number of tasks incur delays for smaller tables or low latency writes.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: FINALIZE_WRITE_PARALLELISM_VALUE`<br></br>
+
+---
+
+### Compaction Configs {#Compaction-Configs}
+
+Configurations that control compaction (merging of log files onto new base files) as well as cleaning (reclamation of older/unused file groups/slices).
+
+`Config Class`: org.apache.hudi.config.HoodieCompactionConfig<br></br>
+
+> #### hoodie.cleaner.policy
+> Cleaning policy to be used. The cleaner service deletes older file slices to reclaim space. By default, the cleaner spares the file slices written by the last N commits, determined by hoodie.cleaner.commits.retained. Long-running query plans may often refer to older file slices and will break if those are cleaned before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time<br></br>
+> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
+> `Config Param: CLEANER_POLICY`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.record.size.estimate
+> The average record size. If not explicitly specified, hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files.<br></br>
+> **Default Value**: 1024 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_RECORD_SIZE_ESTIMATE`<br></br>
+
+---
+
+> #### hoodie.compact.inline.max.delta.seconds
+> Number of elapsed seconds after the last compaction, before scheduling a new one.<br></br>
+> **Default Value**: 3600 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TIME_DELTA_SECONDS`<br></br>
+
+---
+
+> #### hoodie.cleaner.commits.retained
+> Number of commits to retain when the cleaner is triggered with the KEEP_LATEST_COMMITS cleaning policy. Make sure to configure this property properly so that the longest running query is able to succeed. This also directly translates into how much data retention the table supports for incremental queries.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.clean.async
+> Only applies when hoodie.clean.automatic is turned on. When turned on, runs the cleaner asynchronously with writing, which can speed up overall write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLEAN`<br></br>
+
+---
+
+> #### hoodie.clean.automatic
+> When enabled, the cleaner table service is invoked immediately after each commit, to delete older file slices. It's recommended to enable this, to ensure metadata and data storage growth is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_CLEAN`<br></br>
+
+---
+
+> #### hoodie.commits.archival.batch
+> Archiving of instants is batched in best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.compact.inline
+> When set to true, compaction service is triggered after each write. While being simpler operationally, this adds extra latency on the write path.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_COMPACT`<br></br>
+
+---
+
+> #### hoodie.parquet.small.file.limit
+> During upsert operation, we opportunistically expand existing small files on storage, instead of writing new files, to keep number of files to an optimum. This config sets the file size limit below which a file on storage becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file.<br></br>
+> **Default Value**: 104857600 (Optional)<br></br>
+> `Config Param: PARQUET_SMALL_FILE_LIMIT`<br></br>
+
+---
+
+> #### hoodie.compaction.strategy
+> Compaction strategy decides which file groups are picked up for compaction during each compaction run. By default, Hudi picks the log file with the most accumulated unmerged data<br></br>
+> **Default Value**: org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy (Optional)<br></br>
+> `Config Param: COMPACTION_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.archive.automatic
+> When enabled, the archival table service is invoked immediately after each commit, to archive commits if we cross a maximum value of commits. It's recommended to enable this, to ensure number of active commits is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_ARCHIVE`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.insert.auto.split
+> Config to control whether insert split sizes are determined automatically based on average record sizes. It's recommended to keep this turned on, since hand tuning is otherwise extremely cumbersome.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_AUTO_SPLIT_INSERTS`<br></br>
+
+---
+
+> #### hoodie.compact.inline.max.delta.commits
+> Number of delta commits after the last compaction, before scheduling of a new compaction is attempted. This is used when the [compaction trigger strategy](/docs/configurations/#hoodiecompactinlinetriggerstrategy) involves number of commits. For example NUM_COMMITS,NUM_AND_TIME,NUM_OR_TIME <br></br>
+> **Default Value**: 5 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_NUM_DELTA_COMMITS`<br></br>
+
+---
+
+> #### hoodie.keep.min.commits
+> Similar to hoodie.keep.max.commits, but controls the minimum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
+
+---
+
+> #### hoodie.cleaner.parallelism
+> Parallelism for the cleaning operation. Increase this if cleaning becomes slow.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: CLEANER_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.record.size.estimation.threshold
+> We use the previous commits' metadata to calculate the estimated record size and use it to bin pack records into partitions. If the previous commit is too small to make an accurate estimation, Hudi will search commits in the reverse order, until we find a commit that has totalBytesWritten larger than (PARQUET_SMALL_FILE_LIMIT_BYTES * this_threshold)<br></br>
+> **Default Value**: 1.0 (Optional)<br></br>
+> `Config Param: RECORD_SIZE_ESTIMATION_THRESHOLD`<br></br>
+
+---
+
+> #### hoodie.compact.inline.trigger.strategy
+> Controls how compaction scheduling is triggered, by time or num delta commits or combination of both. Valid options: NUM_COMMITS,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
+> **Default Value**: NUM_COMMITS (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.keep.max.commits
+> Archiving service moves older entries from the timeline into an archived log after each write, to keep the metadata overhead constant, even as the table size grows. This config controls the maximum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.insert.split.size
+> Number of inserts assigned for each partition/bucket for writing. We based the default on writing out 100MB files, with at least 1kb records (100K records per file), and over provision to 500K. As long as auto-tuning of splits is turned on, this only affects the first write, where there is no history to learn record sizes from.<br></br>
+> **Default Value**: 500000 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_INSERT_SPLIT_SIZE`<br></br>
+
+---
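+
+For example, a few of the compaction and cleaning knobs above can be passed as write options on a MERGE_ON_READ table; this is only a sketch with illustrative values, and the defaults are usually a reasonable starting point:
+
+```scala
+// Sketch only: inline compaction every 5 delta commits, cleaner retaining the last 10 commits
+inputDF.write.format("hudi")
+  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
+  .option("hoodie.compact.inline", "true")
+  .option("hoodie.compact.inline.max.delta.commits", "5")
+  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
+  .option("hoodie.cleaner.commits.retained", "10")
+  .mode("append")
+  .save(basePath)
+```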
+
+### File System View Storage Configurations {#File-System-View-Storage-Configurations}
+
+Configurations that control how file metadata is stored by Hudi, for transaction processing and queries.
+
+`Config Class`: org.apache.hudi.common.table.view.FileSystemViewStorageConfig<br></br>
+
+> #### hoodie.filesystem.view.type
+> File system view provides APIs for viewing the files on the underlying lake storage, as file groups and file slices. This config controls how such a view is held. Options include MEMORY,SPILLABLE_DISK,EMBEDDED_KV_STORE,REMOTE_ONLY,REMOTE_FIRST which provide different trade offs for memory usage and API request performance.<br></br>
+> **Default Value**: MEMORY (Optional)<br></br>
+> `Config Param: VIEW_TYPE`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.secondary.type
+> Specifies the secondary form of storage for file system view, if the primary (e.g timeline server) is unavailable.<br></br>
+> **Default Value**: MEMORY (Optional)<br></br>
+> `Config Param: SECONDARY_VIEW_TYPE`<br></br>
+
+---
+
+### Index Configs
+
+Configurations that control indexing behavior, which tags incoming records as either inserts or updates to older records.
+
+`Config Class`: org.apache.hudi.config.HoodieIndexConfig<br></br>
+
+> #### hoodie.index.type
+> Type of index to use. Default is Bloom filter. Possible options are [BLOOM | GLOBAL_BLOOM | SIMPLE | GLOBAL_SIMPLE | INMEMORY | HBASE | BUCKET]. Bloom filters remove the dependency on an external system and are stored in the footer of the Parquet data files<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: INDEX_TYPE`<br></br>
+
+---
+
+> #### hoodie.index.bloom.fpp
+> Only applies if index type is BLOOM. Error rate allowed given the number of entries. This is used to calculate how many bits should be assigned for the bloom filter and the number of hash functions. This is usually set very low (default: 0.000000001), since we prefer to trade off disk space for lower false positives. If the number of entries added to the bloom filter exceeds the configured value (hoodie.index.bloom.num_entries), then this fpp may not be honored.<br></br>
+> **Default Value**: 0.000000001 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_FPP_VALUE`<br></br>
+
+---
+
+> #### hoodie.index.bloom.num_entries
+> Only applies if index type is BLOOM. This is the number of entries to be stored in the bloom filter. The rationale for the default: Assume the maxParquetFileSize is 128MB and averageRecordSize is 1kb and hence we approx a total of 130K records in a file. The default (60000) is roughly half of this approximation. Warning: Setting this very low, will generate a lot of false positives and index lookup will have to scan a lot more files than it has to and setting this to a very high number [...]
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_NUM_ENTRIES_VALUE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.update.partition.path
+> Only applies if index type is GLOBAL_BLOOM. When set to true, an update including the partition path of a record that already exists will result in inserting the incoming record into the new partition and deleting the original record in the old partition. When set to false, the original record will only be updated in the old partition<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.use.caching
+> Only applies if index type is BLOOM. When true, the input RDD will be cached to speed up index lookup by reducing IO for computing parallelism or affected partitions<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_USE_CACHING`<br></br>
+
+---
+
+> #### hoodie.bloom.index.parallelism
+> Only applies if index type is BLOOM. This is the amount of parallelism for index lookup, which involves a shuffle. By default, this is auto computed based on input workload characteristics.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.bloom.index.prune.by.ranges
+> Only applies if index type is BLOOM. When true, range information from files is leveraged to speed up index lookups. Particularly helpful if the key has a monotonically increasing prefix, such as a timestamp. If the record key is completely random, it is better to turn this off, since range pruning will only add extra overhead to the index lookup.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_PRUNE_BY_RANGES`<br></br>
+
+---
+
+> #### hoodie.bloom.index.filter.type
+> Filter type used. Default is BloomFilterTypeCode.DYNAMIC_V0. Available values are [BloomFilterTypeCode.SIMPLE , BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves based on number of keys.<br></br>
+> **Default Value**: DYNAMIC_V0 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_TYPE`<br></br>
+
+---
+
+> #### hoodie.simple.index.parallelism
+> Only applies if index type is SIMPLE. This is the amount of parallelism for index lookup, which involves a Spark Shuffle<br></br>
+> **Default Value**: 50 (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.simple.index.use.caching
+> Only applies if index type is SIMPLE. When true, the incoming writes will be cached to speed up index lookup by reducing IO for computing parallelism or affected partitions<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_USE_CACHING`<br></br>
+
+---
+
+> #### hoodie.global.simple.index.parallelism
+> Only applies if index type is GLOBAL_SIMPLE. This is the amount of parallelism for index lookup, which involves a Spark Shuffle<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: GLOBAL_SIMPLE_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.simple.index.update.partition.path
+> Similar to 'hoodie.bloom.index.update.partition.path' (default: true), but for the simple index. Since version: 0.6.0<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
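+
+As a brief sketch of how the index configs above are typically supplied, they can be set as write options alongside the other datasource configs; the BLOOM choice here is only illustrative, not a recommendation:
+
+```scala
+// Sketch only: explicitly selecting the Bloom index and keeping range pruning on
+inputDF.write.format("hudi")
+  .option("hoodie.index.type", "BLOOM")
+  .option("hoodie.bloom.index.prune.by.ranges", "true")
+  .option("hoodie.bloom.index.filter.type", "DYNAMIC_V0")
+  .mode("append")
+  .save(basePath)
+```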
+
+### Common Configurations {#Common-Configurations}
+
+The following set of configurations are common across Hudi.
+
+`Config Class`: org.apache.hudi.common.config.HoodieCommonConfig<br></br>
+
+> #### hoodie.common.spillable.diskmap.type
+> When handling input data that cannot be held in memory, to merge with a file on storage, a spillable diskmap is employed. By default, we use a persistent hashmap based loosely on bitcask, that offers O(1) inserts, lookups. Change this to `ROCKS_DB` to prefer using rocksDB, for handling the spill.<br></br>
+> **Default Value**: BITCASK (Optional)<br></br>
+> `Config Param: SPILLABLE_DISK_MAP_TYPE`<br></br>
+
+---
+
+## Metrics Configs {#METRICS}
+This set of configs is used to enable monitoring and reporting of key Hudi stats and metrics.
+
+### Metrics Configurations for Datadog reporter {#Metrics-Configurations-for-Datadog-reporter}
+
+Enables reporting on Hudi metrics using the Datadog reporter type. Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsDatadogConfig<br></br>
+
+> #### hoodie.metrics.on
+> Turn on/off metrics reporting. Off by default.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: TURN_METRICS_ON`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+> #### hoodie.metrics.reporter.type
+> Type of metrics reporter.<br></br>
+> **Default Value**: GRAPHITE (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_TYPE_VALUE`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+> #### hoodie.metrics.reporter.class
+> <br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+## Record Payload Config {#RECORD_PAYLOAD}
+This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on the incoming new record and the stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload, which simply updates the table with the latest/last-written record. This can be overridden with a custom class extending the HoodieRecordPayload class, at both the datasource and WriteClient levels.
+
+### Payload Configurations {#Payload-Configurations}
+
+Payload-related configs that can be leveraged to control merges based on specific business fields in the data.
+
+`Config Class`: org.apache.hudi.config.HoodiePayloadConfig<br></br>
+> #### hoodie.payload.event.time.field
+> Table column/field name to derive the timestamp associated with the records. This can be useful, e.g., for determining the freshness of the table.<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: EVENT_TIME_FIELD`<br></br>
+
+---
+
+> #### hoodie.payload.ordering.field
+> Table column/field name to order records that have the same key, before merging and writing to storage.<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: ORDERING_FIELD`<br></br>
+
+---
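+
+To tie this together, the payload configs can be combined with the `hoodie.datasource.write.payload.class` option from the write options above; `DefaultHoodieRecordPayload` and the `event_ts` column are placeholders for whichever built-in or custom payload implementation and ordering field fit your data:
+
+```scala
+// Sketch only: merge records by an event-time column rather than arrival order
+inputDF.write.format("hudi")
+  .option("hoodie.datasource.write.payload.class",
+    "org.apache.hudi.common.model.DefaultHoodieRecordPayload")
+  .option("hoodie.payload.ordering.field", "event_ts")
+  .option("hoodie.payload.event.time.field", "event_ts")
+  .mode("append")
+  .save(basePath)
+```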
diff --git a/website/versioned_docs/version-0.11.1/bos_hoodie.md b/website/versioned_docs/version-0.11.1/bos_hoodie.md
new file mode 100644
index 0000000000..2a6cde81c8
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/bos_hoodie.md
@@ -0,0 +1,57 @@
+---
+title: Baidu Cloud
+keywords: [ hudi, hive, baidu, bos, spark, presto]
+summary: In this page, we go over how to configure Hudi with bos filesystem.
+last_modified_at: 2021-06-09T11:38:24-10:00
+---
+In this page, we explain how to get your Hudi job to store into Baidu BOS.
+
+## Baidu BOS configs
+
+There are two configurations required for Hudi-BOS compatibility:
+
+- Adding Baidu BOS Credentials for Hudi
+- Adding required Jars to classpath
+
+### Baidu BOS Credentials
+
+Add the required configs to your core-site.xml, from where Hudi can fetch them. Set `fs.defaultFS` to your BOS bucket name, `fs.bos.endpoint` to your BOS endpoint, `fs.bos.access.key` to your BOS access key, and `fs.bos.secret.access.key` to your BOS secret key. With these set, Hudi should be able to read from and write to the bucket.
+
+```xml
+<property>
+  <name>fs.defaultFS</name>
+  <value>bos://bucketname/</value>
+</property>
+
+<property>
+  <name>fs.bos.endpoint</name>
+  <value>bos-endpoint-address</value>
+  <description>Baidu bos endpoint to connect to,for example : http://bj.bcebos.com</description>
+</property>
+
+<property>
+  <name>fs.bos.access.key</name>
+  <value>bos-key</value>
+  <description>Baidu access key</description>
+</property>
+
+<property>
+  <name>fs.bos.secret.access.key</name>
+  <value>bos-secret-key</value>
+  <description>Baidu secret key.</description>
+</property>
+
+<property>
+  <name>fs.bos.impl</name>
+  <value>org.apache.hadoop.fs.bos.BaiduBosFileSystem</value>
+</property>
+```
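+
+With the credentials in place (and the jars listed below on the classpath), a Hudi write targets the bucket like any other storage location; the sketch below is illustrative, assuming an existing DataFrame `df` and placeholder bucket, table, and field names.
+
+```scala
+// Minimal sketch: write a Hudi table to the BOS bucket configured in core-site.xml.
+// Assumes an existing DataFrame `df`; bucket, table name and fields are illustrative.
+import org.apache.spark.sql.SaveMode
+
+df.write.format("hudi").
+  option("hoodie.table.name", "my_bos_table").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  mode(SaveMode.Append).
+  save("bos://bucketname/path/to/my_bos_table")
+```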
+
+### Baidu bos Libs
+
+Add the following Baidu Hadoop library jars to the classpath:
+
+- com.baidubce:bce-java-sdk:0.10.165
+- bos-hdfs-sdk-1.0.2-community.jar 
+
+You can download the bos-hdfs-sdk jar from [here](https://sdk.bce.baidu.com/console-sdk/bos-hdfs-sdk-1.0.2-community.jar.zip), and then unzip it.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/cli.md b/website/versioned_docs/version-0.11.1/cli.md
new file mode 100644
index 0000000000..c971181545
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/cli.md
@@ -0,0 +1,422 @@
+---
+title: CLI
+keywords: [hudi, cli]
+last_modified_at: 2021-08-18T15:59:57-04:00
+---
+
+### Local set up
+Once Hudi has been built, the shell can be launched via `cd hudi-cli && ./hudi-cli.sh`. A Hudi table resides on DFS, in a location referred to as the `basePath`, and
+we need this location in order to connect to a Hudi table. The Hudi library effectively manages this table internally, using the `.hoodie` subfolder to track all metadata.
+
+
+### Using Hudi-cli in S3
+If you are using hudi that comes packaged with AWS EMR, you can find instructions to use hudi-cli [here](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-cli.html).
+If you are not using EMR, or would like to use the latest hudi-cli from master, you can follow the steps below to access an S3 dataset from your local environment (laptop).
+
+Build Hudi with the corresponding Spark version, e.g. `-Dspark3.1.x`.
+
+Set the following environment variables. 
+```
+export AWS_REGION=us-east-2
+export AWS_ACCESS_KEY_ID=<key_id>
+export AWS_SECRET_ACCESS_KEY=<secret_key>
+export SPARK_HOME=<spark_home>
+```
+Ensure `SPARK_HOME` points to a local Spark installation that is compatible with the Spark version Hudi was compiled against.
+
+Apart from these, we also need to add the AWS jars to the classpath so that S3 can be accessed locally.
+We need two jars, namely the aws-java-sdk-bundle jar and the hadoop-aws jar, which you can find online.
+For example:
+```
+wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar -O /lib/spark-3.2.0-bin-hadoop3.2/jars/hadoop-aws-3.2.0.jar
+wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar -O /lib/spark-3.2.0-bin-hadoop3.2/jars/aws-java-sdk-bundle-1.11.375.jar
+```
+
+#### Note: The AWS jar versions below are specific to Spark 3.2.0
+```
+export CLIENT_JAR=/lib/spark-3.2.0-bin-hadoop3.2/jars/aws-java-sdk-bundle-1.12.48.jar:/lib/spark-3.2.0-bin-hadoop3.2/jars/hadoop-aws-3.3.1.jar
+```
+Once these are set, you are good to launch hudi-cli and access the S3 dataset.
+```
+./hudi-cli/hudi-cli.sh
+```
+
+## Using hudi-cli
+
+To initialize a hudi table, use the following command.
+
+```java
+===================================================================
+*         ___                          ___                        *
+*        /\__\          ___           /\  \           ___         *
+*       / /  /         /\__\         /  \  \         /\  \        *
+*      / /__/         / /  /        / /\ \  \        \ \  \       *
+*     /  \  \ ___    / /  /        / /  \ \__\       /  \__\      *
+*    / /\ \  /\__\  / /__/  ___   / /__/ \ |__|     / /\/__/      *
+*    \/  \ \/ /  /  \ \  \ /\__\  \ \  \ / /  /  /\/ /  /         *
+*         \  /  /    \ \  / /  /   \ \  / /  /   \  /__/          *
+*         / /  /      \ \/ /  /     \ \/ /  /     \ \__\          *
+*        / /  /        \  /  /       \  /  /       \/__/          *
+*        \/__/          \/__/         \/__/    Apache Hudi CLI    *
+*                                                                 *
+===================================================================
+
+hudi->create --path /user/hive/warehouse/table1 --tableName hoodie_table_1 --tableType COPY_ON_WRITE
+.....
+```
+
+To see the description of hudi table, use the command:
+
+```java
+hudi:hoodie_table_1->desc
+18/09/06 15:57:19 INFO timeline.HoodieActiveTimeline: Loaded instants []
+    _________________________________________________________
+    | Property                | Value                        |
+    |========================================================|
+    | basePath                | ...                          |
+    | metaPath                | ...                          |
+    | fileSystem              | hdfs                         |
+    | hoodie.table.name       | hoodie_table_1               |
+    | hoodie.table.type       | COPY_ON_WRITE                |
+    | hoodie.archivelog.folder|                              |
+```
+
+Following is a sample command to connect to a Hudi table containing uber trips.
+
+```java
+hudi:trips->connect --path /app/uber/trips
+
+16/10/05 23:20:37 INFO model.HoodieTableMetadata: All commits :HoodieCommits{commitList=[20161002045850, 20161002052915, 20161002055918, 20161002065317, 20161002075932, 20161002082904, 20161002085949, 20161002092936, 20161002105903, 20161002112938, 20161002123005, 20161002133002, 20161002155940, 20161002165924, 20161002172907, 20161002175905, 20161002190016, 20161002192954, 20161002195925, 20161002205935, 20161002215928, 20161002222938, 20161002225915, 20161002232906, 20161003003028, 201 [...]
+Metadata for table trips loaded
+```
+
+Once connected to the table, a lot of other commands become available. The shell has contextual autocomplete help (press TAB), and below is a list of all commands, a few of which are reviewed in this section.
+
+```java
+hudi:trips->help
+* ! - Allows execution of operating system (OS) commands
+* // - Inline comment markers (start of line only)
+* ; - Inline comment markers (start of line only)
+* addpartitionmeta - Add partition metadata to a table, if not present
+* clear - Clears the console
+* cls - Clears the console
+* commit rollback - Rollback a commit
+* commits compare - Compare commits with another Hoodie table
+* commit showfiles - Show file level details of a commit
+* commit showpartitions - Show partition level details of a commit
+* commits refresh - Refresh the commits
+* commits show - Show the commits
+* commits sync - Compare commits with another Hoodie table
+* connect - Connect to a hoodie table
+* date - Displays the local date and time
+* exit - Exits the shell
+* help - List all commands usage
+* quit - Exits the shell
+* records deduplicate - De-duplicate a partition path contains duplicates & produce repaired files to replace with
+* script - Parses the specified resource file and executes its commands
+* stats filesizes - File Sizes. Display summary stats on sizes of files
+* stats wa - Write Amplification. Ratio of how many records were upserted to how many records were actually written
+* sync validate - Validate the sync by counting the number of records
+* system properties - Shows the shell's properties
+* utils loadClass - Load a class
+* version - Displays shell version
+
+hudi:trips->
+```
+
+
+### Inspecting Commits
+
+The task of upserting or inserting a batch of incoming records is known as a **commit** in Hudi. A commit provides basic atomicity guarantees such that only committed data is available for querying.
+Each commit has a monotonically increasing string/number called the **commit number**. Typically, this is the time at which we started the commit.
+
+To view some basic information about the last 10 commits,
+
+
+```java
+hudi:trips->commits show --sortBy "Total Bytes Written" --desc true --limit 10
+    ________________________________________________________________________________________________________________________________________________________________________
+    | CommitTime    | Total Bytes Written| Total Files Added| Total Files Updated| Total Partitions Written| Total Records Written| Total Update Records Written| Total Errors|
+    |=======================================================================================================================================================================|
+    ....
+    ....
+    ....
+```
+
+At the start of each write, Hudi also writes a `.inflight` commit to the `.hoodie` folder. You can use the timestamp there to estimate how long the commit has been inflight.
+
+
+```java
+$ hdfs dfs -ls /app/uber/trips/.hoodie/*.inflight
+-rw-r--r--   3 vinoth supergroup     321984 2016-10-05 23:18 /app/uber/trips/.hoodie/20161005225920.inflight
+```
+
+
+### Drilling Down to a specific Commit
+
+To understand how the writes are spread across specific partitions,
+
+
+```java
+hudi:trips->commit showpartitions --commit 20161005165855 --sortBy "Total Bytes Written" --desc true --limit 10
+    __________________________________________________________________________________________________________________________________________
+    | Partition Path| Total Files Added| Total Files Updated| Total Records Inserted| Total Records Updated| Total Bytes Written| Total Errors|
+    |=========================================================================================================================================|
+     ....
+     ....
+```
+
+If you need file-level granularity, we can do the following:
+
+
+```java
+hudi:trips->commit showfiles --commit 20161005165855 --sortBy "Partition Path"
+    ________________________________________________________________________________________________________________________________________________________
+    | Partition Path| File ID                             | Previous Commit| Total Records Updated| Total Records Written| Total Bytes Written| Total Errors|
+    |=======================================================================================================================================================|
+    ....
+    ....
+```
+
+
+### FileSystem View
+
+Hudi views each partition as a collection of file-groups with each file-group containing a list of file-slices in commit order (See concepts).
+The below commands allow users to view the file-slices for a data-set.
+
+```java
+hudi:stock_ticks_mor->show fsview all
+ ....
+  _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
+ | Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta File Size| Delta Files |
+ |==============================================================================================================================================================================================================================================================================================================================================================================================================|
+ | 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]|
+
+
+
+hudi:stock_ticks_mor->show fsview latest --partitionPath "2018/08/31"
+ ......
+ ___________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________ [...]
+ | Partition | FileId | Base-Instant | Data-File | Data-File Size| Num Delta Files| Total Delta Size| Delta Size - compaction scheduled| Delta Size - compaction unscheduled| Delta To Base Ratio - compaction scheduled| Delta To Base Ratio - compaction unscheduled| Delta Files - compaction scheduled | Delta Files - compaction unscheduled|
+ |========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================== [...]
+ | 2018/08/31| 111415c3-f26d-4639-86c8-f9956f245ac3| 20181002180759| hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/111415c3-f26d-4639-86c8-f9956f245ac3_0_20181002180759.parquet| 432.5 KB | 1 | 20.8 KB | 20.8 KB | 0.0 B | 0.0 B | 0.0 B | [HoodieLogFile {hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/.111415c3-f26d-4639-86c8-f9956f245ac3_20181002180759.log.1}]| [] |
+
+```
+
+
+### Statistics
+
+Since Hudi directly manages file sizes for tables on DFS, it might be good to get an overall picture:
+
+
+```java
+hudi:trips->stats filesizes --partitionPath 2016/09/01 --sortBy "95th" --desc true --limit 10
+    ________________________________________________________________________________________________
+    | CommitTime    | Min     | 10th    | 50th    | avg     | 95th    | Max     | NumFiles| StdDev  |
+    |===============================================================================================|
+    | <COMMIT_ID>   | 93.9 MB | 93.9 MB | 93.9 MB | 93.9 MB | 93.9 MB | 93.9 MB | 2       | 2.3 KB  |
+    ....
+    ....
+```
+
+In case a Hudi write is taking much longer, it might be good to check the write amplification for any sudden increases:
+
+
+```java
+hudi:trips->stats wa
+    __________________________________________________________________________
+    | CommitTime    | Total Upserted| Total Written| Write Amplifiation Factor|
+    |=========================================================================|
+    ....
+    ....
+```
+
+
+### Archived Commits
+
+In order to limit the growth of .commit files on DFS, Hudi archives older .commit files (with due respect to the cleaner policy) into a commits.archived file.
+This is a sequence file that contains a mapping from commitNumber => json with raw information about the commit (the same information that is nicely rolled up above).
+
+
+### Compactions
+
+To get an idea of the lag between compaction and writer applications, use the below command to list all
+pending compactions.
+
+```java
+hudi:trips->compactions show all
+     ___________________________________________________________________
+    | Compaction Instant Time| State    | Total FileIds to be Compacted|
+    |==================================================================|
+    | <INSTANT_1>            | REQUESTED| 35                           |
+    | <INSTANT_2>            | INFLIGHT | 27                           |
+```
+
+To inspect a specific compaction plan, use
+
+```java
+hudi:trips->compaction show --instant <INSTANT_1>
+    _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
+    | Partition Path| File Id | Base Instant  | Data File Path                                    | Total Delta Files| getMetrics                                                                                                                    |
+    |================================================================================================================================================================================================================================================
+    | 2018/07/17    | <UUID>  | <INSTANT_1>   | viewfs://ns-default/.../../UUID_<INSTANT>.parquet | 1                | {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=1230.0, TOTAL_LOG_FILES_SIZE=2.51255751E8, TOTAL_IO_WRITE_MB=991.0, TOTAL_IO_MB=2221.0}|
+
+```
+
+To manually schedule or run a compaction, use the below command. This command uses the Spark launcher to perform compaction
+operations.
+
+**NOTE:** Make sure no other application is scheduling compaction for this table concurrently
+{: .notice--info}
+
+```java
+hudi:trips->help compaction schedule
+Keyword:                   compaction schedule
+Description:               Schedule Compaction
+ Keyword:                  sparkMemory
+   Help:                   Spark executor memory
+   Mandatory:              false
+   Default if specified:   '__NULL__'
+   Default if unspecified: '1G'
+
+* compaction schedule - Schedule Compaction
+```
+
+```java
+hudi:trips->help compaction run
+Keyword:                   compaction run
+Description:               Run Compaction for given instant time
+ Keyword:                  tableName
+   Help:                   Table name
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  parallelism
+   Help:                   Parallelism for hoodie compaction
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  schemaFilePath
+   Help:                   Path for Avro schema file
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  sparkMemory
+   Help:                   Spark executor memory
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  retry
+   Help:                   Number of retries
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  compactionInstant
+   Help:                   Base path for the target hoodie table
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+* compaction run - Run Compaction for given instant time
+```
+
+### Validate Compaction
+
+Validating a compaction plan: check if all the files necessary for compaction are present and valid.
+
+```java
+hudi:stock_ticks_mor->compaction validate --instant 20181005222611
+...
+
+   COMPACTION PLAN VALID
+
+    ___________________________________________________________________________________________________________________________________________________________________________________________________________________________
+    | File Id                             | Base Instant Time| Base Data File                                                                                                                   | Num Delta Files| Valid| Error|
+    |==========================================================================================================================================================================================================================|
+    | 05320e98-9a57-4c38-b809-a6beaaeb36bd| 20181005222445   | hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/05320e98-9a57-4c38-b809-a6beaaeb36bd_0_20181005222445.parquet| 1              | true |      |
+
+
+
+hudi:stock_ticks_mor->compaction validate --instant 20181005222601
+
+   COMPACTION PLAN INVALID
+
+    _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
+    | File Id                             | Base Instant Time| Base Data File                                                                                                                   | Num Delta Files| Valid| Error                                                                           |
+    |=====================================================================================================================================================================================================================================================================================================|
+    | 05320e98-9a57-4c38-b809-a6beaaeb36bd| 20181005222445   | hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/2018/08/31/05320e98-9a57-4c38-b809-a6beaaeb36bd_0_20181005222445.parquet| 1              | false| All log files specified in compaction operation is not present. Missing ....    |
+```
+
+**NOTE:** The following commands must be executed without any other writer/ingestion application running.
+{: .notice--warning}
+
+Sometimes, it becomes necessary to remove a fileId from a compaction plan in order to speed up or unblock a compaction
+operation. Any new log files added to this file group after the compaction was scheduled will be safely renamed
+so that they are preserved. Hudi provides the following CLI to support this:
+
+
+### Unscheduling Compaction
+
+```java
+hudi:trips->compaction unscheduleFileId --fileId <FileUUID>
+....
+No File renames needed to unschedule file from pending compaction. Operation successful.
+```
+
+In other cases, an entire compaction plan needs to be reverted. This is supported by the following CLI
+
+```java
+hudi:trips->compaction unschedule --compactionInstant <compactionInstant>
+.....
+No File renames needed to unschedule pending compaction. Operation successful.
+```
+
+### Repair Compaction
+
+The above compaction unscheduling operations could sometimes fail partially (e.g. DFS temporarily unavailable). With
+partial failures, the compaction operation could become inconsistent with the state of file-slices. When you run
+`compaction validate`, you will notice invalid compaction operations if there are any. In these cases, the repair
+command comes to the rescue: it will rearrange the file-slices so that there is no data loss and the file-slices are
+consistent with the compaction plan.
+
+```java
+hudi:stock_ticks_mor->compaction repair --instant 20181005222611
+......
+Compaction successfully repaired
+.....
+```
+
+## Savepoint and Restore 
+As the name suggests, a "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. You can read more about savepoints and restore [here](/docs/next/disaster_recovery).
+
+To trigger a savepoint for a Hudi table:
+```java
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoint create --commit 20220128160245447 --sparkMaster local[2]
+```
+
+To restore the table to one of the savepointed commits:
+
+```java
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoints show
+╔═══════════════════╗
+║ SavepointTime     ║
+╠═══════════════════╣
+║ 20220128160245447 ║
+╚═══════════════════╝
+savepoint rollback --savepoint 20220128160245447 --sparkMaster local[2]
+```
+
+
diff --git a/website/versioned_docs/version-0.11.1/cloud.md b/website/versioned_docs/version-0.11.1/cloud.md
new file mode 100644
index 0000000000..818baa1e2e
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/cloud.md
@@ -0,0 +1,29 @@
+---
+title: Cloud Storage
+keywords: [hudi, aws, gcp, oss, azure, cloud, juicefs]
+summary: "In this page, we introduce how Hudi work with different Cloud providers."
+toc: true
+last_modified_at: 2021-10-12T10:50:00+08:00
+---
+
+## Talking to Cloud Storage
+
+Regardless of whether the RDD/WriteClient APIs or the Datasource is used, the following information helps configure access
+to cloud stores.
+
+* [AWS S3](/docs/s3_hoodie) <br/>
+   Configurations required for S3 and Hudi co-operability.
+* [Google Cloud Storage](/docs/gcs_hoodie) <br/>
+   Configurations required for GCS and Hudi co-operability.
+* [Alibaba Cloud OSS](/docs/oss_hoodie) <br/>
+   Configurations required for OSS and Hudi co-operability.
+* [Microsoft Azure](/docs/azure_hoodie) <br/>
+   Configurations required for Azure and Hudi co-operability.
+* [Tencent Cloud Object Storage](/docs/cos_hoodie) <br/>
+   Configurations required for COS and Hudi co-operability.
+* [IBM Cloud Object Storage](/docs/ibm_cos_hoodie) <br/>
+   Configurations required for IBM Cloud Object Storage and Hudi co-operability.
+* [Baidu Cloud Object Storage](/docs/bos_hoodie) <br/>
+   Configurations required for BOS and Hudi co-operability.
+* [JuiceFS](/docs/jfs_hoodie) <br/>
+   Configurations required for JuiceFS and Hudi co-operability.
diff --git a/website/versioned_docs/version-0.11.1/clustering.md b/website/versioned_docs/version-0.11.1/clustering.md
new file mode 100644
index 0000000000..9e157de785
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/clustering.md
@@ -0,0 +1,267 @@
+---
+title: Clustering
+summary: "In this page, we describe async compaction in Hudi."
+toc: true
+last_modified_at:
+---
+
+## Background
+
+Apache Hudi brings stream processing to big data, providing fresh data while being an order of magnitude efficient over traditional batch processing. In a data lake/warehouse, one of the key trade-offs is between ingestion speed and query performance. Data ingestion typically prefers small files to improve parallelism and make data available to queries as soon as possible. However, query performance degrades poorly with a lot of small files. Also, during ingestion, data is typically co-l [...]
+<!--truncate-->
+
+## Clustering Architecture
+
+At a high level, Hudi provides different operations such as insert/upsert/bulk_insert through it’s write client API to be able to write data to a Hudi table. To be able to choose a trade-off between file size and ingestion speed, Hudi provides a knob `hoodie.parquet.small.file.limit` to be able to configure the smallest allowable file size. Users are able to configure the small file [soft limit](https://hudi.apache.org/docs/configurations/#hoodieparquetsmallfilelimit) to `0` to force new [...]
+
+
+
+To be able to support an architecture that allows for fast ingestion without compromising query performance, we have introduced a ‘clustering’ service to rewrite the data to optimize Hudi data lake file layout.
+
+The clustering table service can run asynchronously or synchronously, adding a new action type called “REPLACE” that will mark the clustering action in the Hudi metadata timeline.
+
+
+
+### Overall, there are 2 parts to clustering
+
+1.  Scheduling clustering: Create a clustering plan using a pluggable clustering strategy.
+2.  Execute clustering: Process the plan using an execution strategy to create new files and replace old files.
+
+
+### Scheduling clustering
+
+The following steps are followed to schedule clustering:
+
+1.  Identify files that are eligible for clustering: Depending on the clustering strategy chosen, the scheduling logic will identify the files eligible for clustering.
+2.  Group files that are eligible for clustering based on specific criteria. Each group is expected to have data size in multiples of ‘targetFileSize’. Grouping is done as part of ‘strategy’ defined in the plan. Additionally, there is an option to put a cap on group size to improve parallelism and avoid shuffling large amounts of data.
+3.  Finally, the clustering plan is saved to the timeline in an avro [metadata format](https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieClusteringPlan.avsc).
+
+
+### Running clustering
+
+1.  Read the clustering plan and get the ‘clusteringGroups’ that mark the file groups that need to be clustered.
+2.  For each group, we instantiate appropriate strategy class with strategyParams (example: sortColumns) and apply that strategy to rewrite the data.
+3.  Create a “REPLACE” commit and update the metadata in [HoodieReplaceCommitMetadata](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieReplaceCommitMetadata.java).
+
+
+The clustering service builds on Hudi’s MVCC-based design to allow writers to continue inserting new data while the clustering action runs in the background to reformat the data layout, ensuring snapshot isolation between concurrent readers and writers.
+
+NOTE: Clustering can only be scheduled for tables / partitions not receiving any concurrent updates. In the future, the concurrent-updates use case will be supported as well.
+
+![Clustering example](/assets/images/blog/clustering/example_perf_improvement.png)
+_Figure: Illustrating query performance improvements by clustering_
+
+### Setting up clustering
+Inline clustering can be set up easily using Spark dataframe options. See the sample below:
+
+```scala
+import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+
+
+val df =  //generate data frame
+df.write.format("org.apache.hudi").
+        options(getQuickstartWriteConfigs).
+        option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+        option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+        option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+        option(TABLE_NAME, "tableName").
+        option("hoodie.parquet.small.file.limit", "0").
+        option("hoodie.clustering.inline", "true").
+        option("hoodie.clustering.inline.max.commits", "4").
+        option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824").
+        option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
+        option("hoodie.clustering.plan.strategy.sort.columns", "column1,column2"). //optional, if sorting is needed as part of rewriting data
+        mode(Append).
+        save("dfs://location");
+```
+
+## Async Clustering - Strategies
+For more advanced use cases, an async clustering pipeline can also be set up. See an example [here](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-SetupforAsyncclusteringJob).
+
+On a high level, clustering creates a plan based on a configurable strategy, groups eligible files based on specific
+criteria and then executes the plan. Hudi supports [multi-writers](https://hudi.apache.org/docs/concurrency_control#enabling-multi-writing) which provides
+snapshot isolation between multiple table services, thus allowing writers to continue with ingestion while clustering
+runs in the background.
+
+As mentioned before, the clustering plan as well as its execution depend on a configurable strategy. These strategies can be
+broadly classified into three types: clustering plan strategy, execution strategy and update strategy.
+
+### Plan Strategy
+
+This strategy comes into play while creating the clustering plan. It helps decide which file groups should be clustered.
+Let's look at different plan strategies that are available with Hudi. Note that these strategies are easily pluggable
+using this [config](/docs/configurations#hoodieclusteringplanstrategyclass).
+
+1. `SparkSizeBasedClusteringPlanStrategy`: It selects file slices based on
+   the [small file limit](/docs/configurations/#hoodieclusteringplanstrategysmallfilelimit)
+   of base files and creates clustering groups up to the max file size allowed per group. The max size can be specified using
+   this [config](/docs/configurations/#hoodieclusteringplanstrategymaxbytespergroup). This
+   strategy is useful for stitching together medium-sized files into larger ones to reduce the number of files spread across
+   cold partitions.
+2. `SparkRecentDaysClusteringPlanStrategy`: It looks back at the partitions from the previous 'N' days and creates a plan that will
+   cluster the 'small' file slices within those partitions. This is the default strategy. It could be useful when the
+   workload is predictable and data is partitioned by time.
+3. `SparkSelectedPartitionsClusteringPlanStrategy`: In case you want to cluster only specific partitions within a range,
+   no matter how old or new those partitions are, then this strategy could be useful. To use this strategy, one additionally needs
+   to set the two configs below (both begin and end partitions are inclusive), as illustrated in the sketch after the note:
+
+```
+hoodie.clustering.plan.strategy.cluster.begin.partition
+hoodie.clustering.plan.strategy.cluster.end.partition
+```
+
+:::note
+All the strategies are partition-aware and the latter two are still bound by the size limits of the first strategy.
+:::
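+
+A hedged sketch of wiring this up on an inline-clustering datasource write is shown below; the fully-qualified strategy class name, partition values, table name, and path are assumptions to be verified against the configuration reference linked earlier in this section.
+
+```scala
+// Minimal sketch: inline clustering restricted to an inclusive partition range.
+// Assumes an existing DataFrame `df`; the strategy class name and all values are illustrative.
+import org.apache.spark.sql.SaveMode
+
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
+  option("hoodie.clustering.inline", "true").
+  option("hoodie.clustering.plan.strategy.class",
+    "org.apache.hudi.client.clustering.plan.strategy.SparkSelectedPartitionsClusteringPlanStrategy").
+  option("hoodie.clustering.plan.strategy.cluster.begin.partition", "2021/01/01").
+  option("hoodie.clustering.plan.strategy.cluster.end.partition", "2021/01/07").
+  mode(SaveMode.Append).
+  save("/tmp/my_table")
+```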
+
+### Execution Strategy
+
+After building the clustering groups in the planning phase, Hudi applies an execution strategy to each group, primarily
+based on sort columns and size. The strategy can be specified using this [config](/docs/configurations/#hoodieclusteringexecutionstrategyclass).
+
+`SparkSortAndSizeExecutionStrategy` is the default strategy. Users can specify the columns to sort the data by, when
+clustering using
+this [config](/docs/configurations/#hoodieclusteringplanstrategysortcolumns). Apart from
+that, we can also set [max file size](/docs/configurations/#hoodieparquetmaxfilesize)
+for the parquet files produced due to clustering. The strategy uses bulk insert to write data into new files, in which
+case, Hudi implicitly uses a partitioner that does sorting based on specified columns. In this way, the strategy changes
+the data layout in a way that not only improves query performance but also balances rewrite overhead automatically.
+
+Now this strategy can be executed either as a single spark job or multiple jobs depending on the number of clustering groups
+created in the planning phase. By default, Hudi will submit multiple spark jobs and union the results. In case you want
+to force Hudi to use single spark job, set the execution strategy
+class [config](/docs/configurations/#hoodieclusteringexecutionstrategyclass)
+to `SingleSparkJobExecutionStrategy`.
+
+### Update Strategy
+
+Currently, clustering can only be scheduled for tables/partitions not receiving any concurrent updates. By default,
+the [config for update strategy](/docs/configurations/#hoodieclusteringupdatesstrategy) is
+set to ***SparkRejectUpdateStrategy***. If some file group has updates during clustering, then it will reject the updates and
+throw an exception. However, in some use cases updates are very sparse and do not touch most file groups. The default
+strategy of simply rejecting updates can be too restrictive there, so users can set the config to ***SparkAllowUpdateStrategy***.
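+
+For such sparse-update workloads, the option can be set alongside the other clustering configs; in the minimal sketch below the fully-qualified class name for ***SparkAllowUpdateStrategy*** is an assumption to verify against the configuration reference, and the remaining values are placeholders.
+
+```scala
+// Minimal sketch: allow updates to file groups that have clustering pending.
+// Assumes an existing DataFrame `df`; the class name, table name and path are illustrative.
+import org.apache.spark.sql.SaveMode
+
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.clustering.inline", "true").
+  option("hoodie.clustering.updates.strategy",
+    "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy").
+  mode(SaveMode.Append).
+  save("/tmp/my_table")
+```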
+
+We discussed the critical strategy configurations. All other configurations related to clustering are
+listed [here](/docs/configurations/#Clustering-Configs). Out of this list, a few
+configurations that will be very useful are:
+
+|  Config key  | Remarks | Default |
+|  -----------  | -------  | ------- |
+| `hoodie.clustering.async.enabled` | Enable running of clustering service, asynchronously as writes happen on the table. | False |
+| `hoodie.clustering.async.max.commits` | Control frequency of async clustering by specifying after how many commits clustering should be triggered. | 4 |
+| `hoodie.clustering.preserve.commit.metadata` | When rewriting data, preserves existing _hoodie_commit_time. This means users can run incremental queries on clustered data without any side-effects. | False |
+
+## Asynchronous Clustering
+Users can leverage [HoodieClusteringJob](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-SetupforAsyncclusteringJob)
+to setup 2-step asynchronous clustering.
+
+### HoodieClusteringJob
+By specifying the `scheduleAndExecute` mode, both scheduling and execution of clustering can be achieved in the same step.
+The appropriate mode can be specified using the `-mode` or `-m` option. There are three modes:
+
+1. `schedule`: Make a clustering plan. This gives an instant which can be passed in execute mode.
+2. `execute`: Execute a clustering plan at a particular instant. If no instant-time is specified, HoodieClusteringJob will execute for the earliest instant on the Hudi timeline.
+3. `scheduleAndExecute`: Make a clustering plan first and execute that plan immediately.
+
+Note that to run this job while the original writer is still running, please enable multi-writing:
+```
+hoodie.write.concurrency.mode=optimistic_concurrency_control
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+```
+
+A sample spark-submit command to set up HoodieClusteringJob is as below:
+
+```bash
+spark-submit \
+--class org.apache.hudi.utilities.HoodieClusteringJob \
+/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.9.0-SNAPSHOT.jar \
+--props /path/to/config/clusteringjob.properties \
+--mode scheduleAndExecute \
+--base-path /path/to/hudi_table/basePath \
+--table-name hudi_table_schedule_clustering \
+--spark-memory 1g
+```
+A sample `clusteringjob.properties` file:
+```
+hoodie.clustering.async.enabled=true
+hoodie.clustering.async.max.commits=4
+hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
+hoodie.clustering.plan.strategy.small.file.limit=629145600
+hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
+hoodie.clustering.plan.strategy.sort.columns=column1,column2
+```
+
+### HoodieDeltaStreamer
+
+This brings us to our users' favorite utility in Hudi. Now, we can trigger asynchronous clustering with DeltaStreamer.
+Just set the `hoodie.clustering.async.enabled` config to true and specify the other clustering configs in a properties file
+whose location can be passed as `--props` when starting the deltastreamer (just like in the case of HoodieClusteringJob).
+
+A sample spark-submit command to set up HoodieDeltaStreamer is as below:
+
+```bash
+spark-submit \
+--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.9.0-SNAPSHOT.jar \
+--props /path/to/config/clustering_kafka.properties \
+--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
+--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
+--source-ordering-field impressiontime \
+--table-type COPY_ON_WRITE \
+--target-base-path /path/to/hudi_table/basePath \
+--target-table impressions_cow_cluster \
+--op INSERT \
+--hoodie-conf hoodie.clustering.async.enabled=true \
+--continuous
+```
+
+### Spark Structured Streaming
+
+We can also enable asynchronous clustering with Spark structured streaming sink as shown below.
+```scala
+val commonOpts = Map(
+   "hoodie.insert.shuffle.parallelism" -> "4",
+   "hoodie.upsert.shuffle.parallelism" -> "4",
+   DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+   DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+   DataSourceWriteOptions.PRECOMBINE_FIELD.key -> "timestamp",
+   HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
+)
+
+def getAsyncClusteringOpts(isAsyncClustering: String, 
+                           clusteringNumCommit: String, 
+                           executionStrategy: String):Map[String, String] = {
+   commonOpts + (DataSourceWriteOptions.ASYNC_CLUSTERING_ENABLE.key -> isAsyncClustering,
+           HoodieClusteringConfig.ASYNC_CLUSTERING_MAX_COMMITS.key -> clusteringNumCommit,
+           HoodieClusteringConfig.EXECUTION_STRATEGY_CLASS_NAME.key -> executionStrategy
+   )
+}
+
+def initStreamingWriteFuture(hudiOptions: Map[String, String]): Future[Unit] = {
+   val streamingInput = // define the source of streaming
+   Future {
+      println("streaming starting")
+      streamingInput
+              .writeStream
+              .format("org.apache.hudi")
+              .options(hudiOptions)
+              .option("checkpointLocation", basePath + "/checkpoint")
+              .outputMode("append")
+              .start()
+              .awaitTermination(10000)
+      println("streaming ends")
+   }
+}
+
+def structuredStreamingWithClustering(): Unit = {
+   val df = //generate data frame
+   val hudiOptions = getAsyncClusteringOpts("true", "1", "org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy")
+   val f1 = initStreamingWriteFuture(hudiOptions)
+   Await.result(f1, Duration.Inf)
+}
+```
diff --git a/website/versioned_docs/version-0.11.1/compaction.md b/website/versioned_docs/version-0.11.1/compaction.md
new file mode 100644
index 0000000000..9d73e31bd5
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/compaction.md
@@ -0,0 +1,140 @@
+---
+title: Compaction
+summary: "In this page, we describe async compaction in Hudi."
+toc: true
+last_modified_at:
+---
+
+## Async Compaction
+Compaction is executed asynchronously with Hudi by default. Async Compaction is performed in 2 steps:
+
+1. ***Compaction Scheduling***: This is done by the ingestion job. In this step, Hudi scans the partitions and selects **file
+   slices** to be compacted. A compaction plan is finally written to Hudi timeline.
+1. ***Compaction Execution***: In this step the compaction plan is read and file slices are compacted.
+
+There are a few ways by which we can execute compactions asynchronously.
+
+### Spark Structured Streaming
+
+Compactions are scheduled and executed asynchronously inside the
+streaming job.  Async Compactions are enabled by default for structured streaming jobs
+on Merge-On-Read tables.
+
+Here is an example snippet in Java:
+
+```java
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.HoodieDataSourceHelpers;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.spark.sql.streaming.OutputMode;
+import org.apache.spark.sql.streaming.ProcessingTime;
+
+
+ DataStreamWriter<Row> writer = streamingInput.writeStream().format("org.apache.hudi")
+        .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), operationType)
+        .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), tableType)
+        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
+        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
+        .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, "10")
+        .option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY(), "true")
+        .option(HoodieWriteConfig.TABLE_NAME, tableName).option("checkpointLocation", checkpointLocation)
+        .outputMode(OutputMode.Append());
+ writer.trigger(new ProcessingTime(30000)).start(tablePath);
+```
+
+### DeltaStreamer Continuous Mode
+Hudi DeltaStreamer provides a continuous ingestion mode where a single long-running Spark application
+ingests data to a Hudi table continuously from upstream sources. In this mode, Hudi supports managing asynchronous
+compactions. Here is an example snippet for running in continuous mode with async compactions:
+
+```bash
+spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.6.0 \
+--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+--table-type MERGE_ON_READ \
+--target-base-path <hudi_base_path> \
+--target-table <hudi_table> \
+--source-class org.apache.hudi.utilities.sources.JsonDFSSource \
+--source-ordering-field ts \
+--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+--props /path/to/source.properties \
+--continuous
+```
+
+## Synchronous Compaction
+By default, compaction is run asynchronously.
+
+If latency of ingesting records is important for you, you are most likely using Merge-On-Read tables.
+Merge-On-Read tables store data using a combination of columnar (e.g parquet) + row based (e.g avro) file formats.
+Updates are logged to delta files & later compacted to produce new versions of columnar files. 
+To improve ingestion latency, Async Compaction is the default configuration.
+
+If immediate read performance of a new commit is important for you, or you want the simplicity of not managing separate compaction jobs,
+you may want synchronous compaction, which means that as a commit is written it is also compacted by the same job.
+
+Compaction is run synchronously by passing the flag "--disable-compaction" (meaning to disable async compaction scheduling).
+When both ingestion and compaction are running in the same spark context, you can use resource allocation configuration
+in the DeltaStreamer CLI such as ("--delta-sync-scheduling-weight",
+"--compact-scheduling-weight", "--delta-sync-scheduling-minshare", and "--compact-scheduling-minshare")
+to control executor allocation between ingestion and compaction.
+
+
+## Offline Compaction
+
+The compaction of the MERGE_ON_READ table is enabled by default. The trigger strategy is to perform compaction after completing
+five commits. Because compaction consumes a lot of memory and is placed in the same pipeline as the write operation, it can easily
+interfere with the write operation when there is a large amount of data (> 100,000 records per second). In such cases, it is more stable to execute
+the compaction task using offline compaction.
+
+:::note
+The execution of a compaction task includes two parts: scheduling the compaction plan and executing the compaction plan. It's recommended that
+the process of scheduling the compaction plan be triggered periodically by the write task, and the write parameter `compaction.schedule.enable`
+is enabled by default.
+:::
+
+### Hudi Compactor Utility
+Hudi provides a standalone tool to execute specific compactions asynchronously. Below is an example and you can read more in the [deployment guide](/docs/deployment#compactions)
+
+Example:
+```bash
+spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.6.0 \
+--class org.apache.hudi.utilities.HoodieCompactor \
+--base-path <base_path> \
+--table-name <table_name> \
+--schema-file <schema_file> \
+--instant-time <compaction_instant>
+```
+
+Note, the `instant-time` parameter is now optional for the Hudi Compactor Utility. If the utility is run without `--instant-time`,
+the spark-submit will execute the earliest scheduled compaction on the Hudi timeline.
+
+### Hudi CLI
+Hudi CLI is yet another way to execute specific compactions asynchronously. Here is an example and you can read more in the [deployment guide](/docs/cli#compactions)
+
+Example:
+```properties
+hudi:trips->compaction run --tableName <table_name> --parallelism <parallelism> --compactionInstant <InstantTime>
+...
+```
+
+### Flink Offline Compaction
+Offline compaction needs to be submitted as a Flink job on the command line. The program entry point is
+`org.apache.hudi.sink.compact.HoodieFlinkCompactor`, packaged in `hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar`:
+
+```bash
+# Command line
+./bin/flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor lib/hudi-flink-bundle_2.11-0.9.0.jar --path hdfs://xxx:9000/table
+```
+
+#### Options
+
+|  Option Name  | Required | Default | Remarks |
+|  -----------  | -------  | ------- | ------- |
+| `--path` | `true` | `--` | The path where the target Hudi table is stored |
+| `--compaction-max-memory` | `false` | `100` | The index map size of log data during compaction, 100 MB by default. If you have enough memory, you can turn up this parameter |
+| `--schedule` | `false` | `false` | Whether to execute the operation of scheduling the compaction plan. If the write process is still writing, turning on this parameter has a risk of losing data. Therefore, it must be ensured that there are no write tasks currently writing data to this table when this parameter is turned on |
+| `--seq` | `false` | `LIFO` | The order in which compaction tasks are executed. Executing from the latest compaction plan by default. `LIFO`: executing from the latest plan. `FIFO`: executing from the oldest plan. |
+| `--service` | `false` | `false` | Whether to start a monitoring service that checks and schedules new compaction task in configured interval. |
+| `--min-compaction-interval-seconds` | `false` | `600(s)` | The checking interval for service mode, by default 10 minutes. |
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/comparison.md b/website/versioned_docs/version-0.11.1/comparison.md
new file mode 100644
index 0000000000..681b359a4d
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/comparison.md
@@ -0,0 +1,56 @@
+---
+title: "Comparison"
+keywords: [ apache, hudi, kafka, kudu, hive, hbase, stream processing]
+last_modified_at: 2019-12-30T15:59:57-04:00
+---
+
+Apache Hudi fills a big void for processing data on top of DFS, and thus mostly co-exists nicely with these technologies. However,
+it would be useful to understand how Hudi fits into the current big data ecosystem, contrasting it with a few related systems
+and bringing out the different tradeoffs these systems have accepted in their design.
+
+## Kudu
+
+[Apache Kudu](https://kudu.apache.org) is a storage system that has similar goals as Hudi, which is to bring real-time analytics on petabytes of data via first
+class support for `upserts`. A key differentiator is that Kudu also attempts to serve as a datastore for OLTP workloads, something that Hudi does not aspire to be.
+Consequently, Kudu does not support incremental pulling (as of early 2017), something Hudi does to enable incremental processing use cases.
+
+
+Kudu diverges from a distributed file system abstraction and HDFS altogether, with its own set of storage servers talking to each  other via RAFT.
+Hudi, on the other hand, is designed to work with an underlying Hadoop compatible filesystem (HDFS,S3 or Ceph) and does not have its own fleet of storage servers,
+instead relying on Apache Spark to do the heavy-lifting. Thus, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware
+& operational support, typical to datastores like HBase or Vertica. We have not at this point, done any head to head benchmarks against Kudu (given RTTable is WIP).
+But, if we were to go with results shared by [CERN](https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines) ,
+we expect Hudi to positioned at something that ingests parquet with superior performance.
+
+
+## Hive Transactions
+
+[Hive Transactions/ACID](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions) is another similar effort, which tries to implement storage like
+`merge-on-read`, on top of ORC file format. Understandably, this feature is heavily tied to Hive and other efforts like [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP).
+Hive Transactions does not offer the read-optimized storage option or the incremental pulling that Hudi does. In terms of implementation choices, Hudi leverages
+the full power of a processing framework like Spark, while Hive transactions feature is implemented underneath by Hive tasks/queries kicked off by user or the Hive metastore.
+Based on our production experience, embedding Hudi as a library into existing Spark pipelines was much easier and less operationally heavy, compared with the other approach.
+Hudi is also designed to work with non-hive engines like PrestoDB/Spark and will incorporate file formats other than parquet over time.
+
+## HBase
+
+Even though [HBase](https://hbase.apache.org) is ultimately a key-value store for OLTP workloads, users often tend to associate HBase with analytics given the proximity to Hadoop.
+Given HBase is heavily write-optimized, it supports sub-second upserts out-of-box and Hive-on-HBase lets users query that data. However, in terms of actual performance for analytical workloads,
+hybrid columnar storage formats like Parquet/ORC handily beat HBase, since these workloads are predominantly read-heavy. Hudi bridges this gap between faster data and having
+analytical storage formats. From an operational perspective, arming users with a library that provides faster data is more scalable than managing a big farm of HBase region servers
+just for analytics. Finally, HBase does not support incremental processing primitives like `commit times` and `incremental pull` as first-class citizens, as Hudi does.
+
+## Stream Processing
+
+A popular question we get is: "How does Hudi relate to stream processing systems?", which we will try to answer here. Simply put, Hudi can integrate with
+batch (`copy-on-write table`) and streaming (`merge-on-read table`) jobs of today, to store the computed results in Hadoop. For Spark apps, this can happen via direct
+integration of Hudi library with Spark/Spark streaming DAGs. In case of Non-Spark processing systems (eg: Flink, Hive), the processing can be done in the respective systems
+and later sent into a Hudi table via a Kafka topic/DFS intermediate file. At a more conceptual level, data processing
+pipelines just consist of three components : `source`, `processing`, `sink`, with users ultimately running queries against the sink to use the results of the pipeline.
+Hudi can act as either a source or sink, that stores data on DFS. Applicability of Hudi to a given stream processing pipeline ultimately boils down to suitability
+of PrestoDB/SparkSQL/Hive for your queries.
+
+More advanced use cases revolve around the concepts of [incremental processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop), which effectively
+uses Hudi even inside the `processing` engine to speed up typical batch pipelines. For e.g: Hudi can be used as a state store inside a processing DAG (similar
+to how [rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend) is used by Flink). This is an item on the roadmap
+and will eventually happen as a [Beam Runner](https://issues.apache.org/jira/browse/HUDI-60).
diff --git a/website/versioned_docs/version-0.11.1/concepts.md b/website/versioned_docs/version-0.11.1/concepts.md
new file mode 100644
index 0000000000..277484df63
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/concepts.md
@@ -0,0 +1,172 @@
+---
+version: 0.6.0
+title: "Concepts"
+keywords: [ hudi, design, table, queries, timeline]
+summary: "Here we introduce some basic concepts & give a broad technical overview of Hudi"
+toc: true
+last_modified_at: 2019-12-30T15:59:57-04:00
+---
+
+Apache Hudi (pronounced “Hudi”) provides the following streaming primitives over hadoop compatible storages
+
+ * Update/Delete Records      (how do I change records in a table?)
+ * Change Streams             (how do I fetch records that changed?)
+
+In this section, we will discuss key concepts & terminologies that are important to understand, to be able to effectively use these primitives.
+
+## Timeline
+At its core, Hudi maintains a `timeline` of all actions performed on the table at different `instants` of time that helps provide instantaneous views of the table,
+while also efficiently supporting retrieval of data in the order of arrival. A Hudi instant consists of the following components 
+
+ * `Instant action` : Type of action performed on the table
+ * `Instant time` : Instant time is typically a timestamp (e.g: 20190117010349), which monotonically increases in the order of action's begin time.
+ * `state` : current state of the instant
+ 
+Hudi guarantees that the actions performed on the timeline are atomic & timeline consistent based on the instant time.
+
+Key actions performed include
+
+ * `COMMITS` - A commit denotes an **atomic write** of a batch of records into a table.
+ * `CLEANS` - Background activity that gets rid of older versions of files in the table that are no longer needed.
+ * `DELTA_COMMIT` - A delta commit refers to an **atomic write** of a batch of records into a MergeOnRead type table, where some/all of the data could be just written to delta logs.
+ * `COMPACTION` - Background activity to reconcile differential data structures within Hudi, e.g: moving updates from row based log files to columnar formats. Internally, compaction manifests as a special commit on the timeline.
+ * `ROLLBACK` - Indicates that a commit/delta commit was unsuccessful & rolled back, removing any partial files produced during such a write.
+ * `SAVEPOINT` - Marks certain file groups as "saved", such that the cleaner will not delete them. It helps restore the table to a point on the timeline, in case of disaster/data recovery scenarios.
+
+Any given instant can be in one of the following states
+
+ * `REQUESTED` - Denotes an action has been scheduled, but has not yet been initiated
+ * `INFLIGHT` - Denotes that the action is currently being performed
+ * `COMPLETED` - Denotes completion of an action on the timeline
+
+<figure>
+    <img className="docimage" src="/assets/images/hudi_timeline.png" alt="hudi_timeline.png" />
+</figure>
+
+The example above shows upserts happening between 10:00 and 10:20 on a Hudi table, roughly every 5 mins, leaving commit metadata on the Hudi timeline, along
+with other background cleaning/compactions. One key observation to make is that the commit time indicates the `arrival time` of the data (10:20AM), while the actual data
+organization reflects the actual time, or `event time`, the data was intended for (hourly buckets from 07:00). These are two key concepts when reasoning about tradeoffs between latency and completeness of data.
+
+When there is late arriving data (data intended for 9:00 arriving >1 hr late at 10:20), we can see the upsert producing new data into even older time buckets/folders.
+With the help of the timeline, an incremental query attempting to get all new data that was committed successfully since 10:00 hours is able to very efficiently consume
+only the changed files without, say, scanning all the time buckets > 07:00.
+
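+As a rough sketch of what such an incremental query looks like with the Spark datasource (assuming a table at a hypothetical `basePath`, with `20220621100000` standing in for the 10:00 instant):
+
+```scala
+// Minimal sketch: incrementally pull only records committed after the given instant.
+// basePath and the instant time are hypothetical placeholders.
+val incrementalDF = spark.read.format("hudi")
+  .option("hoodie.datasource.query.type", "incremental")
+  .option("hoodie.datasource.read.begin.instanttime", "20220621100000")
+  .load(basePath)
+```
+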
+## File management
+Hudi organizes a table into a directory structure under a `basepath` on DFS. The table is broken up into partitions, which are folders containing data files for that partition,
+very similar to Hive tables. Each partition is uniquely identified by its `partitionpath`, which is relative to the basepath.
+
+Within each partition, files are organized into `file groups`, uniquely identified by a `file id`. Each file group contains several
+`file slices`, where each slice contains a base file (`*.parquet`) produced at a certain commit/compaction instant time,
+ along with a set of log files (`*.log.*`) that contain inserts/updates to the base file since the base file was produced.
+Hudi adopts an MVCC design, where the compaction action merges logs and base files to produce new file slices and the cleaning action gets rid of
+unused/older file slices to reclaim space on DFS.
+
+## Index
+Hudi provides efficient upserts by consistently mapping a given hoodie key (record key + partition path) to a file id, via an indexing mechanism.
+This mapping between record key and file group/file id never changes once the first version of a record has been written to a file. In short, the
+mapped file group contains all versions of a group of records.
+
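+As a minimal, hedged sketch of how the two key components are declared on write (assuming a hypothetical DataFrame `df`, a table `basePath`, and quick-start style `uuid`/`partitionpath` field names):
+
+```scala
+// Minimal sketch: the record key and partition path fields together form the hoodie key
+// that the index maps to a file group. Names and paths are illustrative placeholders.
+df.write.format("hudi")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
+  .option("hoodie.table.name", "my_table")
+  .mode("append")
+  .save(basePath)
+```
+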
+## Table Types & Queries
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities are implemented on top of such organization (i.e. how data is written).
+In turn, `query types` define how the underlying data is exposed to the queries (i.e. how data is read).
+
+| Table Type    | Supported Query types |
+|-------------- |------------------|
+| Copy On Write | Snapshot Queries + Incremental Queries  |
+| Merge On Read | Snapshot Queries + Incremental Queries + Read Optimized Queries |
+
+### Table Types
+Hudi supports the following table types.
+
+  - [Copy On Write](#copy-on-write-table) : Stores data using exclusively columnar file formats (e.g. parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+  - [Merge On Read](#merge-on-read-table) : Stores data using a combination of columnar (e.g. parquet) + row based (e.g. avro) file formats. Updates are logged to delta files & later compacted to produce new versions of columnar files synchronously or asynchronously.
+    
+Following table summarizes the trade-offs between these two table types
+
+| Trade-off     | CopyOnWrite      | MergeOnRead |
+|-------------- |------------------| ------------------|
+| Data Latency | Higher   | Lower |
+| Update cost (I/O) | Higher (rewrite entire parquet) | Lower (append to delta log) |
+| Parquet File Size | Smaller (high update (I/O) cost) | Larger (low update cost) |
+| Write Amplification | Higher | Lower (depending on compaction strategy) |
+
+
+### Query types
+Hudi supports the following query types
+
+ - **Snapshot Queries** : Queries see the latest snapshot of the table as of a given commit or compaction action. In case of a merge on read table, it exposes near real-time data (few mins) by merging
+    the base and delta files of the latest file slice on-the-fly. For a copy on write table, it provides a drop-in replacement for existing parquet tables, while providing upsert/delete and other write side features.
+ - **Incremental Queries** : Queries only see new data written to the table, since a given commit/compaction. This effectively provides change streams to enable incremental data pipelines.
+ - **Read Optimized Queries** : Queries see the latest snapshot of the table as of a given commit/compaction action. Exposes only the base/columnar files in the latest file slices and guarantees the
+    same columnar query performance compared to a non-hudi columnar table.
+
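+With the Spark datasource, the query type is picked via a read option; a minimal sketch (snapshot is the default, `read_optimized` applies to merge on read tables, and `basePath` is a hypothetical placeholder):
+
+```scala
+// Minimal sketch: select the query type on read. Omitting the option gives a snapshot query.
+val readOptimizedDF = spark.read.format("hudi")
+  .option("hoodie.datasource.query.type", "read_optimized")
+  .load(basePath)
+```
+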
+Following table summarizes the trade-offs between the different query types.
+
+| Trade-off     | Snapshot    | Read Optimized |
+|-------------- |-------------| ------------------|
+| Data Latency  | Lower | Higher |
+| Query Latency | Higher (merge base / columnar file + row based delta / log files) | Lower (raw base / columnar file performance) |
+
+
+## Copy On Write Table
+
+File slices in Copy-On-Write table only contain the base/columnar file and each commit produces new versions of base files. 
+In other words, we implicitly compact on every commit, such that only columnar data exists. As a result, the write amplification
+(number of bytes written for 1 byte of incoming data) is much higher, whereas read amplification is zero.
+This is a much desired property for analytical workloads, which are predominantly read-heavy.
+
+Following illustrates how this works conceptually, when data is written into a copy-on-write table and two queries are running on top of it.
+
+
+<figure>
+    <img className="docimage" src="/assets/images/hudi_cow.png" alt="hudi_cow.png" />
+</figure>
+
+
+As data gets written, updates to existing file groups produce a new slice for that file group stamped with the commit instant time, 
+while inserts allocate a new file group and write its first slice for that file group. These file slices and their commit instant times are color coded above.
+SQL queries running against such a table (eg: `select count(*)` counting the total records in that partition), first check the timeline for the latest commit
+and filter all but the latest file slices of each file group. As you can see, an old query does not see the current inflight commit's files color coded in pink,
+but a new query starting after the commit picks up the new data. Thus queries are immune to any write failures/partial writes and only run on committed data.
+
+The intention of copy on write table is to fundamentally improve how tables are managed today through
+
+  - First class support for atomically updating data at file-level, instead of rewriting whole tables/partitions
+  - Ability to incrementally consume changes, as opposed to wasteful scans or fumbling with heuristics
+  - Tight control of file sizes to keep query performance excellent (small files hurt query performance considerably).
+
+
+## Merge On Read Table
+
+Merge on read table is a superset of copy on write, in the sense it still supports read optimized queries of the table by exposing only the base/columnar files in latest file slices.
+Additionally, it stores incoming upserts for each file group, onto a row based delta log, to support snapshot queries by applying the delta log, 
+onto the latest version of each file id on-the-fly during query time. Thus, this table type attempts to balance read and write amplification intelligently, to provide near real-time data.
+The most significant change here would be to the compactor, which now carefully chooses which delta log files need to be compacted onto
+their columnar base file, to keep the query performance in check (larger delta log files would incur longer merge times when merging data on the query side).
+
+Following illustrates how the table works, and shows two types of queries - snapshot query and read optimized query.
+
+<figure>
+    <img className="docimage" src="/assets/images/hudi_mor.png" alt="hudi_mor.png"  />
+</figure>
+
+There are a lot of interesting things happening in this example, which bring out the subtleties in the approach.
+
+ - We now have commits every 1 minute or so, something we could not do in the other table type.
+ - Within each file id group, there is now a delta log file, which holds incoming updates to records in the base columnar files. In the example, the delta log files hold
+ all the data from 10:05 to 10:10. The base columnar files are still versioned with the commit, as before.
+ Thus, if one were to simply look at base files alone, then the table layout looks exactly like a copy on write table.
+ - A periodic compaction process reconciles these changes from the delta log and produces a new version of base file, just like what happened at 10:05 in the example.
+ - There are two ways of querying the same underlying table: Read Optimized query and Snapshot query, depending on whether we chose query performance or freshness of data.
+ - The semantics around when data from a commit is available to a query change in a subtle way for a read optimized query. Note that such a query
+ running at 10:10 won't see data after 10:05 above, while a snapshot query always sees the freshest data.
+ - When we trigger compaction & what it decides to compact holds the key to solving these hard problems. By implementing a compaction
+ strategy where we aggressively compact the latest partitions compared to older partitions, we could ensure that read optimized queries see data
+ published within X minutes in a consistent fashion.
+
+The intention of merge on read table is to enable near real-time processing directly on top of DFS, as opposed to copying
+data out to specialized systems, which may not be able to handle the data volume. There are also a few secondary side benefits to 
+this table such as reduced write amplification by avoiding the synchronous merge of data, i.e., the amount of data written per 1 byte of data in a batch.
+
+
diff --git a/website/versioned_docs/version-0.11.1/concurrency_control.md b/website/versioned_docs/version-0.11.1/concurrency_control.md
new file mode 100644
index 0000000000..e71cb4a8f2
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/concurrency_control.md
@@ -0,0 +1,167 @@
+---
+title: "Concurrency Control"
+summary: In this page, we will discuss how to perform concurrent writes to Hudi Tables.
+toc: true
+last_modified_at: 2021-03-19T15:59:57-04:00
+---
+
+In this section, we will cover Hudi's concurrency model and describe ways to ingest data into a Hudi Table from multiple writers, using the [DeltaStreamer](#deltastreamer) tool as well as
+the [Hudi datasource](#datasource-writer).
+
+## Supported Concurrency Controls
+
+- **MVCC** : Hudi table services such as compaction, cleaning, clustering leverage Multi Version Concurrency Control to provide snapshot isolation
+between multiple table service writers and readers. Additionally, using MVCC, Hudi provides snapshot isolation between an ingestion writer and multiple concurrent readers. 
+  With this model, Hudi supports running any number of table service jobs concurrently, without any concurrency conflict. 
+  This is made possible by ensuring that scheduling plans of such table services always happens in a single writer mode, ensuring no conflicts and avoiding race conditions.
+
+- **[NEW] OPTIMISTIC CONCURRENCY** : Write operations such as the ones described above (UPSERT, INSERT, etc.) leverage optimistic concurrency control to enable multiple ingestion writers to
+the same Hudi Table. Hudi supports `file level OCC`, i.e., for any 2 commits (or writers) happening to the same table, if they are not writing to overlapping files, both writers are allowed to succeed.
+  This feature is currently *experimental* and requires either Zookeeper or HiveMetastore to acquire locks.
+
+It may be helpful to understand the different guarantees provided by [write operations](/docs/write_operations/) via Hudi datasource or the delta streamer.
+
+## Single Writer Guarantees
+
+ - *UPSERT Guarantee*: The target table will NEVER show duplicates.
+ - *INSERT Guarantee*: The target table will NEVER have duplicates if [dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is enabled.
+ - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if [dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is enabled.
+ - *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER out of order.
+
+## Multi Writer Guarantees
+
+With multiple writers using OCC, some of the above guarantees change as follows
+
+- *UPSERT Guarantee*: The target table will NEVER show duplicates.
+- *INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is enabled.
+- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if [dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is enabled.
+- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints MIGHT be out of order due to multiple writer jobs finishing at different times.
+
+## Enabling Multi Writing
+
+The following properties need to be set properly to turn on optimistic concurrency control.
+
+```
+hoodie.write.concurrency.mode=optimistic_concurrency_control
+hoodie.cleaner.policy.failed.writes=LAZY
+hoodie.write.lock.provider=<lock-provider-classname>
+```
+
+There are 3 different server-based lock providers, each requiring different configurations to be set.
+
+**`Zookeeper`** based lock provider
+
+```
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+hoodie.write.lock.zookeeper.url
+hoodie.write.lock.zookeeper.port
+hoodie.write.lock.zookeeper.lock_key
+hoodie.write.lock.zookeeper.base_path
+```
+
+**`HiveMetastore`** based lock provider
+
+```
+hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
+hoodie.write.lock.hivemetastore.database
+hoodie.write.lock.hivemetastore.table
+```
+
+The HiveMetastore URIs are picked up from the Hadoop configuration file loaded during runtime.
+
+**`Amazon DynamoDB`** based lock provider
+
+The Amazon DynamoDB based lock provider offers a simple way to support multi-writing across different clusters.
+
+```
+hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
+hoodie.write.lock.dynamodb.table
+hoodie.write.lock.dynamodb.partition_key
+hoodie.write.lock.dynamodb.region
+```
+Also, to set up the credentials for accessing AWS resources, customers can pass the following props to Hudi jobs:
+```
+hoodie.aws.access.key
+hoodie.aws.secret.key
+hoodie.aws.session.token
+```
+If not configured, Hudi falls back to using the [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
+
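+As a hedged sketch, the DynamoDB lock provider properties above can be combined with a Spark datasource write as follows; the lock table, partition key, region, table name and path values are hypothetical placeholders:
+
+```scala
+// Minimal sketch: multi-writer enabled datasource write using the DynamoDB based lock provider.
+// All values (lock table, region, table name, basePath) are illustrative placeholders.
+inputDF.write.format("hudi")
+  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
+  .option("hoodie.cleaner.policy.failed.writes", "LAZY")
+  .option("hoodie.write.lock.provider", "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider")
+  .option("hoodie.write.lock.dynamodb.table", "hudi_locks")
+  .option("hoodie.write.lock.dynamodb.partition_key", "my_table")
+  .option("hoodie.write.lock.dynamodb.region", "us-east-1")
+  .option("hoodie.table.name", "my_table")
+  .mode("append")
+  .save(basePath)
+```
+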
+## Datasource Writer
+
+The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table.
+
+Following is an example of how to use optimistic_concurrency_control via the Spark datasource.
+
+```scala
+inputDF.write.format("hudi")
+       .options(getQuickstartWriteConfigs)
+       .option(PRECOMBINE_FIELD_OPT_KEY, "ts")
+       .option("hoodie.cleaner.policy.failed.writes", "LAZY")
+       .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
+       .option("hoodie.write.lock.zookeeper.url", "zookeeper")
+       .option("hoodie.write.lock.zookeeper.port", "2181")
+       .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
+       .option("hoodie.write.lock.zookeeper.base_path", "/test")
+       .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
+       .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
+       .option(TABLE_NAME, tableName)
+       .mode(Overwrite)
+       .save(basePath)
+```
+
+## DeltaStreamer
+
+The `HoodieDeltaStreamer` utility (part of hudi-utilities-bundle) provides ways to ingest from different sources such as DFS or Kafka.
+
+Using optimistic_concurrency_control via the delta streamer requires adding the above configs to the properties file that can be passed to the
+job. For example, adding the configs to the kafka-source.properties file and passing it to the deltastreamer will enable optimistic concurrency.
+A deltastreamer job can then be triggered as follows:
+
+```bash
+[hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+  --props file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties \
+  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
+  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
+  --source-ordering-field impresssiontime \
+  --target-base-path file:///tmp/hudi-deltastreamer-op \
+  --target-table uber.impressions \
+  --op BULK_INSERT
+```
+
+## Best Practices when using Optimistic Concurrency Control
+
+Concurrent Writing to Hudi tables requires acquiring a lock with either Zookeeper or HiveMetastore. Due to several reasons you might want to configure retries to allow your application to acquire the lock. 
+1. Network connectivity or excessive load on servers increasing time for lock acquisition resulting in timeouts
+2. Running a large number of concurrent jobs that are writing to the same hudi table can result in contention during lock acquisition, which can cause timeouts
+3. In some scenarios of conflict resolution, Hudi commit operations might take up to tens of seconds while the lock is being held. This can result in timeouts for other jobs waiting to acquire a lock.
+
+Set the correct native lock provider client retries. NOTE that sometimes these settings are set on the server once and all clients inherit the same configs. Please check your settings before enabling optimistic concurrency.
+   
+```
+hoodie.write.lock.wait_time_ms
+hoodie.write.lock.num_retries
+```
+
+Set the correct hudi client retries for Zookeeper & HiveMetastore. This is useful in cases when native client retry settings cannot be changed. Please note that these retries will happen in addition to any native client retries that you may have set. 
+
+```
+hoodie.write.lock.client.wait_time_ms
+hoodie.write.lock.client.num_retries
+```
+
+*Setting the right values for these is a case by case decision; some defaults have been provided for general cases.*
+
+## Disabling Multi Writing
+
+Remove the settings that were used to enable multi-writing, or override them with the default values shown below.
+
+```
+hoodie.write.concurrency.mode=single_writer
+hoodie.cleaner.policy.failed.writes=EAGER
+```
+
+## Caveats
+
+If you are using the `WriteClient` API, please note that multiple writes to the table need to be initiated from 2 different instances of the write client. 
+It is NOT recommended to use the same instance of the write client to perform multi writing. 
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/configurations.md b/website/versioned_docs/version-0.11.1/configurations.md
new file mode 100644
index 0000000000..2be4adef60
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/configurations.md
@@ -0,0 +1,4271 @@
+---
+title: All Configurations
+keywords: [ configurations, default, flink options, spark, configs, parameters ] 
+permalink: /docs/configurations.html
+summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
+toc: true
+last_modified_at: 2022-04-30T18:29:54.348
+---
+
+This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
+
+- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.
+- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink SQL source/sink connectors, providing ability to define record keys, pick out the write operation, specify how to merge records, enable/disable asynchronous compaction or choosing query type to read.
+- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses a RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time-time these configs may need to be tweaked to optimize for specific workloads.
+- [**Metrics Configs**](#METRICS): This set of configs is used to enable monitoring and reporting of key Hudi stats and metrics.
+- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on the incoming new record and the stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload, which simply updates the table with the latest/last-written record. This can be overridden with a custom class extending the HoodieRecordPayload class, on both datasource and WriteClient levels.
+- [**Kafka Connect Configs**](#KAFKA_CONNECT): This set of configs is used for the Kafka Connect Sink Connector for writing Hudi Tables
+- [**Amazon Web Services Configs**](#AWS): Configurations specific to Amazon Web Services, such as the credentials used by AWS-backed components like the DynamoDB based lock provider.
+
+## Spark Datasource Configs {#SPARK_DATASOURCE}
+These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.
+
+### Read Options {#Read-Options}
+
+Options useful for reading tables via `read.format.option(...)`
+
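+For example, the `as.of.instant` option documented below enables time travel reads; a minimal sketch (the instant time and `basePath` are hypothetical placeholders):
+
+```scala
+// Minimal sketch: read the table as of an earlier instant (time travel query).
+val asOfDF = spark.read.format("hudi")
+  .option("as.of.instant", "20220610120000")
+  .load(basePath)
+```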
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+> #### hoodie.file.index.enable
+> Enables use of the spark file index implementation for Hudi, that speeds up listing of large tables.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE_HOODIE_FILE_INDEX`<br></br>
+> `Deprecated Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.datasource.read.paths
+> Comma separated list of file paths to read within a Hudi table.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_PATHS`<br></br>
+
+---
+
+> #### hoodie.datasource.read.incr.filters
+> For use-cases like DeltaStreamer which reads from Hoodie Incremental table and applies opaque map functions, filters appearing late in the sequence of transformations cannot be automatically pushed down. This option allows setting filters directly on Hoodie Source.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: PUSH_DOWN_INCR_FILTERS`<br></br>
+
+---
+
+> #### hoodie.enable.data.skipping
+> Enables data-skipping allowing queries to leverage indexes to reduce the search space by skipping over files<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE_DATA_SKIPPING`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### as.of.instant
+> The query instant for time travel. If this option is not specified, we query the latest snapshot.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TIME_TRAVEL_AS_OF_INSTANT`<br></br>
+
+---
+
+> #### hoodie.datasource.read.schema.use.end.instanttime
+> Uses the end instant's schema for incrementally fetched data. By default, uses the latest instant's schema.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INCREMENTAL_READ_SCHEMA_USE_END_INSTANTTIME`<br></br>
+
+---
+
+> #### hoodie.datasource.read.incr.path.glob
+> For the use-cases like users only want to incremental pull from certain partitions instead of the full table. This option allows using glob pattern to directly filter on path.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: INCR_PATH_GLOB`<br></br>
+
+---
+
+> #### hoodie.datasource.read.end.instanttime
+> Instant time to limit incrementally fetched data to. New data written with an instant_time <= END_INSTANTTIME are fetched out.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: END_INSTANTTIME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.precombine.field
+> Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: READ_PRE_COMBINE_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.merge.type
+> For Snapshot query on merge on read table, control whether we invoke the record payload implementation to merge (payload_combine) or skip merging altogether (skip_merge)<br></br>
+> **Default Value**: payload_combine (Optional)<br></br>
+> `Config Param: REALTIME_MERGE`<br></br>
+
+---
+
+> #### hoodie.datasource.read.extract.partition.values.from.path
+> When set to true, values for partition columns (partition values) will be extracted from physical partition path (default Spark behavior). When set to false partition values will be read from the data file (in Hudi partition columns are persisted by default). This config is a fallback allowing to preserve existing behavior, and should not be used otherwise.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.datasource.read.begin.instanttime
+> Instant time to start incrementally pulling data from. The instanttime here need not necessarily correspond to an instant on the timeline. New data written with an instant_time > BEGIN_INSTANTTIME are fetched out. For e.g: ‘20170901080000’ will get all new data written after Sep 1, 2017 08:00AM.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BEGIN_INSTANTTIME`<br></br>
+
+---
+
+> #### hoodie.datasource.read.incr.fallback.fulltablescan.enable
+> When doing an incremental query whether we should fall back to full table scans if file does not exist.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INCREMENTAL_FALLBACK_TO_FULL_TABLE_SCAN_FOR_NON_EXISTING_FILES`<br></br>
+
+---
+
+> #### hoodie.datasource.query.type
+> Whether data needs to be read, in incremental mode (new data since an instantTime) (or) Read Optimized mode (obtain latest view, based on base files) (or) Snapshot mode (obtain latest view, by merging base and (if any) log files)<br></br>
+> **Default Value**: snapshot (Optional)<br></br>
+> `Config Param: QUERY_TYPE`<br></br>
+
+---
+
+### Write Options {#Write-Options}
+
+You can pass down any of the WriteClient level configs directly using `options()` or `option(k,v)` methods.
+
+```java
+inputDF.write()
+.format("org.apache.hudi")
+.options(clientOpts) // any of the Hudi client opts can be passed in as well
+.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
+.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
+.option(HoodieWriteConfig.TABLE_NAME, tableName)
+.mode(SaveMode.Append)
+.save(basePath);
+```
+
+Options useful for writing tables via `write.format.option(...)`
+
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+> #### hoodie.clustering.async.enabled
+> Enable running of clustering service, asynchronously as inserts happen on the table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLUSTERING_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.datasource.write.operation
+> Whether to do upsert, insert or bulkinsert for the write operation. Use bulkinsert to load new data into a table, and there on use upsert/insert. bulk insert uses a disk based write path to scale to load large inputs without need to cache it.<br></br>
+> **Default Value**: upsert (Optional)<br></br>
+> `Config Param: OPERATION`<br></br>
+
+---
+
+> #### hoodie.datasource.write.reconcile.schema
+> When a new batch of writes has records with an old schema, but the latest table schema has evolved, this config will upgrade the records to leverage the latest table schema (default values will be injected into missing fields). If not, the write batch would fail.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: RECONCILE_SCHEMA`<br></br>
+
+---
+
+> #### hoodie.datasource.write.recordkey.field
+> Record key field. Value to be used as the `recordKey` component of `HoodieKey`.
+Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using
+the dot notation eg: `a.b.c`<br></br>
+> **Default Value**: uuid (Optional)<br></br>
+> `Config Param: RECORDKEY_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.skip_ro_suffix
+> Skip the _ro suffix for Read optimized table, when registering<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SKIP_RO_SUFFIX_FOR_READ_OPTIMIZED_TABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Should we url encode the partition path value, before creating the folder structure.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.partition_extractor_class
+> Class which implements PartitionValueExtractor to extract the partition values, default 'SlashEncodedDayPartitionValueExtractor'.<br></br>
+> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> `Config Param: HIVE_PARTITION_EXTRACTOR_CLASS`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.serde_properties
+> Serde properties to hive table.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_TABLE_SERDE_PROPERTIES`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.sync_comment
+> Whether to sync the table column comments while syncing the table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_COMMENT`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.password
+> hive password to use<br></br>
+> **Default Value**: hive (Optional)<br></br>
+> `Config Param: HIVE_PASS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled
+> When set to true, consistent value will be generated for a logical timestamp type column, like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so as not to break the pipeline that deploy either fully row-writer path or non row-writer path. For example, if it is kept disabled then record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp `2016-12-29 09:54:00.0` in row-writer path, while it will be [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.support_timestamp
+> ‘INT64’ with original type TIMESTAMP_MICROS is converted to hive ‘timestamp’ type. Disabled by default for backward compatibility.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SUPPORT_TIMESTAMP_TYPE`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.create_managed_table
+> Whether to sync the table as managed table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_CREATE_MANAGED_TABLE`<br></br>
+
+---
+
+> #### hoodie.clustering.inline
+> Turn on inline clustering - clustering will be run after each write operation is complete<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_CLUSTERING_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.datasource.compaction.async.enable
+> Controls whether async compaction should be turned on for MOR table writing.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ASYNC_COMPACT_ENABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.meta.sync.enable
+> Enable Syncing the Hudi Table with an external meta store or data catalog.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: META_SYNC_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.write.streaming.ignore.failed.batch
+> Config to indicate whether to ignore any non exception error (e.g. writestatus error) within a streaming microbatch<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: STREAMING_IGNORE_FAILED_BATCH`<br></br>
+
+---
+
+> #### hoodie.datasource.write.precombine.field
+> Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: PRECOMBINE_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.username
+> hive user name to use<br></br>
+> **Default Value**: hive (Optional)<br></br>
+> `Config Param: HIVE_USER`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.field
+> Partition path field. Value to be used at the partitionPath component of HoodieKey. Actual value obtained by invoking .toString()<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITIONPATH_FIELD`<br></br>
+
+---
+
+> #### hoodie.datasource.write.streaming.retry.count
+> Config to indicate how many times streaming job should retry for a failed micro batch.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: STREAMING_RETRY_CNT`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.partition_fields
+> Field in the table to use for determining hive partition columns.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: HIVE_PARTITION_FIELDS`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.sync_as_datasource
+> <br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_AS_DATA_SOURCE_TABLE`<br></br>
+
+---
+
+> #### hoodie.sql.insert.mode
+> Insert mode when inserting data into a pk-table. The optional modes are: upsert, strict and non-strict. For upsert mode, the insert statement does an upsert operation on the pk-table, which will update duplicate records. For strict mode, the insert statement keeps the primary key uniqueness constraint and does not allow duplicate records. For non-strict mode, hudi just does an insert operation on the pk-table.<br></br>
+> **Default Value**: upsert (Optional)<br></br>
+> `Config Param: SQL_INSERT_MODE`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.use_jdbc
+> Use JDBC when hive synchronization is enabled<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_USE_JDBC`<br></br>
+> `Deprecated Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.meta.sync.client.tool.class
+> Sync tool class name used to sync to metastore. Defaults to Hive.<br></br>
+> **Default Value**: org.apache.hudi.hive.HiveSyncTool (Optional)<br></br>
+> `Config Param: META_SYNC_CLIENT_TOOL_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.class
+> Key generator class, that implements `org.apache.hudi.keygen.KeyGenerator`<br></br>
+> **Default Value**: org.apache.hudi.keygen.SimpleKeyGenerator (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.payload.class
+> Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.table_properties
+> Additional properties to store with table.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_TABLE_PROPERTIES`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.jdbcurl
+> Hive metastore url<br></br>
+> **Default Value**: jdbc:hive2://localhost:10000 (Optional)<br></br>
+> `Config Param: HIVE_URL`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.batch_num
+> The number of partitions per batch when synchronizing partitions to hive.<br></br>
+> **Default Value**: 1000 (Optional)<br></br>
+> `Config Param: HIVE_BATCH_SYNC_PARTITION_NUM`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.assume_date_partitioning
+> Assume partitioning is yyyy/mm/dd<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_ASSUME_DATE_PARTITION`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.bucket_sync
+> Whether to sync the hive metastore bucket specification when using bucket index. The specification is 'CLUSTERED BY (trace_id) SORTED BY (trace_id ASC) INTO 65536 BUCKETS'<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_BUCKET_SYNC`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.auto_create_database
+> Auto create hive database if it does not exist<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_AUTO_CREATE_DATABASE`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.database
+> The name of the destination database that we should sync the hudi table to.<br></br>
+> **Default Value**: default (Optional)<br></br>
+> `Config Param: HIVE_DATABASE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.streaming.retry.interval.ms
+> Config to indicate how long (in milliseconds) before a retry should be issued for a failed microbatch<br></br>
+> **Default Value**: 2000 (Optional)<br></br>
+> `Config Param: STREAMING_RETRY_INTERVAL_MS`<br></br>
+
+---
+
+> #### hoodie.sql.bulk.insert.enable
+> When set to true, the sql insert statement will use bulk insert.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SQL_ENABLE_BULK_INSERT`<br></br>
+
+---
+
+> #### hoodie.datasource.write.commitmeta.key.prefix
+> Option keys beginning with this prefix, are automatically added to the commit/deltacommit metadata. This is useful to store checkpointing information, in a consistent way with the hudi timeline<br></br>
+> **Default Value**: _ (Optional)<br></br>
+> `Config Param: COMMIT_METADATA_KEYPREFIX`<br></br>
+
+---
+
+> #### hoodie.datasource.write.drop.partition.columns
+> When set to true, will not write the partition columns into hudi. By default, false.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: DROP_PARTITION_COLUMNS`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.enable
+> When set to true, register/sync the table to Apache Hive metastore.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.table
+> The name of the destination table that we should sync the hudi table to.<br></br>
+> **Default Value**: unknown (Optional)<br></br>
+> `Config Param: HIVE_TABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.ignore_exceptions
+> Ignore exceptions when syncing with Hive.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_IGNORE_EXCEPTIONS`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.use_pre_apache_input_format
+> Flag to choose InputFormat under com.uber.hoodie package instead of org.apache.hudi package. Use this when you are in the process of migrating from com.uber.hoodie to org.apache.hudi. Stop using this after you migrated the table definition to org.apache.hudi input format<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_USE_PRE_APACHE_INPUT_FORMAT`<br></br>
+
+---
+
+> #### hoodie.datasource.write.table.type
+> The table type for the underlying data, for this write. This can’t change between writes.<br></br>
+> **Default Value**: COPY_ON_WRITE (Optional)<br></br>
+> `Config Param: TABLE_TYPE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.row.writer.enable
+> When set to true, will perform write operations directly using the spark native `Row` representation, avoiding any additional conversion costs.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE_ROW_WRITER`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Flag to indicate whether to use Hive style partitioning.
+If set true, the names of partition folders follow <partition_column_name>=<partition_value> format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.meta_sync.condition.sync
+> If true, only sync on conditions like schema change or partition change.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_CONDITIONAL_SYNC`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.mode
+> Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_SYNC_MODE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.table.name
+> Table name for the datasource write. Also used to register the table into meta stores.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.base_file_format
+> Base file format for the sync.<br></br>
+> **Default Value**: PARQUET (Optional)<br></br>
+> `Config Param: HIVE_BASE_FILE_FORMAT`<br></br>
+
+---
+
+> #### hoodie.deltastreamer.source.kafka.value.deserializer.class
+> This class is used by kafka client to deserialize the records<br></br>
+> **Default Value**: io.confluent.kafka.serializers.KafkaAvroDeserializer (Optional)<br></br>
+> `Config Param: KAFKA_AVRO_VALUE_DESERIALIZER_CLASS`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.datasource.hive_sync.metastore.uris
+> Hive metastore url<br></br>
+> **Default Value**: thrift://localhost:9083 (Optional)<br></br>
+> `Config Param: METASTORE_URIS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.insert.drop.duplicates
+> If set to true, filters out all duplicate records from incoming dataframe, during insert operations.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INSERT_DROP_DUPS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitions.to.delete
+> Comma separated list of partitions to delete<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITIONS_TO_DELETE`<br></br>
+
+---
+
+### PreCommit Validator Configurations {#PreCommit-Validator-Configurations}
+
+The following set of configurations help validate new data before commits.
+
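+As a hedged sketch of how these hang together on a datasource write (the validator class is left as a placeholder to be filled with an implementation available in your Hudi version; `df`, `tableName` and `basePath` are hypothetical):
+
+```scala
+// Minimal sketch: run a pre-commit validation query before the commit is finalized.
+// "<validator-class-name>" and the SQL/expected-result pair are illustrative placeholders.
+df.write.format("hudi")
+  .option("hoodie.precommit.validators", "<validator-class-name>")
+  .option("hoodie.precommit.validators.single.value.sql.queries",
+    "select count(*) from <TABLE_NAME> where _hoodie_record_key is null#0")
+  .option("hoodie.table.name", tableName)
+  .mode("append")
+  .save(basePath)
+```
+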
+`Config Class`: org.apache.hudi.config.HoodiePreCommitValidatorConfig<br></br>
+> #### hoodie.precommit.validators.single.value.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state after commit. Multiple queries separated by the ';' delimiter are supported. The expected result is included as part of the query, separated by '#'. Example query: 'query1#result1:query2#result2'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: SINGLE_VALUE_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.equality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after commit. Multiple queries separated by the ';' delimiter are supported. Example: "select count(*) from \<TABLE_NAME\>". Note \<TABLE_NAME\> is replaced by the table state before and after commit.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: EQUALITY_SQL_QUERIES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators
+> Comma separated list of class names that can be invoked to validate commit<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: VALIDATOR_CLASS_NAMES`<br></br>
+
+---
+
+> #### hoodie.precommit.validators.inequality.sql.queries
+> Spark SQL queries to run on the table before committing new data, to validate state before and after commit. Multiple queries separated by the ';' delimiter are supported. Example query: 'select count(*) from \<TABLE_NAME\> where col=null'. Note the \<TABLE_NAME\> variable is expected to be present in the query.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: INEQUALITY_SQL_QUERIES`<br></br>
+
+---
+
+## Flink Sql Configs {#FLINK_SQL}
+These configs control the Hudi Flink SQL source/sink connectors, providing ability to define record keys, pick out the write operation, specify how to merge records, enable/disable asynchronous compaction or choosing query type to read.
+
+### Flink Options {#Flink-Options}
+
+Flink jobs using the SQL can be configured through the options in WITH clause. The actual datasource level configs are listed below.
+
+`Config Class`: org.apache.hudi.configuration.FlinkOptions<br></br>
+> #### read.streaming.enabled
+> Whether to read as streaming source, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: READ_AS_STREAMING`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.type
+> Key generator type, indicating the key generator implementation that will extract the key out of the incoming record<br></br>
+> **Default Value**: SIMPLE (Optional)<br></br>
+> `Config Param: KEYGEN_TYPE`<br></br>
+
+---
+
+> #### compaction.trigger.strategy
+> Strategy to trigger compaction, options are 'num_commits': trigger compaction when reach N delta commits;
+'time_elapsed': trigger compaction when time elapsed &gt; N seconds since last compaction;
+'num_and_time': trigger compaction when both NUM_COMMITS and TIME_ELAPSED are satisfied;
+'num_or_time': trigger compaction when NUM_COMMITS or TIME_ELAPSED is satisfied.
+Default is 'num_commits'<br></br>
+> **Default Value**: num_commits (Optional)<br></br>
+> `Config Param: COMPACTION_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### index.state.ttl
+> Index state ttl in days, default stores the index permanently<br></br>
+> **Default Value**: 0.0 (Optional)<br></br>
+> `Config Param: INDEX_STATE_TTL`<br></br>
+
+---
+
+> #### compaction.max_memory
+> Max memory in MB for compaction spillable map, default 100MB<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: COMPACTION_MAX_MEMORY`<br></br>
+
+---
+
+> #### hive_sync.support_timestamp
+> INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type.
+Disabled by default for backward compatibility.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_SUPPORT_TIMESTAMP`<br></br>
+
+---
+
+> #### hive_sync.serde_properties
+> Serde properties to hive table, the data format is k1=v1
+k2=v2<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_SYNC_TABLE_SERDE_PROPERTIES`<br></br>
+
+---
+
+> #### hive_sync.skip_ro_suffix
+> Skip the _ro suffix for Read optimized table when registering, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_SKIP_RO_SUFFIX`<br></br>
+
+---
+
+> #### metadata.compaction.delta_commits
+> Max delta commits for metadata table to trigger compaction, default 10<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: METADATA_COMPACTION_DELTA_COMMITS`<br></br>
+
+---
+
+> #### hive_sync.assume_date_partitioning
+> Assume partitioning is yyyy/mm/dd, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ASSUME_DATE_PARTITION`<br></br>
+
+---
+
+> #### write.parquet.block.size
+> Parquet RowGroup size. It's recommended to make this large enough that scan costs can be amortized by packing enough column values into a single row group.<br></br>
+> **Default Value**: 120 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_BLOCK_SIZE`<br></br>
+
+---
+
+> #### hive_sync.table
+> Table name for hive sync, default 'unknown'<br></br>
+> **Default Value**: unknown (Optional)<br></br>
+> `Config Param: HIVE_SYNC_TABLE`<br></br>
+
+---
+
+> #### write.payload.class
+> Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting.
+This will render any value set for the option ineffective<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### compaction.tasks
+> Parallelism of tasks that do actual compaction, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: COMPACTION_TASKS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Whether to use Hive style partitioning.
+If set true, the names of partition folders follow &lt;partition_column_name&gt;=&lt;partition_value&gt; format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING`<br></br>
+
+---
+
+> #### table.type
+> Type of table to write. COPY_ON_WRITE (or) MERGE_ON_READ<br></br>
+> **Default Value**: COPY_ON_WRITE (Optional)<br></br>
+> `Config Param: TABLE_TYPE`<br></br>
+
+---
+
+> #### hive_sync.auto_create_db
+> Auto create hive database if it does not exist, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_AUTO_CREATE_DB`<br></br>
+
+---
+
+> #### compaction.timeout.seconds
+> Max timeout time in seconds for online compaction to rollback, default 20 minutes<br></br>
+> **Default Value**: 1200 (Optional)<br></br>
+> `Config Param: COMPACTION_TIMEOUT_SECONDS`<br></br>
+
+---
+
+> #### hive_sync.username
+> Username for hive sync, default 'hive'<br></br>
+> **Default Value**: hive (Optional)<br></br>
+> `Config Param: HIVE_SYNC_USERNAME`<br></br>
+
+---
+
+> #### write.sort.memory
+> Sort memory in MB, default 128MB<br></br>
+> **Default Value**: 128 (Optional)<br></br>
+> `Config Param: WRITE_SORT_MEMORY`<br></br>
+
+---
+
+> #### hive_sync.enable
+> Asynchronously sync Hive meta to HMS, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_ENABLED`<br></br>
+
+---
+
+> #### changelog.enabled
+> Whether to keep all the intermediate changes. We try to keep all the changes of a record when enabled:
+1) The sink accepts the UPDATE_BEFORE message;
+2) The source tries to emit every change of a record.
+The semantics are best effort because the compaction job would finally merge all changes of a record into one.
+By default false, to have UPSERT semantics<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CHANGELOG_ENABLED`<br></br>
+
+---
+
+> #### read.streaming.check-interval
+> Check interval for streaming read, in seconds, default 1 minute<br></br>
+> **Default Value**: 60 (Optional)<br></br>
+> `Config Param: READ_STREAMING_CHECK_INTERVAL`<br></br>
+
+---
+
+> #### write.bulk_insert.shuffle_input
+> Whether to shuffle the inputs by specific fields for bulk insert tasks, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: WRITE_BULK_INSERT_SHUFFLE_INPUT`<br></br>
+
+---
+
+> #### hoodie.datasource.merge.type
+> For Snapshot query on merge on read table. Use this key to define how the payloads are merged:
+1) skip_merge: read the base file records plus the log file records;
+2) payload_combine: read the base file records first; for each record in the base file, check whether the key is in the
+   log file records (combining the two records with the same key for base and log file records), then read the remaining log file records<br></br>
+> **Default Value**: payload_combine (Optional)<br></br>
+> `Config Param: MERGE_TYPE`<br></br>
+
+---
+
+> #### write.retry.times
+> Flag to indicate how many times streaming job should retry for a failed checkpoint batch.
+By default 3<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: RETRY_TIMES`<br></br>
+
+---
+
+> #### metadata.enabled
+> Enable the internal metadata table which serves table metadata like level file listings, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: METADATA_ENABLED`<br></br>
+
+---
+
+> #### read.tasks
+> Parallelism of tasks that do actual read, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: READ_TASKS`<br></br>
+
+---
+
+> #### write.parquet.max.file.size
+> Target size for parquet files produced by Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.<br></br>
+> **Default Value**: 120 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_MAX_FILE_SIZE`<br></br>
+
+---
+
+> #### hoodie.bucket.index.hash.field
+> Index key field. Value to be used as hashing to find the bucket ID. Should be a subset of or equal to the recordKey fields.
+Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using the dot notation eg: `a.b.c`<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: INDEX_KEY_FIELD`<br></br>
+
+---
+
+> #### hoodie.bucket.index.num.buckets
+> Hudi bucket number per partition. Only affected if using Hudi bucket index.<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: BUCKET_INDEX_NUM_BUCKETS`<br></br>
+
+---
+
+> #### read.end-commit
+> End commit instant for reading, the commit time format should be 'yyyyMMddHHmmss'<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_END_COMMIT`<br></br>
+
+---
+
+> #### write.log.max.size
+> Maximum size allowed in MB for a log file before it is rolled over to the next version, default 1GB<br></br>
+> **Default Value**: 1024 (Optional)<br></br>
+> `Config Param: WRITE_LOG_MAX_SIZE`<br></br>
+
+---
+
+> #### hive_sync.file_format
+> File format for hive sync, default 'PARQUET'<br></br>
+> **Default Value**: PARQUET (Optional)<br></br>
+> `Config Param: HIVE_SYNC_FILE_FORMAT`<br></br>
+
+---
+
+> #### hive_sync.mode
+> Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql, default 'jdbc'<br></br>
+> **Default Value**: jdbc (Optional)<br></br>
+> `Config Param: HIVE_SYNC_MODE`<br></br>
+
+---
+
+> #### write.retry.interval.ms
+> Flag to indicate how long (in milliseconds) before a retry should be issued for a failed checkpoint batch.
+By default 2000, and it will be doubled on every retry<br></br>
+> **Default Value**: 2000 (Optional)<br></br>
+> `Config Param: RETRY_INTERVAL_MS`<br></br>
+
+---
+
+> #### write.partition.format
+> Partition path format, only valid when 'write.datetime.partitioning' is true, default is:
+1) 'yyyyMMddHH' for timestamp(3) WITHOUT TIME ZONE, LONG, FLOAT, DOUBLE, DECIMAL;
+2) 'yyyyMMdd' for DATE and INT.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FORMAT`<br></br>
+
+---
+
+> #### hive_sync.db
+> Database name for hive sync, default 'default'<br></br>
+> **Default Value**: default (Optional)<br></br>
+> `Config Param: HIVE_SYNC_DB`<br></br>
+
+---
+
+> #### index.type
+> Index type of Flink write job, default is using state backed index.<br></br>
+> **Default Value**: FLINK_STATE (Optional)<br></br>
+> `Config Param: INDEX_TYPE`<br></br>
+
+---
+
+> #### hive_sync.password
+> Password for hive sync, default 'hive'<br></br>
+> **Default Value**: hive (Optional)<br></br>
+> `Config Param: HIVE_SYNC_PASSWORD`<br></br>
+
+---
+
+> #### hive_sync.use_jdbc
+> Use JDBC when hive synchronization is enabled, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: HIVE_SYNC_USE_JDBC`<br></br>
+
+---
+
+> #### compaction.schedule.enabled
+> Schedule the compaction plan, enabled by default for MOR<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMPACTION_SCHEDULE_ENABLED`<br></br>
+
+---
+
+> #### hive_sync.jdbc_url
+> Jdbc URL for hive sync, default 'jdbc:hive2://localhost:10000'<br></br>
+> **Default Value**: jdbc:hive2://localhost:10000 (Optional)<br></br>
+> `Config Param: HIVE_SYNC_JDBC_URL`<br></br>
+
+---
+
+> #### hive_sync.partition_extractor_class
+> Tool to extract the partition value from HDFS path, default 'SlashEncodedDayPartitionValueExtractor'<br></br>
+> **Default Value**: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor (Optional)<br></br>
+> `Config Param: HIVE_SYNC_PARTITION_EXTRACTOR_CLASS_NAME`<br></br>
+
+---
+
+> #### read.start-commit
+> Start commit instant for reading, the commit time format should be 'yyyyMMddHHmmss', by default reading from the latest instant for streaming read<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: READ_START_COMMIT`<br></br>
+
+---
+
+> #### write.precombine
+> Flag to indicate whether to drop duplicates before insert/upsert.
+By default these cases will accept duplicates, to gain extra performance:
+1) insert operation;
+2) upsert for MOR table, since the MOR table deduplicates on reading<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PRE_COMBINE`<br></br>
+
+---
+
+> #### write.batch.size
+> Batch buffer size in MB to flush data into the underlying filesystem, default 256MB<br></br>
+> **Default Value**: 256.0 (Optional)<br></br>
+> `Config Param: WRITE_BATCH_SIZE`<br></br>
+
+---
+
+> #### archive.min_commits
+> Min number of commits to keep before archiving older commits into a sequential log, default 40<br></br>
+> **Default Value**: 40 (Optional)<br></br>
+> `Config Param: ARCHIVE_MIN_COMMITS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.class
+> Key generator class, that extracts the key out of the incoming record<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: KEYGEN_CLASS_NAME`<br></br>
+
+---
+
+> #### index.global.enabled
+> Whether to update the index for the old partition path
+if a record with the same key but a different partition path comes in, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: INDEX_GLOBAL_ENABLED`<br></br>
+
+---
+
+> #### index.partition.regex
+> Whether to load partitions into state only if the partition path matches this regex, default `.*`<br></br>
+> **Default Value**: .* (Optional)<br></br>
+> `Config Param: INDEX_PARTITION_REGEX`<br></br>
+
+---
+
+> #### hoodie.table.name
+> Table name to register to Hive metastore<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_NAME`<br></br>
+
+---
+
+> #### path
+> Base path for the target hoodie table.
+The path will be created if it does not exist;
+otherwise a Hudi table is expected to have been initialized there successfully<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PATH`<br></br>
+
+---
+
+> #### index.bootstrap.enabled
+> Whether to bootstrap the index state from existing hoodie table, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INDEX_BOOTSTRAP_ENABLED`<br></br>
+
+---
+
+> #### read.streaming.skip_compaction
+> Whether to skip compaction instants for streaming read,
+there are two cases that this option can be used to avoid reading duplicates:
+1) you are sure that the consumer reads faster than any compaction instants, usually with a delta time compaction strategy that is long enough, e.g. one week;
+2) changelog mode is enabled, this option is a solution to keep data integrity<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: READ_STREAMING_SKIP_COMPACT`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Whether to encode the partition path url, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### compaction.async.enabled
+> Async Compaction, enabled by default for MOR<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMPACTION_ASYNC_ENABLED`<br></br>
+
+---
+
+> #### hive_sync.ignore_exceptions
+> Ignore exceptions during hive synchronization, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_IGNORE_EXCEPTIONS`<br></br>
+
+---
+
+> #### hive_sync.table_properties
+> Additional properties to store with table, the data format is k1=v1
+k2=v2<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_SYNC_TABLE_PROPERTIES`<br></br>
+
+---
+
+> #### write.ignore.failed
+> Flag to indicate whether to ignore any non-exception error (e.g. writestatus error) within a checkpoint batch.
+By default true (in favor of streaming progressing over data integrity)<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: IGNORE_FAILED`<br></br>
+
+---
+
+> #### write.commit.ack.timeout
+> Timeout limit for a writer task after it finishes a checkpoint and
+waits for the instant commit success, only for internal use<br></br>
+> **Default Value**: -1 (Optional)<br></br>
+> `Config Param: WRITE_COMMIT_ACK_TIMEOUT`<br></br>
+
+---
+
+> #### write.operation
+> The write operation that this write should do<br></br>
+> **Default Value**: upsert (Optional)<br></br>
+> `Config Param: OPERATION`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.field
+> Partition path field. Value to be used as the `partitionPath` component of `HoodieKey`.
+Actual value obtained by invoking .toString(), default ''<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: PARTITION_PATH_FIELD`<br></br>
+
+---
+
+> #### write.bucket_assign.tasks
+> Parallelism of tasks that do bucket assign, default is the parallelism of the execution environment<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BUCKET_ASSIGN_TASKS`<br></br>
+
+---
+
+> #### source.avro-schema.path
+> Source avro schema file path, the parsed schema is used for deserialization<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SOURCE_AVRO_SCHEMA_PATH`<br></br>
+
+---
+
+> #### compaction.delta_commits
+> Max delta commits needed to trigger compaction, default 5 commits<br></br>
+> **Default Value**: 5 (Optional)<br></br>
+> `Config Param: COMPACTION_DELTA_COMMITS`<br></br>
+
+---
+
+> #### write.insert.cluster
+> Whether to merge small files for insert mode; if true, the write throughput will decrease because of the read/write of existing small files, only valid for COW table, default false<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INSERT_CLUSTER`<br></br>
+
+---
+
+> #### partition.default_name
+> The default partition name in case the dynamic partition column value is null/empty string<br></br>
+> **Default Value**: default (Optional)<br></br>
+> `Config Param: PARTITION_DEFAULT_NAME`<br></br>
+
+---
+
+> #### write.bulk_insert.sort_input
+> Whether to sort the inputs by specific fields for bulk insert tasks, default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: WRITE_BULK_INSERT_SORT_INPUT`<br></br>
+
+---
+
+> #### source.avro-schema
+> Source avro schema string, the parsed schema is used for deserialization<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SOURCE_AVRO_SCHEMA`<br></br>
+
+---
+
+> #### compaction.target_io
+> Target IO in MB per compaction (both read and write), default 500 GB<br></br>
+> **Default Value**: 512000 (Optional)<br></br>
+> `Config Param: COMPACTION_TARGET_IO`<br></br>
+
+---
+
+> #### write.rate.limit
+> Write record rate limit per second to prevent traffic jitter and improve stability, default 0 (no limit)<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: WRITE_RATE_LIMIT`<br></br>
+
+---
+
+> #### write.log_block.size
+> Max log block size in MB for log file, default 128MB<br></br>
+> **Default Value**: 128 (Optional)<br></br>
+> `Config Param: WRITE_LOG_BLOCK_SIZE`<br></br>
+
+---
+
+> #### write.tasks
+> Parallelism of tasks that do actual write, default is 4<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: WRITE_TASKS`<br></br>
+
+---
+
+> #### clean.async.enabled
+> Whether to cleanup the old commits immediately on new commits, enabled by default<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: CLEAN_ASYNC_ENABLED`<br></br>
+
+---
+
+> #### clean.retain_commits
+> Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled).
+This also directly translates into how much you can incrementally pull on this table, default 30<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: CLEAN_RETAIN_COMMITS`<br></br>
+
+---
+
+> #### read.utc-timezone
+> Use UTC timezone or local timezone for the conversion between epoch time and LocalDateTime. Hive 0.x/1.x/2.x use local timezone, but Hive 3.x uses UTC timezone, by default true<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: UTC_TIMEZONE`<br></br>
+
+---
+
+> #### archive.max_commits
+> Max number of commits to keep before archiving older commits into a sequential log, default 50<br></br>
+> **Default Value**: 50 (Optional)<br></br>
+> `Config Param: ARCHIVE_MAX_COMMITS`<br></br>
+
+---
+
+> #### hoodie.datasource.query.type
+> Decides how data files need to be read, in
+1) Snapshot mode (obtain latest view, based on row and columnar data);
+2) incremental mode (new data since an instantTime);
+3) Read Optimized mode (obtain latest view, based on columnar data).
+Default: snapshot<br></br>
+> **Default Value**: snapshot (Optional)<br></br>
+> `Config Param: QUERY_TYPE`<br></br>
+
+---
+
+> #### write.precombine.field
+> Field used in preCombining before actual write. When two records have the same
+key value, we will pick the one with the largest value for the precombine field,
+determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: PRECOMBINE_FIELD`<br></br>
+
+---
+
+> #### write.index_bootstrap.tasks
+> Parallelism of tasks that do index bootstrap, default is the parallelism of the execution environment<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: INDEX_BOOTSTRAP_TASKS`<br></br>
+
+---
+
+> #### write.task.max.size
+> Maximum memory in MB for a write task; when the threshold is reached,
+it flushes the max size data bucket to avoid OOM, default 1GB<br></br>
+> **Default Value**: 1024.0 (Optional)<br></br>
+> `Config Param: WRITE_TASK_MAX_SIZE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.recordkey.field
+> Record key field. Value to be used as the `recordKey` component of `HoodieKey`.
+Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using the dot notation eg: `a.b.c`<br></br>
+> **Default Value**: uuid (Optional)<br></br>
+> `Config Param: RECORD_KEY_FIELD`<br></br>
+
+---
+
+> #### write.parquet.page.size
+> Parquet page size. Page is the unit of read within a parquet file. Within a block, pages are compressed separately.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: WRITE_PARQUET_PAGE_SIZE`<br></br>
+
+---
+
+> #### compaction.delta_seconds
+> Max delta seconds time needed to trigger compaction, default 1 hour<br></br>
+> **Default Value**: 3600 (Optional)<br></br>
+> `Config Param: COMPACTION_DELTA_SECONDS`<br></br>
+
+---
+
+> #### hive_sync.metastore.uris
+> Metastore uris for hive sync, default ''<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: HIVE_SYNC_METASTORE_URIS`<br></br>
+
+---
+
+> #### hive_sync.partition_fields
+> Partition fields for hive sync, default ''<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: HIVE_SYNC_PARTITION_FIELDS`<br></br>
+
+---
+
+> #### write.merge.max_memory
+> Max memory in MB for merge, default 100MB<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: WRITE_MERGE_MAX_MEMORY`<br></br>
+
+---
+
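+The Flink options above are normally supplied through the `WITH` clause of a Flink SQL DDL. Below is a minimal sketch, assuming the hudi-flink bundle is on the classpath; the table name, schema and path are hypothetical and the option values are illustrative only.
+
+```scala
+import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
+
+// A minimal sketch: declare a MERGE_ON_READ sink and wire in a few of the
+// Flink options documented above. Schema, table name and path are made up.
+val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
+val tableEnv = TableEnvironment.create(settings)
+
+tableEnv.executeSql(
+  """
+    |CREATE TABLE hudi_sink (
+    |  uuid VARCHAR(40),
+    |  name VARCHAR(20),
+    |  ts TIMESTAMP(3),
+    |  `partition` VARCHAR(20)
+    |) PARTITIONED BY (`partition`) WITH (
+    |  'connector' = 'hudi',
+    |  'path' = 'file:///tmp/hudi_sink',
+    |  'table.type' = 'MERGE_ON_READ',
+    |  'write.tasks' = '8',
+    |  'write.precombine' = 'true',
+    |  'compaction.async.enabled' = 'true',
+    |  'compaction.delta_commits' = '5'
+    |)
+    |""".stripMargin)
+```
+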
+## Write Client Configs {#WRITE_CLIENT}
+Internally, the Hudi datasource uses an RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time to time these configs may need to be tweaked to optimize for specific workloads.
+
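+As a general pattern, any config in this section can be handed to the Spark datasource as a string key/value option on the write path. The following is a minimal, self-contained sketch; the toy schema, table name, base path and parallelism values are made up for illustration and are not recommendations.
+
+```scala
+import org.apache.spark.sql.{SaveMode, SparkSession}
+
+// A hedged sketch of passing write client configs as Spark datasource options.
+// The data, table name and base path below are hypothetical.
+val spark = SparkSession.builder().appName("hudi-config-sketch").getOrCreate()
+import spark.implicits._
+
+val df = Seq(("id-1", "alice", 1655700000L), ("id-2", "bob", 1655700010L))
+  .toDF("uuid", "name", "ts")
+
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.partitionpath.field", "name")
+  .option("hoodie.datasource.write.precombine.field", "ts")
+  // any other config documented below can be added the same way, e.g.:
+  .option("hoodie.insert.shuffle.parallelism", "2")
+  .option("hoodie.upsert.shuffle.parallelism", "2")
+  .mode(SaveMode.Append)
+  .save("/tmp/hudi/config_sketch")
+```
+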
+### Layout Configs {#Layout-Configs}
+
+Configurations that control storage layout and data distribution, which defines how the files are organized within a table.
+
+`Config Class`: org.apache.hudi.config.HoodieLayoutConfig<br></br>
+> #### hoodie.storage.layout.type
+> Type of storage layout. Possible options are [DEFAULT | BUCKET]<br></br>
+> **Default Value**: DEFAULT (Optional)<br></br>
+> `Config Param: LAYOUT_TYPE`<br></br>
+
+---
+
+> #### hoodie.storage.layout.partitioner.class
+> Partitioner class, it is used to distribute data in a specific way.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: LAYOUT_PARTITIONER_CLASS_NAME`<br></br>
+
+---
+
+### Write commit callback configs {#Write-commit-callback-configs}
+
+Controls callback behavior into HTTP endpoints, to push  notifications on commits on hudi tables.
+
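+For example, the HTTP callback can be enabled by layering these options onto an otherwise normal write. A hedged sketch, reusing the `df` and base path from the sketch under Write Client Configs; the endpoint URL and API key are placeholders.
+
+```scala
+import org.apache.spark.sql.SaveMode
+
+// `df` and the base path are as in the earlier sketch; URL and key are placeholders.
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.write.commit.callback.on", "true")
+  .option("hoodie.write.commit.callback.http.url", "https://callbacks.example.com/hudi")
+  .option("hoodie.write.commit.callback.http.api.key", "my-secret-key")
+  .option("hoodie.write.commit.callback.http.timeout.seconds", "10")
+  .mode(SaveMode.Append)
+  .save("/tmp/hudi/config_sketch")
+```
+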
+`Config Class`: org.apache.hudi.config.HoodieWriteCommitCallbackConfig<br></br>
+> #### hoodie.write.commit.callback.on
+> Turn commit callback on/off. off by default.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: TURN_CALLBACK_ON`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.http.url
+> Callback host to be sent along with callback messages<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: CALLBACK_HTTP_URL`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.http.timeout.seconds
+> Callback timeout in seconds. 3 by default<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CALLBACK_HTTP_TIMEOUT_IN_SECONDS`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.class
+> Full path of callback class and must be a subclass of HoodieWriteCommitCallback class, org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback by default<br></br>
+> **Default Value**: org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback (Optional)<br></br>
+> `Config Param: CALLBACK_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.http.api.key
+> Http callback API key. hudi_write_commit_http_callback by default<br></br>
+> **Default Value**: hudi_write_commit_http_callback (Optional)<br></br>
+> `Config Param: CALLBACK_HTTP_API_KEY_VALUE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+### Table Configurations {#Table-Configurations}
+
+Configurations that persist across writes and reads on a Hudi table, like base and log file formats, table name, creation schema, table version and layouts. Configurations are loaded from hoodie.properties; these properties are usually set during initializing a path as a Hudi base path and rarely change during the lifetime of the table. Writers'/Queries' configurations are validated against these each time for compatibility.
+
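+These values end up in `.hoodie/hoodie.properties` on the first commit and can be read back programmatically. A hedged sketch, assuming the `HoodieTableMetaClient` builder API available in recent releases; the base path is a placeholder.
+
+```scala
+import org.apache.hadoop.conf.Configuration
+import org.apache.hudi.common.table.HoodieTableMetaClient
+
+// Inspect the persisted table configs (hoodie.properties) of an existing table.
+// The base path below is hypothetical.
+val metaClient = HoodieTableMetaClient.builder()
+  .setConf(new Configuration())
+  .setBasePath("/tmp/hudi/config_sketch")
+  .build()
+
+val tableConfig = metaClient.getTableConfig
+println(s"table name:       ${tableConfig.getTableName}")
+println(s"table type:       ${tableConfig.getTableType}")
+println(s"base file format: ${tableConfig.getBaseFileFormat}")
+```
+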
+`Config Class`: org.apache.hudi.common.table.HoodieTableConfig<br></br>
+> #### hoodie.table.precombine.field
+> Field used in preCombining before actual write. By default, when two records have the same key value, the largest value for the precombine field determined by Object.compareTo(..), is picked.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PRECOMBINE_FIELD`<br></br>
+
+---
+
+> #### hoodie.archivelog.folder
+> Path under the meta folder, to store archived timeline instants at.<br></br>
+> **Default Value**: archived (Optional)<br></br>
+> `Config Param: ARCHIVELOG_FOLDER`<br></br>
+
+---
+
+> #### hoodie.table.type
+> The table type for the underlying data, for this write. This can’t change between writes.<br></br>
+> **Default Value**: COPY_ON_WRITE (Optional)<br></br>
+> `Config Param: TYPE`<br></br>
+
+---
+
+> #### hoodie.table.timeline.timezone
+> User can set hoodie commit timeline timezone, such as utc, local and so on. local is default<br></br>
+> **Default Value**: LOCAL (Optional)<br></br>
+> `Config Param: TIMELINE_TIMEZONE`<br></br>
+
+---
+
+> #### hoodie.partition.metafile.use.base.format
+> If true, partition metafiles are saved in the same format as base-files for this dataset (e.g. Parquet / ORC). If false (default) partition metafiles are saved as properties files.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PARTITION_METAFILE_USE_BASE_FORMAT`<br></br>
+
+---
+
+> #### hoodie.table.checksum
+> Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_CHECKSUM`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.table.create.schema
+> Schema used when creating the table, for the first time.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: CREATE_SCHEMA`<br></br>
+
+---
+
+> #### hoodie.table.recordkey.fields
+> Columns used to uniquely identify the table. Concatenated values of these fields are used as  the record key component of HoodieKey.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: RECORDKEY_FIELDS`<br></br>
+
+---
+
+> #### hoodie.table.log.file.format
+> Log format used for the delta logs.<br></br>
+> **Default Value**: HOODIE_LOG (Optional)<br></br>
+> `Config Param: LOG_FILE_FORMAT`<br></br>
+
+---
+
+> #### hoodie.bootstrap.index.enable
+> Whether or not this is a bootstrapped table, with bootstrap base data and a mapping index defined, default true.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BOOTSTRAP_INDEX_ENABLE`<br></br>
+
+---
+
+> #### hoodie.table.metadata.partitions
+> Comma-separated list of metadata partitions that have been completely built and in-sync with data table. These partitions are ready for use by the readers<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_METADATA_PARTITIONS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.table.metadata.partitions.inflight
+> Comma-separated list of metadata partitions whose building is in progress. These partitions are not yet ready for use by the readers.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLE_METADATA_PARTITIONS_INFLIGHT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.table.partition.fields
+> Fields used to partition the table. Concatenated values of these fields are used as the partition path, by invoking toString()<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FIELDS`<br></br>
+
+---
+
+> #### hoodie.populate.meta.fields
+> When enabled, populates all meta fields. When disabled, no meta fields are populated and incremental queries will not be functional. This is only meant to be used for append only/immutable data for batch processing<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: POPULATE_META_FIELDS`<br></br>
+
+---
+
+> #### hoodie.compaction.payload.class
+> Payload class to use for performing compactions, i.e merge delta logs with current base file and then  produce a new base file.<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.bootstrap.index.class
+> Implementation to use for mapping base files to the bootstrap base files that contain the actual data.<br></br>
+> **Default Value**: org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex (Optional)<br></br>
+> `Config Param: BOOTSTRAP_INDEX_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Should we url encode the partition path value, before creating the folder structure.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Flag to indicate whether to use Hive style partitioning.
+If set true, the names of partition folders follow <partition_column_name>=<partition_value> format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING_ENABLE`<br></br>
+
+---
+
+> #### hoodie.table.keygenerator.class
+> Key Generator class property for the hoodie table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: KEY_GENERATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.table.version
+> Version of table, used for running upgrade/downgrade steps between releases with potentially breaking/backwards compatible changes.<br></br>
+> **Default Value**: ZERO (Optional)<br></br>
+> `Config Param: VERSION`<br></br>
+
+---
+
+> #### hoodie.table.base.file.format
+> Base file format to store all the base file data.<br></br>
+> **Default Value**: PARQUET (Optional)<br></br>
+> `Config Param: BASE_FILE_FORMAT`<br></br>
+
+---
+
+> #### hoodie.bootstrap.base.path
+> Base path of the dataset that needs to be bootstrapped as a Hudi table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BOOTSTRAP_BASE_PATH`<br></br>
+
+---
+
+> #### hoodie.datasource.write.drop.partition.columns
+> When set to true, will not write the partition columns into hudi. By default, false.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: DROP_PARTITION_COLUMNS`<br></br>
+
+---
+
+> #### hoodie.database.name
+> Database name that will be used for incremental query. If different databases have the same table name during incremental query, we can set it to limit the table name under a specific database<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DATABASE_NAME`<br></br>
+
+---
+
+> #### hoodie.timeline.layout.version
+> Version of timeline used, by the table.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TIMELINE_LAYOUT_VERSION`<br></br>
+
+---
+
+> #### hoodie.table.name
+> Table name that will be used for registering with Hive. Needs to be same across runs.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: NAME`<br></br>
+
+---
+
+### Memory Configurations {#Memory-Configurations}
+
+Controls memory usage for compaction and merges, performed internally by Hudi.
+
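+A hedged sketch of overriding the merge memory budget and the spill path for a single write, reusing the `df` and base path from the sketch under Write Client Configs; the values are illustrative only.
+
+```scala
+// Illustrative only: cap merge memory at 512 MB and spill to a custom local path.
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.memory.merge.max.size", (512L * 1024 * 1024).toString)
+  .option("hoodie.memory.spillable.map.path", "/tmp/hudi_spill/")
+  .mode("append")
+  .save("/tmp/hudi/config_sketch")
+```
+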
+`Config Class`: org.apache.hudi.config.HoodieMemoryConfig<br></br>
+> #### hoodie.memory.merge.fraction
+> This fraction is multiplied with the user memory fraction (1 - spark.memory.fraction) to get a final fraction of heap space to use during merge<br></br>
+> **Default Value**: 0.6 (Optional)<br></br>
+> `Config Param: MAX_MEMORY_FRACTION_FOR_MERGE`<br></br>
+
+---
+
+> #### hoodie.memory.dfs.buffer.max.size
+> Property to control the max memory in bytes for dfs input stream buffer size<br></br>
+> **Default Value**: 16777216 (Optional)<br></br>
+> `Config Param: MAX_DFS_STREAM_BUFFER_SIZE`<br></br>
+
+---
+
+> #### hoodie.memory.writestatus.failure.fraction
+> Property to control what fraction of the failed records/exceptions we report back to the driver. Default is 10%. If set to 100%, with a lot of failures, this can cause memory pressure, cause OOMs and mask actual data errors.<br></br>
+> **Default Value**: 0.1 (Optional)<br></br>
+> `Config Param: WRITESTATUS_FAILURE_FRACTION`<br></br>
+
+---
+
+> #### hoodie.memory.compaction.fraction
+> HoodieCompactedLogScanner reads logblocks, converts records to HoodieRecords and then merges these log blocks and records. At any point, the number of entries in a log block can be less than or equal to the number of entries in the corresponding parquet file. This can lead to OOM in the Scanner. Hence, a spillable map helps alleviate the memory pressure. Use this config to set the max allowable inMemory footprint of the spillable map<br></br>
+> **Default Value**: 0.6 (Optional)<br></br>
+> `Config Param: MAX_MEMORY_FRACTION_FOR_COMPACTION`<br></br>
+
+---
+
+> #### hoodie.memory.merge.max.size
+> Maximum amount of memory used  in bytes for merge operations, before spilling to local storage.<br></br>
+> **Default Value**: 1073741824 (Optional)<br></br>
+> `Config Param: MAX_MEMORY_FOR_MERGE`<br></br>
+
+---
+
+> #### hoodie.memory.spillable.map.path
+> Default file path prefix for spillable map<br></br>
+> **Default Value**: /tmp/ (Optional)<br></br>
+> `Config Param: SPILLABLE_MAP_BASE_PATH`<br></br>
+
+---
+
+> #### hoodie.memory.compaction.max.size
+> Maximum amount of memory in bytes used for compaction operations, before spilling to local storage.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MAX_MEMORY_FOR_COMPACTION`<br></br>
+
+---
+
+### Storage Configs {#Storage-Configs}
+
+Configurations that control aspects around writing, sizing, reading base and log files.
+
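+Sizes in this section are plain byte counts, so a target such as 100 MB has to be spelled out (or computed) explicitly. A hedged sketch, reusing the `df` and base path from the earlier sketch with illustrative values.
+
+```scala
+// Illustrative only: ~100 MB base files, snappy compression, matching row group size.
+val targetFileSizeBytes = 100L * 1024 * 1024 // 104857600
+
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.parquet.max.file.size", targetFileSizeBytes.toString)
+  .option("hoodie.parquet.block.size", targetFileSizeBytes.toString)
+  .option("hoodie.parquet.compression.codec", "snappy")
+  .mode("append")
+  .save("/tmp/hudi/config_sketch")
+```
+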
+`Config Class`: org.apache.hudi.config.HoodieStorageConfig<br></br>
+> #### hoodie.logfile.data.block.max.size
+> LogFile Data block max size in bytes. This is the maximum size allowed for a single data block to be appended to a log file. This helps to make sure the data appended to the log file is broken up into sizable blocks to prevent from OOM errors. This size should be smaller than the JVM memory.<br></br>
+> **Default Value**: 268435456 (Optional)<br></br>
+> `Config Param: LOGFILE_DATA_BLOCK_MAX_SIZE`<br></br>
+
+---
+
+> #### hoodie.parquet.outputtimestamptype
+> Sets spark.sql.parquet.outputTimestampType. Parquet timestamp type to use when Spark writes data to Parquet files.<br></br>
+> **Default Value**: TIMESTAMP_MICROS (Optional)<br></br>
+> `Config Param: PARQUET_OUTPUT_TIMESTAMP_TYPE`<br></br>
+
+---
+
+> #### hoodie.orc.stripe.size
+> Size of the memory buffer in bytes for writing<br></br>
+> **Default Value**: 67108864 (Optional)<br></br>
+> `Config Param: ORC_STRIPE_SIZE`<br></br>
+
+---
+
+> #### hoodie.orc.block.size
+> ORC block size, recommended to be aligned with the target file size.<br></br>
+> **Default Value**: 125829120 (Optional)<br></br>
+> `Config Param: ORC_BLOCK_SIZE`<br></br>
+
+---
+
+> #### hoodie.orc.compression.codec
+> Compression codec to use for ORC base files.<br></br>
+> **Default Value**: ZLIB (Optional)<br></br>
+> `Config Param: ORC_COMPRESSION_CODEC_NAME`<br></br>
+
+---
+
+> #### hoodie.parquet.max.file.size
+> Target size in bytes for parquet files produced by Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.<br></br>
+> **Default Value**: 125829120 (Optional)<br></br>
+> `Config Param: PARQUET_MAX_FILE_SIZE`<br></br>
+
+---
+
+> #### hoodie.hfile.max.file.size
+> Target file size in bytes for HFile base files.<br></br>
+> **Default Value**: 125829120 (Optional)<br></br>
+> `Config Param: HFILE_MAX_FILE_SIZE`<br></br>
+
+---
+
+> #### hoodie.parquet.writelegacyformat.enabled
+> Sets spark.sql.parquet.writeLegacyFormat. If true, data will be written in a way of Spark 1.4 and earlier. For example, decimal values will be written in Parquet's fixed-length byte array format which other systems such as Apache Hive and Apache Impala use. If false, the newer format in Parquet will be used. For example, decimals will be written in int-based format.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PARQUET_WRITE_LEGACY_FORMAT_ENABLED`<br></br>
+
+---
+
+> #### hoodie.parquet.block.size
+> Parquet RowGroup size in bytes. It's recommended to make this large enough that scan costs can be amortized by packing enough column values into a single row group.<br></br>
+> **Default Value**: 125829120 (Optional)<br></br>
+> `Config Param: PARQUET_BLOCK_SIZE`<br></br>
+
+---
+
+> #### hoodie.logfile.max.size
+> LogFile max size in bytes. This is the maximum size allowed for a log file before it is rolled over to the next version.<br></br>
+> **Default Value**: 1073741824 (Optional)<br></br>
+> `Config Param: LOGFILE_MAX_SIZE`<br></br>
+
+---
+
+> #### hoodie.parquet.dictionary.enabled
+> Whether to use dictionary encoding<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PARQUET_DICTIONARY_ENABLED`<br></br>
+
+---
+
+> #### hoodie.hfile.block.size
+> Lower values increase the size in bytes of metadata tracked within HFile, but can offer potentially faster lookup times.<br></br>
+> **Default Value**: 1048576 (Optional)<br></br>
+> `Config Param: HFILE_BLOCK_SIZE`<br></br>
+
+---
+
+> #### hoodie.parquet.page.size
+> Parquet page size in bytes. Page is the unit of read within a parquet file. Within a block, pages are compressed separately.<br></br>
+> **Default Value**: 1048576 (Optional)<br></br>
+> `Config Param: PARQUET_PAGE_SIZE`<br></br>
+
+---
+
+> #### hoodie.hfile.compression.algorithm
+> Compression codec to use for hfile base files.<br></br>
+> **Default Value**: GZ (Optional)<br></br>
+> `Config Param: HFILE_COMPRESSION_ALGORITHM_NAME`<br></br>
+
+---
+
+> #### hoodie.orc.max.file.size
+> Target file size in bytes for ORC base files.<br></br>
+> **Default Value**: 125829120 (Optional)<br></br>
+> `Config Param: ORC_FILE_MAX_SIZE`<br></br>
+
+---
+
+> #### hoodie.logfile.data.block.format
+> Format of the data block within delta logs. Following formats are currently supported: "avro", "hfile", "parquet"<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: LOGFILE_DATA_BLOCK_FORMAT`<br></br>
+
+---
+
+> #### hoodie.logfile.to.parquet.compression.ratio
+> Expected additional compression as records move from log files to parquet. Used for merge_on_read table to send inserts into log files & control the size of compacted parquet file.<br></br>
+> **Default Value**: 0.35 (Optional)<br></br>
+> `Config Param: LOGFILE_TO_PARQUET_COMPRESSION_RATIO_FRACTION`<br></br>
+
+---
+
+> #### hoodie.parquet.compression.ratio
+> Expected compression of parquet data used by Hudi, when it tries to size new parquet files. Increase this value, if bulk_insert is producing smaller than expected sized files<br></br>
+> **Default Value**: 0.1 (Optional)<br></br>
+> `Config Param: PARQUET_COMPRESSION_RATIO_FRACTION`<br></br>
+
+---
+
+> #### hoodie.parquet.compression.codec
+> Compression Codec for parquet files<br></br>
+> **Default Value**: gzip (Optional)<br></br>
+> `Config Param: PARQUET_COMPRESSION_CODEC_NAME`<br></br>
+
+---
+
+### DynamoDB based Locks Configurations {#DynamoDB-based-Locks-Configurations}
+
+Configs that control DynamoDB based locking mechanisms required for concurrency control between writers to a Hudi table. Concurrency between Hudi's own table services is auto managed internally.
+
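+These options only take effect when a lock provider is enabled for multi-writer (optimistic concurrency control) scenarios. A hedged sketch, assuming the DynamoDB lock provider class shipped in the hudi-aws bundle and reusing the `df` and base path from the earlier sketch; the table, key and region values are illustrative, and the concurrency-mode and lock-provider keys are documented under the lock configurations rather than in this section.
+
+```scala
+// Illustrative multi-writer setup backed by a DynamoDB lock table named "hudi-locks".
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
+  .option("hoodie.cleaner.policy.failed.writes", "LAZY")
+  .option("hoodie.write.lock.provider",
+    "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider") // assumed class from hudi-aws
+  .option("hoodie.write.lock.dynamodb.table", "hudi-locks")
+  .option("hoodie.write.lock.dynamodb.partition_key", "config_sketch")
+  .option("hoodie.write.lock.dynamodb.region", "us-east-1")
+  .option("hoodie.write.lock.dynamodb.billing_mode", "PAY_PER_REQUEST")
+  .mode("append")
+  .save("/tmp/hudi/config_sketch")
+```
+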
+`Config Class`: org.apache.hudi.config.DynamoDbBasedLockConfig<br></br>
+> #### hoodie.write.lock.dynamodb.billing_mode
+> For DynamoDB based lock provider, by default it is PAY_PER_REQUEST mode<br></br>
+> **Default Value**: PAY_PER_REQUEST (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_BILLING_MODE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.table
+> For DynamoDB based lock provider, the name of the DynamoDB table acting as lock table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DYNAMODB_LOCK_TABLE_NAME`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.region
+> For DynamoDB based lock provider, the region used in endpoint for Amazon DynamoDB service. Would try to first get it from the AWS_REGION environment variable. If not found, by default use us-east-1<br></br>
+> **Default Value**: us-east-1 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_REGION`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.partition_key
+> For DynamoDB based lock provider, the partition key for the DynamoDB lock table. Each Hudi dataset should have its own unique key so concurrent writers could refer to the same partition key. By default we use the Hudi table name specified to be the partition key<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DYNAMODB_LOCK_PARTITION_KEY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.write_capacity
+> For DynamoDB based lock provider, write capacity units when using PROVISIONED billing mode<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_WRITE_CAPACITY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.table_creation_timeout
+> For DynamoDB based lock provider, the maximum number of milliseconds to wait for creating DynamoDB table<br></br>
+> **Default Value**: 600000 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_TABLE_CREATION_TIMEOUT`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.read_capacity
+> For DynamoDB based lock provider, read capacity units when using PROVISIONED billing mode<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: DYNAMODB_LOCK_READ_CAPACITY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.dynamodb.endpoint_url
+> For DynamoDB based lock provider, the url endpoint used for Amazon DynamoDB service. Useful for development with a local dynamodb instance.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DYNAMODB_ENDPOINT_URL`<br></br>
+> `Since Version: 0.10.1`<br></br>
+
+---
+
+### Metadata Configs {#Metadata-Configs}
+
+Configurations used by the Hudi Metadata Table. This table maintains the metadata about a given Hudi table (e.g. file listings) to avoid the overhead of accessing cloud storage during queries.
+
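+A hedged sketch of enabling the metadata table together with its bloom filter and column stats index partitions on the write path, reusing the `df` and base path from the earlier sketch; whether the extra indexes pay off depends on the workload.
+
+```scala
+// Illustrative: enable the metadata table plus bloom filter and column stats indexes.
+df.write.format("hudi")
+  .option("hoodie.table.name", "config_sketch")
+  .option("hoodie.metadata.enable", "true")
+  .option("hoodie.metadata.index.bloom.filter.enable", "true")
+  .option("hoodie.metadata.index.column.stats.enable", "true")
+  .option("hoodie.metadata.index.column.stats.column.list", "name,ts") // columns from the toy schema
+  .mode("append")
+  .save("/tmp/hudi/config_sketch")
+```
+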
+`Config Class`: org.apache.hudi.common.config.HoodieMetadataConfig<br></br>
+> #### hoodie.metadata.index.column.stats.parallelism
+> Parallelism to use, when generating column stats index.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: COLUMN_STATS_INDEX_PARALLELISM`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.compact.max.delta.commits
+> Controls how often the metadata table is compacted.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: COMPACT_NUM_DELTA_COMMITS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.assume.date.partitioning
+> Should HoodieWriteClient assume the data is partitioned by dates, i.e three levels from base path. This is a stop-gap to support tables created by versions < 0.3.1. Will be removed eventually<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASSUME_DATE_PARTITIONING`<br></br>
+> `Since Version: 0.3.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.column.stats.enable
+> Enable indexing column ranges of user data files under metadata table key lookups. When enabled, metadata table will have a partition to store the column ranges and will be used for pruning files during the index lookups.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE_METADATA_INDEX_COLUMN_STATS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.column.list
+> Comma-separated list of columns for which bloom filter index will be built. If not set, only record key will be indexed.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BLOOM_FILTER_INDEX_FOR_COLUMNS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.metrics.enable
+> Enable publishing of metrics around metadata table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: METRICS_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.file.group.count
+> Metadata bloom filter index partition file group count. This controls the size of the base and log files and read parallelism in the bloom filter index partition. The recommendation is to size the file group count such that the base files are under 1GB.<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: METADATA_INDEX_BLOOM_FILTER_FILE_GROUP_COUNT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.cleaner.commits.retained
+> Number of commits to retain, without cleaning, on metadata table.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.check.timeout.seconds
+> After the async indexer has finished indexing up to the base instant, it will ensure that all inflight writers reliably write index updates as well. If this timeout expires, then the indexer will abort itself safely.<br></br>
+> **Default Value**: 900 (Optional)<br></br>
+> `Config Param: METADATA_INDEX_CHECK_TIMEOUT_SECONDS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### _hoodie.metadata.ignore.spurious.deletes
+> There are cases when extra files are requested to be deleted from metadata table which are never added before. This config determines how to handle such spurious deletes<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: IGNORE_SPURIOUS_DELETES`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.file.listing.parallelism
+> Parallelism to use, when listing the table on lake storage.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: FILE_LISTING_PARALLELISM_VALUE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.populate.meta.fields
+> When enabled, populates all meta fields. When disabled, no meta fields are populated.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: POPULATE_META_FIELDS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.async
+> Enable asynchronous indexing of metadata table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_INDEX_ENABLE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.column.stats.column.list
+> Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: COLUMN_STATS_INDEX_FOR_COLUMNS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.enable.full.scan.log.files
+> Enable full scanning of log files while reading log records. If disabled, Hudi does a lookup of only the entries of interest.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE_FULL_SCAN_LOG_FILES`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.column.stats.file.group.count
+> Metadata column stats partition file group count. This controls the size of the base and log files and read parallelism in the column stats index partition. The recommendation is to size the file group count such that the base files are under 1GB.<br></br>
+> **Default Value**: 2 (Optional)<br></br>
+> `Config Param: METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.enable
+> Enable the internal metadata table which serves table metadata such as file listings<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.enable
+> Enable indexing bloom filters of user data files under metadata table. When enabled, metadata table will have a partition to store the bloom filter index and will be used during the index lookups.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE_METADATA_INDEX_BLOOM_FILTER`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.index.bloom.filter.parallelism
+> Parallelism to use for generating bloom filter index in metadata table.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_INDEX_PARALLELISM`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metadata.clean.async
+> Enable asynchronous cleaning for metadata table<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLEAN_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.keep.max.commits
+> Similar to hoodie.metadata.keep.min.commits, this config controls the maximum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.insert.parallelism
+> Parallelism to use when inserting to the metadata table<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: INSERT_PARALLELISM_VALUE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.dir.filter.regex
+> Directories matching this regex, will be filtered out when initializing metadata table from lake storage for the first time.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: DIR_FILTER_REGEX`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metadata.keep.min.commits
+> Archiving service moves older entries from metadata table’s timeline into an archived log after each write, to keep the overhead constant, even as the metadata table size grows.  This config controls the minimum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+### Consistency Guard Configurations {#Consistency-Guard-Configurations}
+
+The consistency guard related config options, to help talk to eventually consistent object storage. (Tip: S3 is NOT eventually consistent anymore!)
+
+`Config Class`: org.apache.hudi.common.fs.ConsistencyGuardConfig<br></br>
+> #### hoodie.optimistic.consistency.guard.sleep_time_ms
+> Amount of time (in ms), to wait after which we assume storage is consistent.<br></br>
+> **Default Value**: 500 (Optional)<br></br>
+> `Config Param: OPTIMISTIC_CONSISTENCY_GUARD_SLEEP_TIME_MS`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.max_interval_ms
+> Maximum amount of time (in ms), to wait for consistency checking.<br></br>
+> **Default Value**: 20000 (Optional)<br></br>
+> `Config Param: MAX_CHECK_INTERVAL_MS`<br></br>
+> `Since Version: 0.5.0`<br></br>
+> `Deprecated Version: 0.7.0`<br></br>
+
+---
+
+> #### _hoodie.optimistic.consistency.guard.enable
+> Enable consistency guard, which optimistically assumes consistency is achieved after a certain time period.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: OPTIMISTIC_CONSISTENCY_GUARD_ENABLE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.enabled
+> Enabled to handle S3 eventual consistency issue. This property is no longer required since S3 is now strongly consistent. Will be removed in the future releases.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE`<br></br>
+> `Since Version: 0.5.0`<br></br>
+> `Deprecated Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.max_checks
+> Maximum number of consistency checks to perform, with exponential backoff.<br></br>
+> **Default Value**: 6 (Optional)<br></br>
+> `Config Param: MAX_CHECKS`<br></br>
+> `Since Version: 0.5.0`<br></br>
+> `Deprecated Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.initial_interval_ms
+> Amount of time (in ms) to wait, before checking for consistency after an operation on storage.<br></br>
+> **Default Value**: 400 (Optional)<br></br>
+> `Config Param: INITIAL_CHECK_INTERVAL_MS`<br></br>
+> `Since Version: 0.5.0`<br></br>
+> `Deprecated Version: 0.7.0`<br></br>
+
+---
+
+### FileSystem Guard Configurations {#FileSystem-Guard-Configurations}
+
+The filesystem retry related config options, to help deal with runtime exceptions like list/get/put/delete performance issues.
+
+`Config Class`: org.apache.hudi.common.fs.FileSystemRetryConfig<br></br>
+> #### hoodie.filesystem.operation.retry.max_interval_ms
+> Maximum amount of time (in ms), to wait for next retry.<br></br>
+> **Default Value**: 2000 (Optional)<br></br>
+> `Config Param: MAX_RETRY_INTERVAL_MS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.filesystem.operation.retry.enable
+> Enabled to handle list/get/delete etc file system performance issue.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: FILESYSTEM_RETRY_ENABLE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.filesystem.operation.retry.max_numbers
+> Maximum number of retry actions to perform, with exponential backoff.<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: MAX_RETRY_NUMBERS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.filesystem.operation.retry.exceptions
+> The class names of the Exceptions that need to be retried, separated by commas. Default is empty which means retry all the IOException and RuntimeException from FileSystem<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: RETRY_EXCEPTIONS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.filesystem.operation.retry.initial_interval_ms
+> Amount of time (in ms) to wait, before retry to do operations on storage.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: INITIAL_RETRY_INTERVAL_MS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+### Write Configurations {#Write-Configurations}
+
+Configurations that control write behavior on Hudi tables. These can be directly passed down from even higher level frameworks (e.g Spark datasources, Flink sink) and utilities (e.g DeltaStreamer).
+
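+When several of these need to be set together, it can be cleaner to collect them in a map and pass them in one call. A hedged sketch with illustrative values, reusing the `df` and base path from the sketch under Write Client Configs.
+
+```scala
+// Group write configs in a map and apply them in one call on the datasource writer.
+val writeOpts = Map(
+  "hoodie.table.name"                        -> "config_sketch",
+  "hoodie.datasource.write.recordkey.field"  -> "uuid",
+  "hoodie.datasource.write.precombine.field" -> "ts",
+  "hoodie.combine.before.upsert"             -> "true",
+  "hoodie.insert.shuffle.parallelism"        -> "4",
+  "hoodie.bulkinsert.shuffle.parallelism"    -> "4",
+  "hoodie.delete.shuffle.parallelism"        -> "4"
+)
+
+df.write.format("hudi")
+  .options(writeOpts)
+  .mode("append")
+  .save("/tmp/hudi/config_sketch")
+```
+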
+`Config Class`: org.apache.hudi.config.HoodieWriteConfig<br></br>
+> #### hoodie.combine.before.upsert
+> When upserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage. This should be turned off only if you are absolutely certain that there are no duplicates incoming,  otherwise it can lead to duplicate keys and violate the uniqueness guarantees.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_UPSERT`<br></br>
+
+---
+
+> #### hoodie.write.markers.type
+> Marker type to use.  Two modes are supported: - DIRECT: individual marker file corresponding to each data file is directly created by the writer. - TIMELINE_SERVER_BASED: marker operations are all handled at the timeline service which serves as a proxy.  New marker entries are batch processed and stored in a limited number of underlying files for efficiency.  If HDFS is used or timeline server is disabled, DIRECT markers are used as fallback even if this is configured.  For Spark struct [...]
+> **Default Value**: TIMELINE_SERVER_BASED (Optional)<br></br>
+> `Config Param: MARKERS_TYPE`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.max_interval_ms
+> Max time to wait between successive attempts at performing consistency checks<br></br>
+> **Default Value**: 300000 (Optional)<br></br>
+> `Config Param: MAX_CONSISTENCY_CHECK_INTERVAL_MS`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server.port
+> Port at which the timeline server listens for requests. When running embedded in each writer, it picks a free port and communicates to all the executors. This should rarely be changed.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_PORT_NUM`<br></br>
+
+---
+
+> #### hoodie.auto.adjust.lock.configs
+> Auto adjust lock configurations when metadata table is enabled and for async table services.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: AUTO_ADJUST_LOCK_CONFIGS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.schema.on.read.enable
+> enable full schema evolution for hoodie<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEMA_EVOLUTION_ENABLE`<br></br>
+
+---
+
+> #### hoodie.table.services.enabled
+> Master control to disable all table services including archive, clean, compact, cluster, etc.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: TABLE_SERVICES_ENABLED`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.table.base.file.format
+> Base file format to store all the base file data.<br></br>
+> **Default Value**: PARQUET (Optional)<br></br>
+> `Config Param: BASE_FILE_FORMAT`<br></br>
+
+---
+
+> #### hoodie.avro.schema.validate
+> Validate the schema used for the write against the latest schema, for backwards compatibility.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: AVRO_SCHEMA_VALIDATE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.write.buffer.limit.bytes
+> Size of in-memory buffer used for parallelizing network reads and lake storage writes.<br></br>
+> **Default Value**: 4194304 (Optional)<br></br>
+> `Config Param: WRITE_BUFFER_LIMIT_BYTES_VALUE`<br></br>
+
+---
+
+> #### hoodie.insert.shuffle.parallelism
+> Parallelism for inserting records into the table. Inserts can shuffle data before writing to tune file sizes and optimize the storage layout.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: INSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server.async
+> Controls whether or not, the requests to the timeline server are processed in asynchronous fashion, potentially improving throughput.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_USE_ASYNC_ENABLE`<br></br>
+
+---
+
+> #### hoodie.rollback.parallelism
+> Parallelism for rollback of commits. Rollbacks perform delete of files or logging delete blocks to file groups on storage in parallel.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: ROLLBACK_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.write.status.storage.level
+> Write status objects hold metadata about a write (stats, errors), that is not yet committed to storage. This controls how that information is cached for inspection by clients. We rarely expect this to be changed.<br></br>
+> **Default Value**: MEMORY_AND_DISK_SER (Optional)<br></br>
+> `Config Param: WRITE_STATUS_STORAGE_LEVEL_VALUE`<br></br>
+
+---
+
+> #### hoodie.writestatus.class
+> Subclass of org.apache.hudi.client.WriteStatus to be used to collect information about a write. Can be overridden to collect additional metrics/statistics about the data if needed.<br></br>
+> **Default Value**: org.apache.hudi.client.WriteStatus (Optional)<br></br>
+> `Config Param: WRITE_STATUS_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.base.path
+> Base path on lake storage, under which all the table data is stored. Always prefix it explicitly with the storage scheme (e.g hdfs://, s3:// etc). Hudi stores all the main meta-data about commits, savepoints, cleaning audit logs etc in .hoodie directory under this base path directory.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BASE_PATH`<br></br>
+
+---
+
+> #### hoodie.allow.empty.commit
+> Whether to allow generation of empty commits, even if no data was written in the commit. It's useful in cases where extra metadata needs to be published regardless e.g tracking source offsets when ingesting data<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ALLOW_EMPTY_COMMIT`<br></br>
+
+---
+
+> #### hoodie.bulkinsert.user.defined.partitioner.class
+> If specified, this class will be used to re-partition records before they are bulk inserted. This can be used to sort, pack, cluster data optimally for common query patterns. For now we support a built-in user defined bulkinsert partitioner org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner which does sorting based on specified column values set by hoodie.bulkinsert.user.defined.partitioner.sort.columns<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BULKINSERT_USER_DEFINED_PARTITIONER_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.table.name
+> Table name that will be used for registering with metastores like HMS. Needs to be same across runs.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TBL_NAME`<br></br>
+
+---
+
+> #### hoodie.combine.before.delete
+> During delete operations, controls whether we should combine deletes (and potentially also upserts) before  writing to storage.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_DELETE`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server.threads
+> Number of threads to serve requests in the timeline server. By default, auto configured based on the number of underlying cores.<br></br>
+> **Default Value**: -1 (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_NUM_SERVER_THREADS`<br></br>
+
+---
+
+> #### hoodie.fileid.prefix.provider.class
+> File Id Prefix provider class, that implements `org.apache.hudi.fileid.FileIdPrefixProvider`<br></br>
+> **Default Value**: org.apache.hudi.table.RandomFileIdPrefixProvider (Optional)<br></br>
+> `Config Param: FILEID_PREFIX_PROVIDER_CLASS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.fail.on.timeline.archiving
+> Timeline archiving removes older instants from the timeline, after each write operation, to minimize metadata overhead. Controls whether or not, the write should be failed as well, if such archiving fails.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: FAIL_ON_TIMELINE_ARCHIVING_ENABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.class
+> Key generator class, that implements `org.apache.hudi.keygen.KeyGenerator` extract a key out of incoming records.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: KEYGENERATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.combine.before.insert
+> When inserted records share same key, controls whether they should be first combined (i.e de-duplicated) before writing to storage.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMBINE_BEFORE_INSERT`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server.gzip
+> Controls whether gzip compression is used, for large responses from the timeline server, to improve latency.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_COMPRESS_ENABLE`<br></br>
+
+---
+
+> #### hoodie.markers.timeline_server_based.batch.interval_ms
+> The batch interval in milliseconds for marker creation batch processing<br></br>
+> **Default Value**: 50 (Optional)<br></br>
+> `Config Param: MARKERS_TIMELINE_SERVER_BASED_BATCH_INTERVAL_MS`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.markers.timeline_server_based.batch.num_threads
+> Number of threads to use for batch processing marker creation requests at the timeline server<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MARKERS_TIMELINE_SERVER_BASED_BATCH_NUM_THREADS`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### _.hoodie.allow.multi.write.on.same.instant
+> <br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ALLOW_MULTI_WRITE_ON_SAME_INSTANT_ENABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.payload.class
+> Payload class used. Override this, if you would like to roll your own merge logic, when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: WRITE_PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.bulkinsert.shuffle.parallelism
+> For large initial imports using bulk_insert operation, controls the parallelism to use for sort modes or custom partitioning done before writing records to the table.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: BULKINSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.delete.shuffle.parallelism
+> Parallelism used for the “delete” operation. Delete operations also perform shuffles, similar to the upsert operation.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: DELETE_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.consistency.check.max_checks
+> Maximum number of checks, for consistency of written data.<br></br>
+> **Default Value**: 7 (Optional)<br></br>
+> `Config Param: MAX_CONSISTENCY_CHECKS`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.type
+> Easily configure one of the built-in key generators, instead of specifying the key generator class. Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE<br></br>
+> **Default Value**: SIMPLE (Optional)<br></br>
+> `Config Param: KEYGENERATOR_TYPE`<br></br>
+
+---
+
+> #### hoodie.merge.allow.duplicate.on.inserts
+> When enabled, we allow duplicate keys even if inserts are routed to merge with an existing file (for ensuring file sizing). This is only relevant for insert operation, since upsert, delete operations will ensure unique key constraints are maintained.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: MERGE_ALLOW_DUPLICATE_ON_INSERTS_ENABLE`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server.reuse.enabled
+> Controls whether the timeline server instance should be cached and reused across the JVM (across task lifecycles) to avoid startup costs. This should rarely be changed.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_REUSE_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.write.precombine.field
+> Field used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: PRECOMBINE_FIELD_NAME`<br></br>
+
+---
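+
+For illustration (not part of the generated reference), a minimal Spark Scala sketch of how the precombine field behaves on upsert, assuming a `spark` session, a target `basePath`, and the `hoodie.datasource.write.operation` option from the datasource docs:
+
+```scala
+// Two updates for the same record key; with precombine field "ts",
+// only the row with the larger ts (value = "new-value") survives the upsert.
+val updates = spark.createDataFrame(Seq(
+  (1, "old-value", 100L, "2022/06/20"),
+  (1, "new-value", 200L, "2022/06/20")
+)).toDF("id", "value", "ts", "partition")
+
+updates.write.format("hudi").
+  option("hoodie.table.name", "my_table").                       // hypothetical table name
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.datasource.write.operation", "upsert").         // assumed datasource option
+  mode("append").
+  save(basePath)
+```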
+
+> #### hoodie.bulkinsert.sort.mode
+> Sorting modes to use for sorting records for bulk insert. This is used when hoodie.bulkinsert.user.defined.partitioner.class is not configured. Available values are - GLOBAL_SORT: this ensures best file sizes, with lowest memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance by only sorting within a partition, still keeping the memory overhead of writing lowest and best effort file sizing. NONE: No sorting. Fastest and matches `spark.write.parquet()` in terms of numb [...]
+> **Default Value**: GLOBAL_SORT (Optional)<br></br>
+> `Config Param: BULK_INSERT_SORT_MODE`<br></br>
+
+---
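+
+As a hedged illustration, a bulk_insert sketch that trades the default GLOBAL_SORT for NONE and tunes the shuffle parallelism, assuming a DataFrame `df` with `id`/`partition`/`ts` columns, a `basePath`, and the `hoodie.datasource.write.operation` option from the datasource docs:
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.datasource.write.operation", "bulk_insert").    // assumed datasource option
+  // NONE skips sorting: fastest, but gives up GLOBAL_SORT's file-sizing benefits
+  option("hoodie.bulkinsert.sort.mode", "NONE").
+  option("hoodie.bulkinsert.shuffle.parallelism", "400").
+  mode("append").
+  save(basePath)
+```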
+
+> #### hoodie.avro.schema
+> Schema string representing the current write schema of the table. Hudi passes this to implementations of HoodieRecordPayload to convert incoming records to avro. This is also used as the write schema when evolving records during an update.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: AVRO_SCHEMA_STRING`<br></br>
+
+---
+
+> #### hoodie.auto.commit
+> Controls whether a write operation should auto commit. This can be turned off to perform inspection of the uncommitted write before deciding to commit.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_COMMIT_ENABLE`<br></br>
+
+---
+
+> #### hoodie.embed.timeline.server
+> When true, spins up an instance of the timeline server (meta server that serves cached file listings, statistics), running on each writer's driver process, accepting requests during the write from executors.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: EMBEDDED_TIMELINE_SERVER_ENABLE`<br></br>
+
+---
+
+> #### hoodie.timeline.layout.version
+> Controls the layout of the timeline. Version 0 relied on renames, Version 1 (default) models the timeline as an immutable log relying only on atomic writes for object storage.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: TIMELINE_LAYOUT_VERSION_NUM`<br></br>
+> `Since Version: 0.5.1`<br></br>
+
+---
+
+> #### hoodie.schema.cache.enable
+> Cache query internalSchemas on the driver/executor side<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ENABLE_INTERNAL_SCHEMA_CACHE`<br></br>
+
+---
+
+> #### hoodie.refresh.timeline.server.based.on.latest.commit
+> Refresh the timeline in the timeline server based on the latest commit, apart from the timeline hash difference. By default (false), the timeline is refreshed only when the timeline hash differs.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: REFRESH_TIMELINE_SERVER_BASED_ON_LATEST_COMMIT`<br></br>
+
+---
+
+> #### hoodie.upsert.shuffle.parallelism
+> Parallelism to use for upsert operation on the table. Upserts can shuffle data to perform index lookups, file sizing, bin packing records optimally into file groups.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: UPSERT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.write.schema
+> The specified write schema. In most cases, you do not need to set this parameter; but when the write schema is not equal to the specified table schema, the write schema can be specified with this parameter. Used by MergeIntoHoodieTableCommand<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: WRITE_SCHEMA`<br></br>
+
+---
+
+> #### hoodie.rollback.using.markers
+> Enables a more efficient mechanism for rollbacks based on the marker files generated during the writes. Turned on by default.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ROLLBACK_USING_MARKERS_ENABLE`<br></br>
+
+---
+
+> #### hoodie.merge.data.validation.enabled
+> When enabled, data validation checks are performed during merges to ensure expected number of records after merge operation.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: MERGE_DATA_VALIDATION_CHECK_ENABLE`<br></br>
+
+---
+
+> #### hoodie.internal.schema
+> Schema string representing the latest schema of the table. Hudi passes this to implementations of schema evolution<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: INTERNAL_SCHEMA_STRING`<br></br>
+
+---
+
+> #### hoodie.client.heartbeat.tolerable.misses
+> Number of heartbeat misses, before a writer is deemed not alive and all pending writes are aborted.<br></br>
+> **Default Value**: 2 (Optional)<br></br>
+> `Config Param: CLIENT_HEARTBEAT_NUM_TOLERABLE_MISSES`<br></br>
+
+---
+
+> #### hoodie.write.concurrency.mode
+> Enable different concurrency modes. Options are SINGLE_WRITER: Only one active writer to the table; maximizes throughput. OPTIMISTIC_CONCURRENCY_CONTROL: Multiple writers can operate on the table and exactly one of them succeeds if a conflict (writes affect the same file group) is detected.<br></br>
+> **Default Value**: SINGLE_WRITER (Optional)<br></br>
+> `Config Param: WRITE_CONCURRENCY_MODE`<br></br>
+
+---
+
+> #### hoodie.markers.delete.parallelism
+> Determines the parallelism for deleting marker files, which are used to track all files (valid or invalid/partial) written during a write operation. Increase this value if delays are observed, with large batch writes.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: MARKERS_DELETE_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.release.resource.on.completion.enable
+> Controls whether to release all persisted RDDs when the Spark job finishes.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: RELEASE_RESOURCE_ENABLE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.bulkinsert.user.defined.partitioner.sort.columns
+> Columns to sort the data by when using org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner as the user defined partitioner during bulk_insert. For example 'column1,column2'<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BULKINSERT_USER_DEFINED_PARTITIONER_SORT_COLUMNS`<br></br>
+
+---
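+
+A minimal sketch of wiring up the built-in custom-column sort partitioner mentioned above (same assumptions as the earlier sketches; `driver` and `city` are hypothetical column names):
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.operation", "bulk_insert").    // assumed datasource option
+  option("hoodie.bulkinsert.user.defined.partitioner.class",
+    "org.apache.hudi.execution.bulkinsert.RDDCustomColumnsSortPartitioner").
+  // sort the incoming batch by these columns before files are written
+  option("hoodie.bulkinsert.user.defined.partitioner.sort.columns", "driver,city").
+  mode("append").
+  save(basePath)
+```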
+
+> #### hoodie.finalize.write.parallelism
+> Parallelism for the write finalization internal operation, which involves removing any partially written files from lake storage, before committing the write. Reduce this value, if the high number of tasks incur delays for smaller tables or low latency writes.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: FINALIZE_WRITE_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.merge.small.file.group.candidates.limit
+> Limits number of file groups, whose base file satisfies small-file limit, to consider for appending records during upsert operation. Only applicable to MOR tables<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT`<br></br>
+
+---
+
+> #### hoodie.client.heartbeat.interval_in_ms
+> Writers perform heartbeats to indicate liveness. Controls how often (in ms), such heartbeats are registered to lake storage.<br></br>
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: CLIENT_HEARTBEAT_INTERVAL_IN_MS`<br></br>
+
+---
+
+> #### hoodie.allow.operation.metadata.field
+> Whether to include '_hoodie_operation' in the metadata fields. Once enabled, all the changes of a record are persisted to the delta log directly without merge<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ALLOW_OPERATION_METADATA_FIELD`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.consistency.check.initial_interval_ms
+> Initial time between successive attempts to ensure written data's metadata is consistent on storage. Grows with exponential backoff after the initial value.<br></br>
+> **Default Value**: 2000 (Optional)<br></br>
+> `Config Param: INITIAL_CONSISTENCY_CHECK_INTERVAL_MS`<br></br>
+
+---
+
+> #### hoodie.avro.schema.external.transformation
+> When enabled, records in older schema are rewritten into newer schema during upsert, delete and background compaction, clustering operations.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: AVRO_EXTERNAL_SCHEMA_TRANSFORMATION_ENABLE`<br></br>
+
+---
+
+### Key Generator Options {#Key-Generator-Options}
+
+Hudi maintains keys (record key + partition path) for uniquely identifying a particular record. These configs allow developers to set up the key generator class that will extract these out of incoming records.
+
+`Config Class`: org.apache.hudi.keygen.constant.KeyGeneratorOptions<br></br>
+> #### hoodie.datasource.write.partitionpath.urlencode
+> Should we url encode the partition path value, before creating the folder structure.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: URL_ENCODE_PARTITIONING`<br></br>
+
+---
+
+> #### hoodie.datasource.write.hive_style_partitioning
+> Flag to indicate whether to use Hive style partitioning.
If set to true, the names of partition folders follow <partition_column_name>=<partition_value> format.
+By default false (the names of partition folders are only partition values)<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_STYLE_PARTITIONING_ENABLE`<br></br>
+
+---
+
+> #### hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled
+> When set to true, a consistent value will be generated for a logical timestamp type column, like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so as not to break pipelines that deploy either the fully row-writer path or the non row-writer path. For example, if it is kept disabled then record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp `2016-12-29 09:54:00.0` in row-writer path, while it will be [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED`<br></br>
+
+---
+
+> #### hoodie.datasource.write.partitionpath.field
+> Partition path field. Value to be used as the partitionPath component of HoodieKey. Actual value obtained by invoking .toString()<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITIONPATH_FIELD_NAME`<br></br>
+
+---
+
+> #### hoodie.datasource.write.recordkey.field
+> Record key field. Value to be used as the `recordKey` component of `HoodieKey`.
+Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using
+the dot notation eg: `a.b.c`<br></br>
+> **Default Value**: uuid (Optional)<br></br>
+> `Config Param: RECORDKEY_FIELD_NAME`<br></br>
+
+---
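+
+Tying the options above together, a hedged example configuring a composite record key and Hive-style partition folders (column names are hypothetical; `df` and `basePath` are assumed):
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  // composite record key built from two fields, via the built-in COMPLEX key generator
+  option("hoodie.datasource.write.keygenerator.type", "COMPLEX").
+  option("hoodie.datasource.write.recordkey.field", "region,id").
+  option("hoodie.datasource.write.partitionpath.field", "country,city").
+  // produces folders like country=US/city=SF instead of US/SF
+  option("hoodie.datasource.write.hive_style_partitioning", "true").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  mode("append").
+  save(basePath)
+```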
+
+### HBase Index Configs {#HBase-Index-Configs}
+
+Configurations that control indexing behavior (when HBase based indexing is enabled), which tags incoming records as either inserts or updates to older records.
+
+`Config Class`: org.apache.hudi.config.HoodieHBaseIndexConfig<br></br>
+> #### hoodie.index.hbase.zkport
+> Only applies if index type is HBASE. HBase ZK Quorum port to connect to<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZKPORT`<br></br>
+
+---
+
+> #### hoodie.hbase.index.update.partition.path
+> Only applies if index type is HBASE. When an already existing record is upserted to a new partition compared to what's in storage, this config, when set, will delete the old record in the old partition and insert it as a new record in the new partition.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.qps.allocator.class
+> Property to set which implementation of HBase QPS resource allocator is to be used, which controls the batching rate dynamically.<br></br>
+> **Default Value**: org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator (Optional)<br></br>
+> `Config Param: QPS_ALLOCATOR_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.index.hbase.put.batch.size.autocompute
+> Property to set to enable auto computation of put batch size<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PUT_BATCH_SIZE_AUTO_COMPUTE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.rollback.sync
+> When set to true, the rollback method will delete the last failed task index. The default value is false, because deleting the index will add extra load on the HBase cluster for each rollback<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ROLLBACK_SYNC_ENABLE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.get.batch.size
+> Controls the batch size for performing gets against HBase. Batching improves throughput, by saving round trips.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: GET_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zkpath.qps_root
+> chroot in zookeeper, to use for all qps allocation co-ordination.<br></br>
+> **Default Value**: /QPS_ROOT (Optional)<br></br>
+> `Config Param: ZKPATH_QPS_ROOT`<br></br>
+
+---
+
+> #### hoodie.index.hbase.max.qps.per.region.server
+> Property to set maximum QPS allowed per Region Server. This should be same across various jobs. This is intended to
 limit the aggregate QPS generated across various jobs to an HBase Region Server. It is recommended to set this
+ value based on global indexing throughput needs and most importantly, how much the HBase installation in use is
+ able to tolerate without Region Servers going down.<br></br>
+> **Default Value**: 1000 (Optional)<br></br>
+> `Config Param: MAX_QPS_PER_REGION_SERVER`<br></br>
+
+---
+
+> #### hoodie.index.hbase.max.qps.fraction
+> Maximum for HBASE_QPS_FRACTION_PROP to stabilize skewed write workloads<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MAX_QPS_FRACTION`<br></br>
+
+---
+
+> #### hoodie.index.hbase.min.qps.fraction
+> Minimum for HBASE_QPS_FRACTION_PROP to stabilize skewed write workloads<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: MIN_QPS_FRACTION`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zk.connection_timeout_ms
+> Timeout to use for establishing connection with zookeeper, from HBase client.<br></br>
+> **Default Value**: 15000 (Optional)<br></br>
+> `Config Param: ZK_CONNECTION_TIMEOUT_MS`<br></br>
+
+---
+
+> #### hoodie.index.hbase.table
+> Only applies if index type is HBASE. HBase Table name to use as the index. Hudi stores the row_key and [partition_path, fileID, commitTime] mapping in the table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TABLENAME`<br></br>
+
+---
+
+> #### hoodie.index.hbase.dynamic_qps
+> Property to decide if HBASE_QPS_FRACTION_PROP is dynamically calculated based on write volume.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMPUTE_QPS_DYNAMICALLY`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zknode.path
+> Only applies if index type is HBASE. This is the root znode that will contain all the znodes created/used by HBase<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZK_NODE_PATH`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zkquorum
+> Only applies if index type is HBASE. HBase ZK Quorum url to connect to<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZKQUORUM`<br></br>
+
+---
+
+> #### hoodie.index.hbase.qps.fraction
+> Property to set the fraction of the global share of QPS that should be allocated to this job. Let's say there are 3 jobs which have input size in terms of number of rows required for HBase indexing as x, 2x, 3x respectively. Then this fraction for the jobs would be 1/6 (0.17), 2/6 (0.33) and 3/6 (0.5) respectively. Default is 50%, which means a total of 2 jobs can run using the HBase index without overwhelming Region Servers.<br></br>
+> **Default Value**: 0.5 (Optional)<br></br>
+> `Config Param: QPS_FRACTION`<br></br>
+
+---
+
+> #### hoodie.index.hbase.zk.session_timeout_ms
+> Session timeout value to use for Zookeeper failure detection, for the HBase client. Lower this value, if you want to fail faster.<br></br>
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: ZK_SESSION_TIMEOUT_MS`<br></br>
+
+---
+
+> #### hoodie.index.hbase.put.batch.size
+> Controls the batch size for performing puts against HBase. Batching improves throughput, by saving round trips.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: PUT_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.index.hbase.desired_puts_time_in_secs
+> <br></br>
+> **Default Value**: 600 (Optional)<br></br>
+> `Config Param: DESIRED_PUTS_TIME_IN_SECONDS`<br></br>
+
+---
+
+> #### hoodie.index.hbase.sleep.ms.for.put.batch
+> <br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SLEEP_MS_FOR_PUT_BATCH`<br></br>
+
+---
+
+> #### hoodie.index.hbase.sleep.ms.for.get.batch
+> <br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: SLEEP_MS_FOR_GET_BATCH`<br></br>
+
+---
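+
+A hedged sketch of the minimum knobs typically needed to switch the write index to HBase (ZooKeeper endpoint, znode path and HBase table name are placeholders):
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  // route record-level index lookups to an external HBase table
+  option("hoodie.index.type", "HBASE").
+  option("hoodie.index.hbase.zkquorum", "zk1.example.com").      // placeholder
+  option("hoodie.index.hbase.zkport", "2181").
+  option("hoodie.index.hbase.zknode.path", "/hbase").            // placeholder
+  option("hoodie.index.hbase.table", "hudi_record_index").       // placeholder
+  mode("append").
+  save(basePath)
+```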
+
+### Write commit pulsar callback configs {#Write-commit-pulsar-callback-configs}
+
+Controls notifications sent to pulsar, on events happening to a hudi table.
+
+`Config Class`: org.apache.hudi.utilities.callback.pulsar.HoodieWriteCommitPulsarCallbackConfig<br></br>
+> #### hoodie.write.commit.callback.pulsar.operation-timeout
+> Duration of waiting for completing an operation.<br></br>
+> **Default Value**: 30s (Optional)<br></br>
+> `Config Param: OPERATION_TIMEOUT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.topic
+> pulsar topic name to publish timeline activity into.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TOPIC`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.producer.block-if-queue-full
+> When the queue is full, the method is blocked instead of an exception being thrown.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PRODUCER_BLOCK_QUEUE_FULL`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.producer.send-timeout
+> The timeout for each send operation to Pulsar.<br></br>
+> **Default Value**: 30s (Optional)<br></br>
+> `Config Param: PRODUCER_SEND_TIMEOUT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.broker.service.url
+> Broker service URL of the Pulsar cluster, to be used for publishing commit metadata.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BROKER_SERVICE_URL`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.keepalive-interval
+> Keep-alive interval for each client-broker connection.<br></br>
+> **Default Value**: 30s (Optional)<br></br>
+> `Config Param: KEEPALIVE_INTERVAL`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.producer.pending-total-size
+> The maximum number of pending messages across partitions.<br></br>
+> **Default Value**: 50000 (Optional)<br></br>
+> `Config Param: PRODUCER_PENDING_SIZE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.request-timeout
+> Duration of waiting for completing a request.<br></br>
+> **Default Value**: 60s (Optional)<br></br>
+> `Config Param: REQUEST_TIMEOUT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.producer.pending-queue-size
+> The maximum size of a queue holding pending messages.<br></br>
+> **Default Value**: 1000 (Optional)<br></br>
+> `Config Param: PRODUCER_PENDING_QUEUE_SIZE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.producer.route-mode
+> Message routing logic for producers on partitioned topics.<br></br>
+> **Default Value**: RoundRobinPartition (Optional)<br></br>
+> `Config Param: PRODUCER_ROUTE_MODE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.pulsar.connection-timeout
+> Duration of waiting for a connection to a broker to be established.<br></br>
+> **Default Value**: 10s (Optional)<br></br>
+> `Config Param: CONNECTION_TIMEOUT`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+### Write commit Kafka callback configs {#Write-commit-Kafka-callback-configs}
+
+Controls notifications sent to Kafka, on events happening to a hudi table.
+
+`Config Class`: org.apache.hudi.utilities.callback.kafka.HoodieWriteCommitKafkaCallbackConfig<br></br>
+> #### hoodie.write.commit.callback.kafka.topic
+> Kafka topic name to publish timeline activity into.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: TOPIC`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.kafka.partition
+> It may be desirable to serialize all changes into a single Kafka partition  for providing strict ordering. By default, Kafka messages are keyed by table name, which  guarantees ordering at the table level, but not globally (or when new partitions are added)<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.kafka.retries
+> Number of times to retry the produce. 3 by default<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: RETRIES`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.kafka.acks
+> kafka acks level, all by default to ensure strong durability.<br></br>
+> **Default Value**: all (Optional)<br></br>
+> `Config Param: ACKS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.write.commit.callback.kafka.bootstrap.servers
+> Bootstrap servers of kafka cluster, to be used for publishing commit metadata.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BOOTSTRAP_SERVERS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
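+
+For illustration, a hedged sketch of enabling the Kafka commit callback. The `hoodie.write.commit.callback.on` / `hoodie.write.commit.callback.class` toggles and the callback implementation class are documented outside this table and are assumptions here:
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  // assumed: generic callback toggle + Kafka callback implementation class
+  option("hoodie.write.commit.callback.on", "true").
+  option("hoodie.write.commit.callback.class",
+    "org.apache.hudi.utilities.callback.kafka.HoodieWriteCommitKafkaCallback").
+  // Kafka-specific settings documented above (broker/topic are placeholders)
+  option("hoodie.write.commit.callback.kafka.bootstrap.servers", "broker1:9092").
+  option("hoodie.write.commit.callback.kafka.topic", "hudi-commits").
+  option("hoodie.write.commit.callback.kafka.acks", "all").
+  mode("append").
+  save(basePath)
+```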
+
+### Locks Configurations {#Locks-Configurations}
+
+Configs that control locking mechanisms required for concurrency control between writers to a Hudi table. Concurrency between Hudi's own table services is auto managed internally.
+
+`Config Class`: org.apache.hudi.config.HoodieLockConfig<br></br>
+> #### hoodie.write.lock.zookeeper.base_path
+> The base path on Zookeeper under which to create lock related ZNodes. This should be same for all concurrent writers to the same table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZK_BASE_PATH`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.zookeeper.lock_key
+> Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. If this parameter is not set, we would set it as the table name<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZK_LOCK_KEY`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.client.num_retries
+> Maximum number of times to retry to acquire lock additionally from the lock manager.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_CLIENT_NUM_RETRIES`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.wait_time_ms_between_retry
+> Initial amount of time to wait between retries to acquire locks,  subsequent retries will exponentially backoff.<br></br>
+> **Default Value**: 1000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_RETRY_WAIT_TIME_IN_MILLIS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.num_retries
+> Maximum number of times to retry lock acquire, at each lock provider<br></br>
+> **Default Value**: 15 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_NUM_RETRIES`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.wait_time_ms
+> Timeout in ms, to wait on an individual lock acquire() call, at the lock provider.<br></br>
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_WAIT_TIMEOUT_MS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.zookeeper.connection_timeout_ms
+> Timeout in ms, to wait for establishing connection with Zookeeper.<br></br>
+> **Default Value**: 15000 (Optional)<br></br>
+> `Config Param: ZK_CONNECTION_TIMEOUT_MS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.zookeeper.port
+> Zookeeper port to connect to.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZK_PORT`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.hivemetastore.table
+> For Hive based lock provider, the Hive table to acquire lock against<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_TABLE_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.zookeeper.url
+> Zookeeper URL to connect to.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: ZK_CONNECT_URL`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.filesystem.path
+> For DFS based lock providers, path to store the locks under.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: FILESYSTEM_LOCK_PATH`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.provider
+> Lock provider class name, user can provide their own implementation of LockProvider which should be a subclass of org.apache.hudi.common.lock.LockProvider<br></br>
+> **Default Value**: org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider (Optional)<br></br>
+> `Config Param: LOCK_PROVIDER_CLASS_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.zookeeper.session_timeout_ms
+> Timeout in ms, to wait after losing connection to ZooKeeper, before the session is expired<br></br>
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: ZK_SESSION_TIMEOUT_MS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.conflict.resolution.strategy
+> Conflict resolution strategy class name; this should be a subclass of org.apache.hudi.client.transaction.ConflictResolutionStrategy<br></br>
+> **Default Value**: org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy (Optional)<br></br>
+> `Config Param: WRITE_CONFLICT_RESOLUTION_STRATEGY_CLASS_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.hivemetastore.database
+> For Hive based lock provider, the Hive database to acquire lock against<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_DATABASE_NAME`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.hivemetastore.uris
+> For Hive based lock provider, the Hive metastore URI to acquire locks against.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HIVE_METASTORE_URI`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.max_wait_time_ms_between_retry
+> Maximum amount of time to wait between retries by lock provider client. This bounds the maximum delay from the exponential backoff. Currently used by ZK based lock provider only.<br></br>
+> **Default Value**: 5000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
+
+> #### hoodie.write.lock.client.wait_time_ms_between_retry
+> Amount of time to wait between retries on the lock provider by the lock manager<br></br>
+> **Default Value**: 10000 (Optional)<br></br>
+> `Config Param: LOCK_ACQUIRE_CLIENT_RETRY_WAIT_TIME_IN_MILLIS`<br></br>
+> `Since Version: 0.8.0`<br></br>
+
+---
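+
+Putting the lock configs together with the concurrency mode and failed-writes cleaning policy documented earlier, a hedged multi-writer sketch using the default ZooKeeper-based lock provider (ZooKeeper endpoints and paths are placeholders):
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  // multi-writer: optimistic concurrency control + lazy cleaning of failed writes
+  option("hoodie.write.concurrency.mode", "optimistic_concurrency_control").
+  option("hoodie.cleaner.policy.failed.writes", "LAZY").
+  option("hoodie.write.lock.provider",
+    "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider").
+  option("hoodie.write.lock.zookeeper.url", "zk1.example.com").  // placeholder
+  option("hoodie.write.lock.zookeeper.port", "2181").
+  option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks").
+  option("hoodie.write.lock.zookeeper.lock_key", "my_table").
+  mode("append").
+  save(basePath)
+```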
+
+### Compaction Configs {#Compaction-Configs}
+
+Configurations that control compaction (merging of log files onto new base files) as well as cleaning (reclamation of older/unused file groups/slices).
+
+`Config Class`: org.apache.hudi.config.HoodieCompactionConfig<br></br>
+> #### hoodie.compaction.payload.class
+> This needs to be same as class used during insert/upserts. Just like writing, compaction also uses the record payload class to merge records in the log against each other, merge again with the base file and produce the final record to be written after compaction.<br></br>
+> **Default Value**: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload (Optional)<br></br>
+> `Config Param: PAYLOAD_CLASS_NAME`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.record.size.estimate
+> The average record size. If not explicitly specified, hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files.<br></br>
+> **Default Value**: 1024 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_RECORD_SIZE_ESTIMATE`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy
+> Cleaning policy to be used. The cleaner service deletes older file slices to re-claim space. By default, the cleaner spares the file slices written by the last N commits, determined by hoodie.cleaner.commits.retained. Long running query plans may often refer to older file slices and will break if those are cleaned, before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time<br></br>
+> **Default Value**: KEEP_LATEST_COMMITS (Optional)<br></br>
+> `Config Param: CLEANER_POLICY`<br></br>
+
+---
+
+> #### hoodie.compact.inline.max.delta.seconds
+> Number of elapsed seconds after the last compaction, before scheduling a new one.<br></br>
+> **Default Value**: 3600 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TIME_DELTA_SECONDS`<br></br>
+
+---
+
+> #### hoodie.cleaner.delete.bootstrap.base.file
+> When set to true, cleaner also deletes the bootstrap base file when its skeleton base file is cleaned. Turn this to true, if you want to ensure the bootstrap dataset storage is reclaimed over time, as the table receives updates/deletes. Another reason to turn this on, would be to ensure data residing in bootstrap base files are also physically deleted, to comply with data privacy enforcement processes.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: CLEANER_BOOTSTRAP_BASE_FILE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.archive.merge.enable
+> When enabled, hoodie will auto merge several small archive files into a larger one. It's useful when the storage scheme doesn't support the append operation.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.cleaner.commits.retained
+> Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much data retention the table supports for incremental queries.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: CLEANER_COMMITS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.cleaner.policy.failed.writes
+> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)<br></br>
+> **Default Value**: EAGER (Optional)<br></br>
+> `Config Param: FAILED_WRITES_CLEANER_POLICY`<br></br>
+
+---
+
+> #### hoodie.compaction.logfile.size.threshold
+> Only if the log file size is greater than the threshold in bytes, the file group will be compacted.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: COMPACTION_LOG_FILE_SIZE_THRESHOLD`<br></br>
+
+---
+
+> #### hoodie.clean.async
+> Only applies when hoodie.clean.automatic is turned on. When turned on runs cleaner async with writing, which can speed up overall write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLEAN`<br></br>
+
+---
+
+> #### hoodie.clean.automatic
+> When enabled, the cleaner table service is invoked immediately after each commit, to delete older file slices. It's recommended to enable this, to ensure metadata and data storage growth is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_CLEAN`<br></br>
+
+---
+
+> #### hoodie.commits.archival.batch
+> Archiving of instants is batched in a best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: COMMITS_ARCHIVAL_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.compaction.reverse.log.read
+> HoodieLogFormatReader reads a logfile in the forward direction starting from pos=0 to pos=file_length. If this config is set to true, the reader reads the logfile in reverse direction, from pos=file_length to pos=0<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: COMPACTION_REVERSE_LOG_READ_ENABLE`<br></br>
+
+---
+
+> #### hoodie.clean.allow.multiple
+> Allows scheduling/executing multiple cleans by enabling this config. If users prefer to strictly ensure clean requests are mutually exclusive, i.e. a 2nd clean will not be scheduled if another clean is not yet completed, to avoid repeat cleaning of same files, they might want to disable this config.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ALLOW_MULTIPLE_CLEANS`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.archive.merge.small.file.limit.bytes
+> This config sets the archive file size limit below which an archive file becomes a candidate to be selected as such a small file.<br></br>
+> **Default Value**: 20971520 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES`<br></br>
+
+---
+
+> #### hoodie.cleaner.fileversions.retained
+> When KEEP_LATEST_FILE_VERSIONS cleaning policy is used,  the minimum number of file slices to retain in each file group, during cleaning.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: CLEANER_FILE_VERSIONS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.compact.inline
+> When set to true, compaction service is triggered after each write. While being  simpler operationally, this adds extra latency on the write path.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_COMPACT`<br></br>
+
+---
+
+> #### hoodie.clean.max.commits
+> Number of commits after the last clean operation, before scheduling of a new clean is attempted.<br></br>
+> **Default Value**: 1 (Optional)<br></br>
+> `Config Param: CLEAN_MAX_COMMITS`<br></br>
+
+---
+
+> #### hoodie.compaction.lazy.block.read
+> When merging the delta log files, this config helps to choose whether the log blocks should be read lazily or not. Choose true to use lazy block reading (low memory usage, but incurs seeks to each block header) or false for immediate block read (higher memory usage)<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COMPACTION_LAZY_BLOCK_READ_ENABLE`<br></br>
+
+---
+
+> #### hoodie.archive.merge.files.batch.size
+> The number of small archive files to be merged at once.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_FILES_BATCH_SIZE`<br></br>
+
+---
+
+> #### hoodie.archive.async
+> Only applies when hoodie.archive.automatic is turned on. When turned on runs archiver async with writing, which can speed up overall write performance.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_ARCHIVE`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.parquet.small.file.limit
+> During upsert operation, we opportunistically expand existing small files on storage, instead of writing new files, to keep number of files to an optimum. This config sets the file size limit below which a file on storage becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set <= 0, Hudi will not try to get small files and will directly write new files<br></br>
+> **Default Value**: 104857600 (Optional)<br></br>
+> `Config Param: PARQUET_SMALL_FILE_LIMIT`<br></br>
+
+---
+
+> #### hoodie.compaction.strategy
+> Compaction strategy decides which file groups are picked up for compaction during each compaction run. By default, Hudi picks the log file with most accumulated unmerged data<br></br>
+> **Default Value**: org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy (Optional)<br></br>
+> `Config Param: COMPACTION_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.cleaner.hours.retained
+> Number of hours for which commits need to be retained. This config provides a more flexible option as compared to number of commits retained for cleaning service. Setting this property ensures all the files, but the latest in a file group, corresponding to commits with commit times older than the configured number of hours to be retained are cleaned.<br></br>
+> **Default Value**: 24 (Optional)<br></br>
+> `Config Param: CLEANER_HOURS_RETAINED`<br></br>
+
+---
+
+> #### hoodie.compaction.target.io
+> Amount of MBs to spend during compaction run for the LogFileSizeBasedCompactionStrategy. This value helps bound ingestion latency while compaction is run in inline mode.<br></br>
+> **Default Value**: 512000 (Optional)<br></br>
+> `Config Param: TARGET_IO_PER_COMPACTION_IN_MB`<br></br>
+
+---
+
+> #### hoodie.archive.automatic
+> When enabled, the archival table service is invoked immediately after each commit, to archive commits if we cross a maximum value of commits. It's recommended to enable this, to ensure number of active commits is bounded.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: AUTO_ARCHIVE`<br></br>
+
+---
+
+> #### hoodie.clean.trigger.strategy
+> Controls how cleaning is scheduled. Valid options: NUM_COMMITS<br></br>
+> **Default Value**: NUM_COMMITS (Optional)<br></br>
+> `Config Param: CLEAN_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.compaction.preserve.commit.metadata
+> When rewriting data, preserves existing hoodie_commit_time<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.insert.auto.split
+> Config to control whether insert split sizes are determined automatically based on average record sizes. It's recommended to keep this turned on, since hand tuning is otherwise extremely cumbersome.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_AUTO_SPLIT_INSERTS`<br></br>
+
+---
+
+> #### hoodie.compact.inline.max.delta.commits
+> Number of delta commits after the last compaction, before scheduling of a new compaction is attempted.<br></br>
+> **Default Value**: 5 (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_NUM_DELTA_COMMITS`<br></br>
+
+---
+
+> #### hoodie.keep.min.commits
+> Similar to hoodie.keep.max.commits, but controls the minimum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MIN_COMMITS_TO_KEEP`<br></br>
+
+---
+
+> #### hoodie.cleaner.parallelism
+> Parallelism for the cleaning operation. Increase this if cleaning becomes slow.<br></br>
+> **Default Value**: 200 (Optional)<br></br>
+> `Config Param: CLEANER_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.cleaner.incremental.mode
+> When enabled, the plans for each cleaner service run are computed incrementally off the events in the timeline, since the last cleaner run. This is much more efficient than obtaining listings for the full table for each planning (even with a metadata table).<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: CLEANER_INCREMENTAL_MODE_ENABLE`<br></br>
+
+---
+
+> #### hoodie.record.size.estimation.threshold
+> We use the previous commits' metadata to calculate the estimated record size and use it  to bin pack records into partitions. If the previous commit is too small to make an accurate estimation,  Hudi will search commits in the reverse order, until we find a commit that has totalBytesWritten  larger than (PARQUET_SMALL_FILE_LIMIT_BYTES * this_threshold)<br></br>
+> **Default Value**: 1.0 (Optional)<br></br>
+> `Config Param: RECORD_SIZE_ESTIMATION_THRESHOLD`<br></br>
+
+---
+
+> #### hoodie.compact.inline.trigger.strategy
+> Controls how compaction scheduling is triggered, by time or num delta commits or combination of both. Valid options: NUM_COMMITS,TIME_ELAPSED,NUM_AND_TIME,NUM_OR_TIME<br></br>
+> **Default Value**: NUM_COMMITS (Optional)<br></br>
+> `Config Param: INLINE_COMPACT_TRIGGER_STRATEGY`<br></br>
+
+---
+
+> #### hoodie.keep.max.commits
+> Archiving service moves older entries from timeline into an archived log after each write, to keep the metadata overhead constant, even as the table size grows. This config controls the maximum number of instants to retain in the active timeline.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: MAX_COMMITS_TO_KEEP`<br></br>
+
+---
+
+> #### hoodie.archive.delete.parallelism
+> Parallelism for deleting archived hoodie commits.<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: DELETE_ARCHIVED_INSTANT_PARALLELISM_VALUE`<br></br>
+
+---
+
+> #### hoodie.copyonwrite.insert.split.size
+> Number of inserts assigned for each partition/bucket for writing. We based the default on writing out 100MB files, with at least 1kb records (100K records per file), and over-provision to 500K. As long as auto-tuning of splits is turned on, this only affects the first write, where there is no history to learn record sizes from.<br></br>
+> **Default Value**: 500000 (Optional)<br></br>
+> `Config Param: COPY_ON_WRITE_INSERT_SPLIT_SIZE`<br></br>
+
+---
+
+> #### hoodie.compact.schedule.inline
+> When set to true, compaction service will be attempted for inline scheduling after each write. Users have to ensure they have a separate job to run async compaction(execution) for the one scheduled by this writer. Users can choose to set both `hoodie.compact.inline` and `hoodie.compact.schedule.inline` to false and have both scheduling and execution triggered by any async process. But if `hoodie.compact.inline` is set to false, and `hoodie.compact.schedule.inline` is set to true, regul [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEDULE_INLINE_COMPACT`<br></br>
+
+---
+
+> #### hoodie.compaction.daybased.target.partitions
+> Used by org.apache.hudi.io.compact.strategy.DayBasedCompactionStrategy to denote the number of latest partitions to compact during a compaction run.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: TARGET_PARTITIONS_PER_DAYBASED_COMPACTION`<br></br>
+
+---
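+
+A hedged sketch tying a few of these together for a merge-on-read table: inline compaction every 10 delta commits, plus tightened cleaning and archival bounds. The `hoodie.datasource.write.table.type` option comes from the datasource docs and is an assumption here:
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").  // assumed datasource option
+  // compact log files into base files every 10 delta commits, inline with the write
+  option("hoodie.compact.inline", "true").
+  option("hoodie.compact.inline.max.delta.commits", "10").
+  // keep 20 commits for incremental queries; keep the active timeline bounded above that
+  option("hoodie.cleaner.commits.retained", "20").
+  option("hoodie.keep.min.commits", "25").
+  option("hoodie.keep.max.commits", "35").
+  mode("append").
+  save(basePath)
+```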
+
+### File System View Storage Configurations {#File-System-View-Storage-Configurations}
+
+Configurations that control how file metadata is stored by Hudi, for transaction processing and queries.
+
+`Config Class`: org.apache.hudi.common.table.view.FileSystemViewStorageConfig<br></br>
+> #### hoodie.filesystem.view.spillable.replaced.mem.fraction
+> Fraction of the file system view memory, to be used for holding replace commit related metadata.<br></br>
+> **Default Value**: 0.01 (Optional)<br></br>
+> `Config Param: SPILLABLE_REPLACED_MEM_FRACTION`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.spillable.dir
+> Path on local storage to use, when file system view is held in a spillable map.<br></br>
+> **Default Value**: /tmp/ (Optional)<br></br>
+> `Config Param: SPILLABLE_DIR`<br></br>
+
+---
+
+> #### hoodie.filesystem.remote.backup.view.enable
+> Config to control whether backup needs to be configured if clients were not able to reach timeline service.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: REMOTE_BACKUP_VIEW_ENABLE`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.spillable.compaction.mem.fraction
+> Fraction of the file system view memory, to be used for holding compaction related metadata.<br></br>
+> **Default Value**: 0.8 (Optional)<br></br>
+> `Config Param: SPILLABLE_COMPACTION_MEM_FRACTION`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.spillable.mem
+> Amount of memory to be used in bytes for holding file system view, before spilling to disk.<br></br>
+> **Default Value**: 104857600 (Optional)<br></br>
+> `Config Param: SPILLABLE_MEMORY`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.secondary.type
+> Specifies the secondary form of storage for file system view, if the primary (e.g timeline server)  is unavailable.<br></br>
+> **Default Value**: MEMORY (Optional)<br></br>
+> `Config Param: SECONDARY_VIEW_TYPE`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.remote.host
+> We expect this to be rarely hand configured.<br></br>
+> **Default Value**: localhost (Optional)<br></br>
+> `Config Param: REMOTE_HOST_NAME`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.type
+> File system view provides APIs for viewing the files on the underlying lake storage,  as file groups and file slices. This config controls how such a view is held. Options include MEMORY,SPILLABLE_DISK,EMBEDDED_KV_STORE,REMOTE_ONLY,REMOTE_FIRST which provide different trade offs for memory usage and API request performance.<br></br>
+> **Default Value**: MEMORY (Optional)<br></br>
+> `Config Param: VIEW_TYPE`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.remote.timeout.secs
+> Timeout in seconds, to wait for API requests against a remote file system view. e.g timeline server.<br></br>
+> **Default Value**: 300 (Optional)<br></br>
+> `Config Param: REMOTE_TIMEOUT_SECS`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.remote.port
+> Port to serve file system view queries, when remote. We expect this to be rarely hand configured.<br></br>
+> **Default Value**: 26754 (Optional)<br></br>
+> `Config Param: REMOTE_PORT_NUM`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.spillable.bootstrap.base.file.mem.fraction
+> Fraction of the file system view memory, to be used for holding mapping to bootstrap base files.<br></br>
+> **Default Value**: 0.05 (Optional)<br></br>
+> `Config Param: BOOTSTRAP_BASE_FILE_MEM_FRACTION`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.spillable.clustering.mem.fraction
+> Fraction of the file system view memory, to be used for holding clustering related metadata.<br></br>
+> **Default Value**: 0.01 (Optional)<br></br>
+> `Config Param: SPILLABLE_CLUSTERING_MEM_FRACTION`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.rocksdb.base.path
+> Path on local storage to use, when storing file system view in embedded kv store/rocksdb.<br></br>
+> **Default Value**: /tmp/hoodie_timeline_rocksdb (Optional)<br></br>
+> `Config Param: ROCKSDB_BASE_PATH`<br></br>
+
+---
+
+> #### hoodie.filesystem.view.incr.timeline.sync.enable
+> Controls whether or not, the file system view is incrementally updated as new actions are performed on the timeline.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INCREMENTAL_TIMELINE_SYNC_ENABLE`<br></br>
+
+---
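+
+As an illustration of capping writer memory for the file system view by spilling to local disk (configs as documented above; the spill path is a placeholder):
+
+```scala
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "id").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  // hold the file system view in a spillable map: ~100MB in memory, rest on local disk
+  option("hoodie.filesystem.view.type", "SPILLABLE_DISK").
+  option("hoodie.filesystem.view.spillable.mem", "104857600").
+  option("hoodie.filesystem.view.spillable.dir", "/tmp/hudi-view").  // placeholder
+  mode("append").
+  save(basePath)
+```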
+
+### Index Configs {#Index-Configs}
+
+Configurations that control indexing behavior, which tags incoming records as either inserts or updates to older records.
+
+`Config Class`: org.apache.hudi.config.HoodieIndexConfig<br></br>
+> #### hoodie.index.bloom.num_entries
+> Only applies if index type is BLOOM. This is the number of entries to be stored in the bloom filter. The rationale for the default: Assume the maxParquetFileSize is 128MB and averageRecordSize is 1kb and hence we approx a total of 130K records in a file. The default (60000) is roughly half of this approximation. Warning: Setting this very low, will generate a lot of false positives and index lookup will have to scan a lot more files than it has to and setting this to a very high number [...]
+> **Default Value**: 60000 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_NUM_ENTRIES_VALUE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.keys.per.bucket
+> Only applies if bloomIndexBucketizedChecking is enabled and index type is bloom. This configuration controls the “bucket” size which tracks the number of record-key checks made against a single file and is the unit of work allocated to each partition performing bloom filter lookup. A higher value would amortize the fixed cost of reading a bloom filter to memory.<br></br>
+> **Default Value**: 10000000 (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_KEYS_PER_BUCKET`<br></br>
+
+---
+
+> #### hoodie.simple.index.input.storage.level
+> Only applies when #simpleIndexUseCaching is set. Determine what level of persistence is used to cache input RDDs. Refer to org.apache.spark.storage.StorageLevel for different values<br></br>
+> **Default Value**: MEMORY_AND_DISK_SER (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_INPUT_STORAGE_LEVEL_VALUE`<br></br>
+
+---
+
+> #### hoodie.simple.index.parallelism
+> Only applies if index type is SIMPLE. This is the amount of parallelism for index lookup, which involves a Spark Shuffle<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.global.simple.index.parallelism
+> Only applies if index type is GLOBAL_SIMPLE. This is the amount of parallelism for index lookup, which involves a Spark Shuffle<br></br>
+> **Default Value**: 100 (Optional)<br></br>
+> `Config Param: GLOBAL_SIMPLE_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.simple.index.update.partition.path
+> Similar to hoodie.bloom.index.update.partition.path, but applies if index type is GLOBAL_SIMPLE. When set to true, an update including the partition path of a record that already exists will result in inserting the incoming record into the new partition and deleting the original record in the old partition. When set to false, the original record will only be updated in the old partition.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
+
+> #### hoodie.bucket.index.num.buckets
+> Only applies if index type is BUCKET_INDEX. Determines the number of buckets in the hudi table; each partition is divided into N buckets.<br></br>
+> **Default Value**: 256 (Optional)<br></br>
+> `Config Param: BUCKET_INDEX_NUM_BUCKETS`<br></br>
+
+---
+
+> #### hoodie.bucket.index.hash.field
+> Index key. It is used to index the record and find its file group. If not set, use record key field as default<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BUCKET_INDEX_HASH_FIELD`<br></br>
+
+---
+
+> #### hoodie.bloom.index.use.metadata
+> Only applies if index type is BLOOM. When true, the index lookup uses bloom filters and column stats from metadata table when available to speed up the process.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_USE_METADATA`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.bloom.index.bucketized.checking
+> Only applies if index type is BLOOM. When true, bucketized bloom filtering is enabled. This reduces skew seen in sort based bloom index lookup<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_BUCKETIZED_CHECKING`<br></br>
+
+---
+
+> #### hoodie.index.type
+> Type of index to use. Default is Bloom filter. Possible options are [BLOOM | GLOBAL_BLOOM | SIMPLE | GLOBAL_SIMPLE | INMEMORY | HBASE | BUCKET]. Bloom filters remove the dependency on an external system and are stored in the footer of the Parquet Data Files<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: INDEX_TYPE`<br></br>
+
+---
+
+> #### hoodie.index.bloom.fpp
+> Only applies if index type is BLOOM. Error rate allowed given the number of entries. This is used to calculate how many bits should be assigned for the bloom filter and the number of hash functions. This is usually set very low (default: 0.000000001), we like to tradeoff disk space for lower false positives. If the number of entries added to bloom filter exceeds the configured value (hoodie.index.bloom.num_entries), then this fpp may not be honored.<br></br>
+> **Default Value**: 0.000000001 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_FPP_VALUE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.update.partition.path
+> Only applies if index type is GLOBAL_BLOOM. When set to true, an update including the partition path of a record that already exists will result in inserting the incoming record into the new partition and deleting the original record in the old partition. When set to false, the original record will only be updated in the old partition<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.use.caching
+> Only applies if index type is BLOOM. When true, the input RDD will be cached to speed up index lookup by reducing IO for computing parallelism or affected partitions<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_USE_CACHING`<br></br>
+
+---
+
+> #### hoodie.bloom.index.input.storage.level
+> Only applies when #bloomIndexUseCaching is set. Determine what level of persistence is used to cache input RDDs. Refer to org.apache.spark.storage.StorageLevel for different values<br></br>
+> **Default Value**: MEMORY_AND_DISK_SER (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE`<br></br>
+
+---
+
+> #### hoodie.bloom.index.use.treebased.filter
+> Only applies if index type is BLOOM. When true, interval tree based file pruning optimization is enabled. This mode speeds-up file-pruning based on key ranges when compared with the brute-force mode<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_TREE_BASED_FILTER`<br></br>
+
+---
+
+> #### hoodie.bloom.index.parallelism
+> Only applies if index type is BLOOM. This is the amount of parallelism for index lookup, which involves a shuffle. By default, this is auto computed based on input workload characteristics.<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_PARALLELISM`<br></br>
+
+---
+
+> #### hoodie.bloom.index.filter.dynamic.max.entries
+> The threshold for the maximum number of keys to record in a dynamic Bloom filter row. Only applies if filter type is BloomFilterTypeCode.DYNAMIC_V0.<br></br>
+> **Default Value**: 100000 (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_FILTER_DYNAMIC_MAX_ENTRIES`<br></br>
+
+---
+
+> #### hoodie.simple.index.use.caching
+> Only applies if index type is SIMPLE. When true, the incoming writes will be cached to speed up index lookup by reducing IO for computing parallelism or affected partitions.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: SIMPLE_INDEX_USE_CACHING`<br></br>
+
+---
+
+> #### hoodie.bloom.index.prune.by.ranges
+> Only applies if index type is BLOOM. When true, range information from files is leveraged to speed up index lookups. This is particularly helpful if the key has a monotonically increasing prefix, such as a timestamp. If the record key is completely random, it is better to turn this off, since range pruning will only add extra overhead to the index lookup.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: BLOOM_INDEX_PRUNE_BY_RANGES`<br></br>
+
+---
+
+> #### hoodie.bloom.index.filter.type
+> Filter type used. Default is BloomFilterTypeCode.DYNAMIC_V0. Available values are [BloomFilterTypeCode.SIMPLE, BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto-size themselves based on the number of keys.<br></br>
+> **Default Value**: DYNAMIC_V0 (Optional)<br></br>
+> `Config Param: BLOOM_FILTER_TYPE`<br></br>
+
+---
+
+> #### hoodie.index.class
+> Full path of a user-defined index class; must be a subclass of HoodieIndex. It will take precedence over the hoodie.index.type configuration if specified.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: INDEX_CLASS_NAME`<br></br>
+
+---
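+
+As a quick illustration of the index configs above, here is a minimal sketch (not part of the generated reference) of setting a few of them on a Spark datasource write; `df`, `basePath` and the key/partition fields are placeholders for your own table.
+
+```scala
+// Sketch only: enable the BLOOM index with metadata-table-assisted lookups and range pruning.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
+  option("hoodie.index.type", "BLOOM").
+  option("hoodie.bloom.index.use.metadata", "true").
+  option("hoodie.bloom.index.prune.by.ranges", "true").
+  mode("append").
+  save(basePath)
+```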
+
+### Clustering Configs {#Clustering-Configs}
+
+Configurations that control the clustering table service in hudi, which optimizes the storage layout for better query performance by sorting and sizing data files.
+
+`Config Class`: org.apache.hudi.config.HoodieClusteringConfig<br></br>
+> #### hoodie.clustering.plan.strategy.cluster.end.partition
+> End partition used to filter partition (inclusive), only effective when the filter mode 'hoodie.clustering.plan.partition.filter.mode' is SELECTED_PARTITIONS<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FILTER_END_PARTITION`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.rollback.pending.replacecommit.on.conflict
+> If updates are allowed to file groups pending clustering, then set this config to roll back failed or pending clustering instants. Pending clustering will be rolled back ONLY IF there is a conflict between the incoming upsert and the file group to be clustered. Please exercise caution while setting this config, especially when clustering is done very frequently. This could lead to a race condition in rare scenarios, for example, when the clustering completes after instants are fetched but before rol [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ROLLBACK_PENDING_CLUSTERING_ON_CONFLICT`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.clustering.async.max.commits
+> Config to control frequency of async clustering<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: ASYNC_CLUSTERING_MAX_COMMITS`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.clustering.inline.max.commits
+> Config to control frequency of clustering planning<br></br>
+> **Default Value**: 4 (Optional)<br></br>
+> `Config Param: INLINE_CLUSTERING_MAX_COMMITS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.enable
+> This setting has no effect. Please refer to the clustering configuration, as well as the LAYOUT_OPTIMIZE_STRATEGY config, to enable advanced record layout optimization strategies.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_ENABLE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+> `Deprecated Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.target.file.max.bytes
+> Each group can produce 'N' (CLUSTERING_MAX_GROUP_SIZE / CLUSTERING_TARGET_FILE_SIZE) output file groups, each targeting at most this many bytes.<br></br>
+> **Default Value**: 1073741824 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_TARGET_FILE_MAX_BYTES`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions
+> Number of partitions to skip from latest when choosing partitions to create ClusteringPlan<br></br>
+> **Default Value**: 0 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.clustering.execution.strategy.class
+> Config to provide a strategy class (subclass of RunClusteringStrategy) to define how the clustering plan is executed. By default, we sort the file groups in the plan by the specified columns, while meeting the configured target file sizes.<br></br>
+> **Default Value**: org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy (Optional)<br></br>
+> `Config Param: EXECUTION_STRATEGY_CLASS_NAME`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.async.enabled
+> Enable running of clustering service, asynchronously as inserts happen on the table.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ASYNC_CLUSTERING_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.class
+> Config to provide a strategy class (subclass of ClusteringPlanStrategy) to create the clustering plan, i.e. select which file groups are being clustered. The default strategy looks at the clustering small file size limit (determined by hoodie.clustering.plan.strategy.small.file.limit) to pick the small file slices within partitions for clustering.<br></br>
+> **Default Value**: org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_CLASS_NAME`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.build.curve.sample.size
+> Determines target sample size used by the Boundary-based Interleaved Index method of building space-filling curve. Larger sample size entails better layout optimization outcomes, at the expense of higher memory footprint.<br></br>
+> **Default Value**: 200000 (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.partition.selected
+> Partitions to run clustering<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_SELECTED`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.updates.strategy
+> Determines how to handle updates and deletes to file groups that are under clustering. The default strategy just rejects the update.<br></br>
+> **Default Value**: org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy (Optional)<br></br>
+> `Config Param: UPDATES_STRATEGY`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.strategy
+> Determines the ordering strategy used in record layout optimization. Currently, "linear", "z-order" and "hilbert" are supported.<br></br>
+> **Default Value**: linear (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_STRATEGY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.clustering.inline
+> Turn on inline clustering - clustering will be run after each write operation is complete<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: INLINE_CLUSTERING`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.cluster.begin.partition
+> Begin partition used to filter partition (inclusive), only effective when the filter mode 'hoodie.clustering.plan.partition.filter.mode' is SELECTED_PARTITIONS<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FILTER_BEGIN_PARTITION`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.sort.columns
+> Columns to sort the data by when clustering<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PLAN_STRATEGY_SORT_COLUMNS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.preserve.commit.metadata
+> When rewriting data, preserves existing hoodie_commit_time<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.max.num.groups
+> Maximum number of groups to create as part of ClusteringPlan. Increasing groups will increase parallelism<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_MAX_GROUPS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.partition.filter.mode
+> Partition filter mode used in the creation of the clustering plan. Available values are - NONE: do not filter table partitions, so the clustering plan will include all partitions that have clustering candidates. RECENT_DAYS: keep a continuous range of partitions, working together with the configs 'hoodie.clustering.plan.strategy.daybased.lookback.partitions' and 'hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions'. SELECTED_PARTITIONS: keep partitions that are in the specified r [...]
+> **Default Value**: NONE (Optional)<br></br>
+> `Config Param: PLAN_PARTITION_FILTER_MODE_NAME`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.schedule.inline
+> When set to true, the clustering service will be attempted for inline scheduling after each write. Users have to ensure they have a separate job to run async clustering (execution) for the one scheduled by this writer. Users can choose to set both `hoodie.clustering.inline` and `hoodie.clustering.schedule.inline` to false and have both scheduling and execution triggered by any async process, in which case `hoodie.clustering.async.enabled` is expected to be set to true. But if `hoodie.cluste [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: SCHEDULE_INLINE_CLUSTERING`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.data.skipping.enable
+> Enable data skipping by collecting statistics once layout optimization is complete.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_DATA_SKIPPING_ENABLE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+> `Deprecated Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.max.bytes.per.group
+> Each clustering operation can create multiple output file groups. The total amount of data processed by a clustering operation is defined by the below two properties (CLUSTERING_MAX_BYTES_PER_GROUP * CLUSTERING_MAX_NUM_GROUPS). This is the max amount of data to be included in one group.<br></br>
+> **Default Value**: 2147483648 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.small.file.limit
+> Files smaller than the size in bytes specified here are candidates for clustering<br></br>
+> **Default Value**: 314572800 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_SMALL_FILE_LIMIT`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.curve.build.method
+> Controls how data is sampled to build the space-filling curves. Two methods are supported: "direct" and "sample". The direct method is faster than sampling; however, the sample method produces a better data layout.<br></br>
+> **Default Value**: direct (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_SPATIAL_CURVE_BUILD_METHOD`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.partition.regex.pattern
+> Filter clustering partitions that match the regex pattern.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_REGEX_PATTERN`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.daybased.lookback.partitions
+> Number of partitions to list to create ClusteringPlan<br></br>
+> **Default Value**: 2 (Optional)<br></br>
+> `Config Param: DAYBASED_LOOKBACK_PARTITIONS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
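+
+As a short, hedged example of the clustering configs above, the following sketch turns on inline clustering on a Spark datasource write; `df`, `basePath` and the `symbol` sort column are placeholders.
+
+```scala
+// Sketch only: run inline clustering every 2 commits, sorting by a hypothetical `symbol` column.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.clustering.inline", "true").
+  option("hoodie.clustering.inline.max.commits", "2").
+  option("hoodie.clustering.plan.strategy.sort.columns", "symbol").
+  option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
+  mode("append").
+  save(basePath)
+```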
+
+### Common Configurations {#Common-Configurations}
+
+The following set of configurations is common across Hudi.
+
+`Config Class`: org.apache.hudi.common.config.HoodieCommonConfig<br></br>
+> #### hoodie.common.diskmap.compression.enabled
+> Turn on compression for BITCASK disk map used by the External Spillable Map<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: DISK_MAP_BITCASK_COMPRESSION_ENABLED`<br></br>
+
+---
+
+> #### hoodie.common.spillable.diskmap.type
+> When handling input data that cannot be held in memory while merging with a file on storage, a spillable diskmap is employed. By default, we use a persistent hashmap based loosely on bitcask, which offers O(1) inserts and lookups. Change this to `ROCKS_DB` to use RocksDB for handling the spill.<br></br>
+> **Default Value**: BITCASK (Optional)<br></br>
+> `Config Param: SPILLABLE_DISK_MAP_TYPE`<br></br>
+
+---
+
+### Bootstrap Configs {#Bootstrap-Configs}
+
+Configurations that control how you want to bootstrap your existing tables for the first time into hudi. The bootstrap operation can flexibly avoid copying data over before you can use Hudi, and supports running the existing writers and new hudi writers in parallel to validate the migration.
+
+`Config Class`: org.apache.hudi.config.HoodieBootstrapConfig<br></br>
+> #### hoodie.bootstrap.partitionpath.translator.class
+> Translates the partition paths from the bootstrapped data into how it is laid out as a Hudi table.<br></br>
+> **Default Value**: org.apache.hudi.client.bootstrap.translator.IdentityBootstrapPartitionPathTranslator (Optional)<br></br>
+> `Config Param: PARTITION_PATH_TRANSLATOR_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.full.input.provider
+> Class to use for reading the bootstrap dataset partitions/files, for Bootstrap mode FULL_RECORD<br></br>
+> **Default Value**: org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider (Optional)<br></br>
+> `Config Param: FULL_BOOTSTRAP_INPUT_PROVIDER_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.keygen.type
+> Type of built-in key generator; currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE.<br></br>
+> **Default Value**: SIMPLE (Optional)<br></br>
+> `Config Param: KEYGEN_TYPE`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.keygen.class
+> Key generator implementation to be used for generating keys from the bootstrapped dataset<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: KEYGEN_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.parallelism
+> Parallelism value to be used to bootstrap data into hudi<br></br>
+> **Default Value**: 1500 (Optional)<br></br>
+> `Config Param: PARALLELISM_VALUE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.base.path
+> Base path of the dataset that needs to be bootstrapped as a Hudi table<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BASE_PATH`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.mode.selector.regex
+> Matches each bootstrap dataset partition against this regex and applies the mode below to it.<br></br>
+> **Default Value**: .* (Optional)<br></br>
+> `Config Param: PARTITION_SELECTOR_REGEX_PATTERN`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.index.class
+> Implementation to use for mapping a skeleton base file to a bootstrap base file.<br></br>
+> **Default Value**: org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex (Optional)<br></br>
+> `Config Param: INDEX_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.mode.selector.regex.mode
+> Bootstrap mode to apply for partition paths, that match regex above. METADATA_ONLY will generate just skeleton base files with keys/footers, avoiding full cost of rewriting the dataset. FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.<br></br>
+> **Default Value**: METADATA_ONLY (Optional)<br></br>
+> `Config Param: PARTITION_SELECTOR_REGEX_MODE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.bootstrap.mode.selector
+> Selects the mode in which each file/partition in the bootstrapped dataset gets bootstrapped<br></br>
+> **Default Value**: org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector (Optional)<br></br>
+> `Config Param: MODE_SELECTOR_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
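+
+For illustration, here is a hedged sketch of supplying the bootstrap configs above through the Spark datasource (see the Migration Guide for the complete flow). The paths, key fields and key generator are assumptions for the example.
+
+```scala
+// Sketch only: bootstrap an existing parquet dataset in place into a Hudi table.
+spark.emptyDataFrame.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.operation", "bootstrap").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
+  option("hoodie.bootstrap.base.path", "/path/to/existing/table").
+  option("hoodie.bootstrap.keygen.class", "org.apache.hudi.keygen.SimpleKeyGenerator").
+  mode("overwrite").
+  save("/path/to/hudi/table")
+```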
+
+## Metrics Configs {#METRICS}
+This set of configs is used to enable monitoring and reporting of key Hudi stats and metrics.
+
+### Metrics Configurations for Datadog reporter {#Metrics-Configurations-for-Datadog-reporter}
+
+Enables reporting on Hudi metrics using the Datadog reporter type. Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsDatadogConfig<br></br>
+> #### hoodie.metrics.datadog.api.timeout.seconds
+> Datadog API timeout in seconds. Defaults to 3.<br></br>
+> **Default Value**: 3 (Optional)<br></br>
+> `Config Param: API_TIMEOUT_IN_SECONDS`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.metric.prefix
+> Datadog metric prefix to be prepended to each metric name with a dot as delimiter. For example, if it is set to foo, foo. will be prepended.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: METRIC_PREFIX_VALUE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.api.site
+> Datadog API site: EU or US<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: API_SITE_VALUE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.api.key.skip.validation
+> Whether to skip validating the Datadog API key before sending metrics via the Datadog API. Defaults to false.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: API_KEY_SKIP_VALIDATION`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.metric.host
+> Datadog metric host to be sent along with metrics data.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: METRIC_HOST_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.report.period.seconds
+> Datadog reporting period in seconds. Defaults to 30.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: REPORT_PERIOD_IN_SECONDS`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.api.key
+> Datadog API key<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: API_KEY`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.api.key.supplier
+> Datadog API key supplier to supply the API key at runtime. This will take effect if hoodie.metrics.datadog.api.key is not set.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: API_KEY_SUPPLIER`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.datadog.metric.tags
+> Datadog metric tags (comma-delimited) to be sent along with metrics data.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: METRIC_TAG_VALUES`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
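+
+A hedged sketch of wiring the Datadog reporter configs above into a Spark datasource write; the API key and metric prefix are placeholders.
+
+```scala
+// Sketch only: publish Hudi metrics to Datadog on every commit/clean/rollback.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.metrics.on", "true").
+  option("hoodie.metrics.reporter.type", "DATADOG").
+  option("hoodie.metrics.datadog.api.site", "US").
+  option("hoodie.metrics.datadog.api.key", "<your-datadog-api-key>").
+  option("hoodie.metrics.datadog.metric.prefix", "hudi").
+  mode("append").
+  save(basePath)
+```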
+
+### Metrics Configurations {#Metrics-Configurations}
+
+Enables reporting on Hudi metrics. Hudi publishes metrics on every commit, clean, rollback etc. The following sections list the supported reporters.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsConfig<br></br>
+> #### hoodie.metrics.executor.enable
+> <br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: EXECUTOR_METRICS_ENABLE`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.metrics.reporter.metricsname.prefix
+> The prefix given to the metrics names.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_PREFIX`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.metrics.reporter.type
+> Type of metrics reporter.<br></br>
+> **Default Value**: GRAPHITE (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_TYPE_VALUE`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+> #### hoodie.metrics.reporter.class
+> <br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_CLASS_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.on
+> Turn on/off metrics reporting. Off by default.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: TURN_METRICS_ON`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+### Metrics Configurations for Jmx {#Metrics-Configurations-for-Jmx}
+
+Enables reporting on Hudi metrics using Jmx.  Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsJmxConfig<br></br>
+> #### hoodie.metrics.jmx.host
+> Jmx host to connect to<br></br>
+> **Default Value**: localhost (Optional)<br></br>
+> `Config Param: JMX_HOST_NAME`<br></br>
+> `Since Version: 0.5.1`<br></br>
+
+---
+
+> #### hoodie.metrics.jmx.port
+> Jmx port to connect to<br></br>
+> **Default Value**: 9889 (Optional)<br></br>
+> `Config Param: JMX_PORT_NUM`<br></br>
+> `Since Version: 0.5.1`<br></br>
+
+---
+
+### Metrics Configurations for Prometheus {#Metrics-Configurations-for-Prometheus}
+
+Enables reporting on Hudi metrics using Prometheus.  Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsPrometheusConfig<br></br>
+> #### hoodie.metrics.pushgateway.random.job.name.suffix
+> Whether the pushgateway job name needs a random suffix. Default is true.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_RANDOM_JOBNAME_SUFFIX`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.pushgateway.port
+> Port for the push gateway.<br></br>
+> **Default Value**: 9091 (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_PORT_NUM`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.pushgateway.delete.on.shutdown
+> Whether to delete the pushgateway info on job shutdown. True by default.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_DELETE_ON_SHUTDOWN_ENABLE`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.prometheus.port
+> Port for prometheus server.<br></br>
+> **Default Value**: 9090 (Optional)<br></br>
+> `Config Param: PROMETHEUS_PORT_NUM`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.pushgateway.job.name
+> Name of the push gateway job.<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_JOBNAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.pushgateway.report.period.seconds
+> Reporting interval in seconds.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_REPORT_PERIOD_IN_SECONDS`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
+
+> #### hoodie.metrics.pushgateway.host
+> Hostname of the prometheus push gateway.<br></br>
+> **Default Value**: localhost (Optional)<br></br>
+> `Config Param: PUSHGATEWAY_HOST_NAME`<br></br>
+> `Since Version: 0.6.0`<br></br>
+
+---
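+
+A hedged sketch of using the pushgateway configs above; the host and port refer to whatever Prometheus pushgateway you run and are placeholders here.
+
+```scala
+// Sketch only: push Hudi metrics to a Prometheus pushgateway.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.metrics.on", "true").
+  option("hoodie.metrics.reporter.type", "PROMETHEUS_PUSHGATEWAY").
+  option("hoodie.metrics.pushgateway.host", "pushgateway-host").
+  option("hoodie.metrics.pushgateway.port", "9091").
+  mode("append").
+  save(basePath)
+```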
+
+### Metrics Configurations for Amazon CloudWatch {#Metrics-Configurations-for-Amazon-CloudWatch}
+
+Enables reporting on Hudi metrics using Amazon CloudWatch.  Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.HoodieMetricsCloudWatchConfig<br></br>
+> #### hoodie.metrics.cloudwatch.report.period.seconds
+> Reporting interval in seconds<br></br>
+> **Default Value**: 60 (Optional)<br></br>
+> `Config Param: REPORT_PERIOD_SECONDS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.namespace
+> Namespace of reporter<br></br>
+> **Default Value**: Hudi (Optional)<br></br>
+> `Config Param: METRIC_NAMESPACE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.metric.prefix
+> Metric prefix of reporter<br></br>
+> **Default Value**:  (Optional)<br></br>
+> `Config Param: METRIC_PREFIX`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.cloudwatch.maxDatumsPerRequest
+> Max number of Datums per request<br></br>
+> **Default Value**: 20 (Optional)<br></br>
+> `Config Param: MAX_DATUMS_PER_REQUEST`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+### Metrics Configurations for Graphite {#Metrics-Configurations-for-Graphite}
+
+Enables reporting on Hudi metrics using Graphite.  Hudi publishes metrics on every commit, clean, rollback etc.
+
+`Config Class`: org.apache.hudi.config.metrics.HoodieMetricsGraphiteConfig<br></br>
+> #### hoodie.metrics.graphite.port
+> Graphite port to connect to.<br></br>
+> **Default Value**: 4756 (Optional)<br></br>
+> `Config Param: GRAPHITE_SERVER_PORT_NUM`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+> #### hoodie.metrics.graphite.report.period.seconds
+> Graphite reporting period in seconds. Default to 30.<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: GRAPHITE_REPORT_PERIOD_IN_SECONDS`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.metrics.graphite.host
+> Graphite host to connect to.<br></br>
+> **Default Value**: localhost (Optional)<br></br>
+> `Config Param: GRAPHITE_SERVER_HOST_NAME`<br></br>
+> `Since Version: 0.5.0`<br></br>
+
+---
+
+> #### hoodie.metrics.graphite.metric.prefix
+> Standard prefix applied to all metrics. This helps to add datacenter and environment information, for example.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: GRAPHITE_METRIC_PREFIX_VALUE`<br></br>
+> `Since Version: 0.5.1`<br></br>
+
+---
+
+## Record Payload Config {#RECORD_PAYLOAD}
+This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on the incoming new record and the stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload, which simply updates the table with the latest/last-written record. This can be overridden with a custom class extending the HoodieRecordPayload class, at both the datasource and WriteClient levels.
+
+### Payload Configurations {#Payload-Configurations}
+
+Payload-related configs that can be leveraged to control merges based on specific business fields in the data.
+
+`Config Class`: org.apache.hudi.config.HoodiePayloadConfig<br></br>
+> #### hoodie.payload.event.time.field
+> Table column/field name to derive the timestamp associated with the records. This can be useful, for example, for determining the freshness of the table.<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: EVENT_TIME_FIELD`<br></br>
+
+---
+
+> #### hoodie.payload.ordering.field
+> Table column/field name to order records that have the same key, before merging and writing to storage.<br></br>
+> **Default Value**: ts (Optional)<br></br>
+> `Config Param: ORDERING_FIELD`<br></br>
+
+---
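+
+A hedged sketch of the payload configs above on a Spark datasource write, assuming records carry a `ts` event-time column (the default for both configs); `df` and `basePath` are placeholders.
+
+```scala
+// Sketch only: order merges of records with the same key by the `ts` column.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  option("hoodie.payload.ordering.field", "ts").
+  option("hoodie.payload.event.time.field", "ts").
+  mode("append").
+  save(basePath)
+```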
+
+## Kafka Connect Configs {#KAFKA_CONNECT}
+This set of configs is used by the Kafka Connect Sink Connector for writing Hudi tables.
+
+### Kafka Sink Connect Configurations {#Kafka-Sink-Connect-Configurations}
+
+Configurations for Kafka Connect Sink Connector for Hudi.
+
+`Config Class`: org.apache.hudi.connect.writers.KafkaConnectConfigs<br></br>
+> #### hoodie.kafka.coordinator.write.timeout.secs
+> The timeout, after sending an END_COMMIT, for which the coordinator will wait for the write statuses from all the partitions before ignoring the current commit and starting a new commit.<br></br>
+> **Default Value**: 300 (Optional)<br></br>
+> `Config Param: COORDINATOR_WRITE_TIMEOUT_SECS`<br></br>
+
+---
+
+> #### hoodie.meta.sync.classes
+> Meta sync client tools, comma-separated if multiple tools are used.<br></br>
+> **Default Value**: org.apache.hudi.hive.HiveSyncTool (Optional)<br></br>
+> `Config Param: META_SYNC_CLASSES`<br></br>
+
+---
+
+> #### hoodie.kafka.allow.commit.on.errors
+> Commit even when some records failed to be written<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ALLOW_COMMIT_ON_ERRORS`<br></br>
+
+---
+
+> #### hadoop.home
+> The Hadoop home directory.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HADOOP_HOME`<br></br>
+
+---
+
+> #### hoodie.meta.sync.enable
+> Enable Meta Sync such as Hive<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: META_SYNC_ENABLE`<br></br>
+
+---
+
+> #### hoodie.kafka.commit.interval.secs
+> The interval at which Hudi will commit the records written to the files, making them consumable on the read-side.<br></br>
+> **Default Value**: 60 (Optional)<br></br>
+> `Config Param: COMMIT_INTERVAL_SECS`<br></br>
+
+---
+
+> #### hoodie.kafka.control.topic
+> Kafka topic name used by the Hudi Sink Connector for sending and receiving control messages. Not used for data records.<br></br>
+> **Default Value**: hudi-control-topic (Optional)<br></br>
+> `Config Param: CONTROL_TOPIC_NAME`<br></br>
+
+---
+
+> #### bootstrap.servers
+> The bootstrap servers for the Kafka Cluster.<br></br>
+> **Default Value**: localhost:9092 (Optional)<br></br>
+> `Config Param: KAFKA_BOOTSTRAP_SERVERS`<br></br>
+
+---
+
+> #### hoodie.schemaprovider.class
+> Subclass of org.apache.hudi.schema.SchemaProvider to attach schemas to input & target table data. Built-in option: org.apache.hudi.schema.FilebasedSchemaProvider.<br></br>
+> **Default Value**: org.apache.hudi.schema.FilebasedSchemaProvider (Optional)<br></br>
+> `Config Param: SCHEMA_PROVIDER_CLASS`<br></br>
+
+---
+
+> #### hadoop.conf.dir
+> The Hadoop configuration directory.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HADOOP_CONF_DIR`<br></br>
+
+---
+
+> #### hoodie.kafka.compaction.async.enable
+> Controls whether async compaction should be turned on for MOR table writing.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: ASYNC_COMPACT_ENABLE`<br></br>
+
+---
+
+## Amazon Web Services Configs {#AWS}
+Configurations for accessing Amazon Web Services resources used by Hudi, such as Amazon DynamoDB (for locks) and Amazon CloudWatch (for metrics).
+
+### Amazon Web Services Configs {#Amazon-Web-Services-Configs}
+
+Amazon Web Services configurations to access resources like Amazon DynamoDB (for locks), Amazon CloudWatch (metrics).
+
+`Config Class`: org.apache.hudi.config.HoodieAWSConfig<br></br>
+> #### hoodie.aws.session.token
+> AWS session token<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: AWS_SESSION_TOKEN`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.aws.access.key
+> AWS access key id<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: AWS_ACCESS_KEY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.aws.secret.key
+> AWS secret key<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: AWS_SECRET_KEY`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
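+
+A hedged sketch of passing the AWS configs above on a Spark datasource write; the credential values are placeholders, and in practice you may prefer instance profiles or environment-based credentials.
+
+```scala
+// Sketch only: supply AWS credentials used for DynamoDB-based locking and CloudWatch metrics.
+df.write.format("hudi").
+  option("hoodie.table.name", "my_table").
+  option("hoodie.aws.access.key", "<aws-access-key-id>").
+  option("hoodie.aws.secret.key", "<aws-secret-key>").
+  option("hoodie.aws.session.token", "<aws-session-token>").
+  mode("append").
+  save(basePath)
+```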
+
diff --git a/website/versioned_docs/version-0.11.1/cos_hoodie.md b/website/versioned_docs/version-0.11.1/cos_hoodie.md
new file mode 100644
index 0000000000..dfde6e8cff
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/cos_hoodie.md
@@ -0,0 +1,71 @@
+---
+title: Tencent Cloud
+keywords: [ hudi, hive, tencent, cos, spark, presto]
+summary: In this page, we go over how to configure Hudi with COS filesystem.
+last_modified_at: 2020-04-21T11:38:24-10:00
+---
+In this page, we explain how to get your Hudi Spark job to store data in Tencent Cloud COS.
+
+## Tencent Cloud COS configs
+
+There are two configurations required for Hudi-COS compatibility:
+
+- Adding Tencent Cloud COS Credentials for Hudi
+- Adding required Jars to classpath
+
+### Tencent Cloud COS Credentials
+
+Add the required configs to your core-site.xml, from where Hudi can fetch them. Set `fs.defaultFS` to your COS bucket name, `fs.cosn.userinfo.secretId` to your COS secret Id, and `fs.cosn.userinfo.secretKey` to your COS secret key. Hudi should then be able to read/write from the bucket.
+
+```xml
+    <property>
+        <name>fs.defaultFS</name>
+        <value>cosn://bucketname</value>
+        <description>COS bucket name</description>
+    </property>
+
+    <property>
+        <name>fs.cosn.userinfo.secretId</name>
+        <value>cos-secretId</value>
+        <description>Tencent Cloud Secret Id</description>
+    </property>
+
+    <property>
+        <name>fs.cosn.userinfo.secretKey</name>
+        <value>cos-secretkey</value>
+        <description>Tencent Cloud Secret Key</description>
+    </property>
+
+    <property>
+        <name>fs.cosn.bucket.region</name>
+        <value>ap-region</value>
+        <description>The region where the bucket is located.</description>
+    </property>
+
+    <property>
+        <name>fs.cosn.bucket.endpoint_suffix</name>
+        <value>cos.endpoint.suffix</value>
+        <description>
+          COS endpoint to connect to. 
+          For public cloud users, it is recommended not to set this option, and only the correct area field is required.
+        </description>
+    </property>
+
+    <property>
+        <name>fs.cosn.impl</name>
+        <value>org.apache.hadoop.fs.CosFileSystem</value>
+        <description>The implementation class of the CosN Filesystem.</description>
+    </property>
+
+    <property>
+        <name>fs.AbstractFileSystem.cosn.impl</name>
+        <value>org.apache.hadoop.fs.CosN</value>
+        <description>The implementation class of the CosN AbstractFileSystem.</description>
+    </property>
+
+```
+
+### Tencent Cloud COS Libs
+COS Hadoop libraries to add to the classpath:
+
+- org.apache.hadoop:hadoop-cos:2.8.5
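+
+If you prefer not to edit core-site.xml, the same properties can be set programmatically on the Spark Hadoop configuration before writing, as in the hedged sketch below (values are placeholders); the hadoop-cos jar above still needs to be on the classpath, e.g. via `--jars` or `--packages org.apache.hadoop:hadoop-cos:2.8.5`.
+
+```scala
+// Sketch only: set the COS filesystem properties on the active SparkSession's Hadoop conf.
+val hadoopConf = spark.sparkContext.hadoopConfiguration
+hadoopConf.set("fs.defaultFS", "cosn://bucketname")
+hadoopConf.set("fs.cosn.userinfo.secretId", "cos-secretId")
+hadoopConf.set("fs.cosn.userinfo.secretKey", "cos-secretkey")
+hadoopConf.set("fs.cosn.bucket.region", "ap-region")
+hadoopConf.set("fs.cosn.impl", "org.apache.hadoop.fs.CosFileSystem")
+hadoopConf.set("fs.AbstractFileSystem.cosn.impl", "org.apache.hadoop.fs.CosN")
+```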
diff --git a/website/docs/deployment.md b/website/versioned_docs/version-0.11.1/deployment.md
similarity index 98%
copy from website/docs/deployment.md
copy to website/versioned_docs/version-0.11.1/deployment.md
index a4a57fb6b0..3236fd5657 100644
--- a/website/docs/deployment.md
+++ b/website/versioned_docs/version-0.11.1/deployment.md
@@ -29,11 +29,11 @@ With Merge_On_Read Table, Hudi ingestion needs to also take care of compacting d
 from varied sources such as DFS, Kafka and DB Changelogs and ingest them to hudi tables.  It runs as a spark application in two modes.
 
 To use DeltaStreamer in Spark, the `hudi-utilities-bundle` is required, by adding
-`--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0` to the `spark-submit` command. From 0.11.0 release, we start
+`--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1` to the `spark-submit` command. From 0.11.0 release, we start
 to provide a new `hudi-utilities-slim-bundle` which aims to exclude dependencies that can cause conflicts and compatibility
 issues with different versions of Spark.  The `hudi-utilities-slim-bundle` should be used along with a Hudi Spark bundle 
 corresponding to the Spark version used, e.g., 
-`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0`,
+`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1`,
 if using `hudi-utilities-bundle` solely in Spark encounters compatibility issues.
 
  - **Run Once Mode** : In this mode, Deltastreamer performs one ingestion round which includes incrementally pulling events from upstream sources and ingesting them to hudi table. Background operations like cleaning old file versions and archiving hoodie timeline are automatically executed as part of the run. For Merge-On-Read tables, Compaction is also run inline as part of ingestion unless disabled by passing the flag "--disable-compaction". By default, Compaction is run inline for eve [...]
@@ -41,7 +41,7 @@ if using `hudi-utilities-bundle` solely in Spark encounters compatibility issues
 Here is an example invocation for reading from kafka topic in a single-run mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
@@ -89,7 +89,7 @@ Here is an example invocation for reading from kafka topic in a single-run mode
 Here is an example invocation for reading from kafka topic in a continuous mode and writing to Merge On Read table type in a yarn cluster.
 
 ```java
-[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.0 \
+[hoodie]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.11.1 \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
diff --git a/website/versioned_docs/version-0.11.1/disaster_recovery.md b/website/versioned_docs/version-0.11.1/disaster_recovery.md
new file mode 100644
index 0000000000..c2f53bc8cd
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/disaster_recovery.md
@@ -0,0 +1,296 @@
+---
+title: Disaster Recovery
+toc: true
+---
+
+Disaster recovery is mission critical for any software. Especially when it comes to data systems, the impact can be very serious,
+leading to delayed or even wrong business decisions. Apache Hudi has two operations to assist you in recovering
+data to a previous state: "savepoint" and "restore".
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Care is taken to ensure the cleaner will not clean up any files that are savepointed.
+Along the same lines, a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is synonymous
+with taking a backup, except that we don't make a new copy of the table; we just save the state of the table so that
+we can restore it later when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commits. It cannot be undone (or reversed), so care
+should be taken before doing a restore. Hudi will delete all data files and commit files (timeline files) greater than the
+savepoint commit to which the table is being restored. You should pause all writes to the table when performing
+a restore, since they are likely to fail while the restore is in progress. Reads could also fail, since snapshot queries
+will be hitting the latest files, which have a high chance of getting deleted by the restore.
+
+## Runbook
+
+Savepoint and restore can only be triggered from hudi-cli. Let's walk through an example of how one can take a savepoint
+and later restore the state of the table.
+
+Let's create a hudi table via spark-shell and trigger a few batches of inserts.
+
+```scala
+import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+
+val tableName = "hudi_trips_cow"
+val basePath = "file:///tmp/hudi_trips_cow"
+val dataGen = new DataGenerator
+
+// spark-shell
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Overwrite).
+  save(basePath)
+```
+
+Each batch inserts 10 records. Repeat the following for 4 more batches.
+```scala
+
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
+```
+
+Total record count should be 50. 
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from  hudi_trips_snapshot ").show()
+
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        50|
++--------------------------+
+```
+Let's take a look at the timeline after 5 batches of inserts.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie 
+total 128
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+```
+
+Let's trigger a savepoint as of the latest commit. Savepoint can only be done via hudi-cli.
+
+```sh
+./hudi-cli.sh
+
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoint create --commit 20220128160245447 --sparkMaster local[2]
+```
+
+Let's check the timeline after savepoint. 
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 136
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+```
+
+You will notice that savepoint meta files have been added, which keep track of the files that are part of the latest table snapshot.
+
+Now, let's continue adding a few more batches of inserts.
+Repeat the below commands 3 times.
+```scala
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
+```
+
+Total record count will be 80, since we have done 8 batches in total (5 until the savepoint and 3 after the savepoint).
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from  hudi_trips_snapshot ").show()
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        80|
++--------------------------+
+```
+
+Let's say something bad happened and you want to restore your table to an older snapshot. As we called out earlier, restore can
+only be triggered from hudi-cli. And do remember to bring down all of your writer processes while doing a restore.
+
+Let's check out the timeline once before we trigger the restore.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 208
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160620557.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160620557.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160620557.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160627501.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160627501.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160627501.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160630785.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160630785.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160630785.commit
+```
+
+If you are continuing in the same hudi-cli session, you can just execute "refresh" so that the table state gets refreshed to
+its latest state. If not, connect to the table again.
+
+```shell
+./hudi-cli.sh
+
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoints show
+╔═══════════════════╗
+║ SavepointTime     ║
+╠═══════════════════╣
+║ 20220128160245447 ║
+╚═══════════════════╝
+savepoint rollback --savepoint 20220128160245447 --sparkMaster local[2]
+```
+
+The Hudi table should have been restored to the savepointed commit 20220128160245447. Both data files and timeline files
+written after the savepoint should have been deleted.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 152
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:07 20220128160732437.restore.inflight
+-rw-r--r--  1 nsb  wheel  4152 Jan 28 16:07 20220128160732437.restore
+```
+
+Let's check the total record count in the table. It should match the count we had just before we triggered the savepoint.
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from  hudi_trips_snapshot ").show()
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        50|
++--------------------------+
+```
+
+As you can see, the entire table state is restored back to the commit which was savepointed. Users can choose to trigger savepoints
+at a regular cadence and keep deleting older savepoints as new ones are created. hudi-cli has a "savepoint delete" command
+to assist in deleting a savepoint. Please do remember that the cleaner may not clean the files that are savepointed, so users
+should ensure they delete savepoints from time to time; if not, storage reclamation may not happen.
+
+Note: Savepoint and restore for MOR tables are available only from 0.11.
+
+
+
+
+
+
+
+
+
+
+
diff --git a/website/versioned_docs/version-0.11.1/docker_demo.md b/website/versioned_docs/version-0.11.1/docker_demo.md
new file mode 100644
index 0000000000..26d41251bc
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/docker_demo.md
@@ -0,0 +1,1429 @@
+---
+title: Docker Demo
+keywords: [ hudi, docker, demo]
+toc: true
+last_modified_at: 2019-12-30T15:59:57-04:00
+---
+
+## A Demo using docker containers
+
+Let's use a real-world example to see how hudi works end to end. For this purpose, a self-contained
+data infrastructure is brought up in a local docker cluster within your computer.
+
+The steps have been tested on a Mac laptop.
+
+### Prerequisites
+
+  * Docker Setup : For Mac, please follow the steps as defined in [https://docs.docker.com/v17.12/docker-for-mac/install/]. For running Spark-SQL queries, please ensure at least 6 GB and 4 CPUs are allocated to Docker (see Docker -> Preferences -> Advanced). Otherwise, Spark-SQL queries could be killed because of memory issues.
+  * kcat : A command-line utility to publish/consume from kafka topics. Use `brew install kcat` to install kcat.
+  * /etc/hosts : The demo references many services running in containers by hostname. Add the following settings to /etc/hosts
+
+    ```java
+    127.0.0.1 adhoc-1
+    127.0.0.1 adhoc-2
+    127.0.0.1 namenode
+    127.0.0.1 datanode1
+    127.0.0.1 hiveserver
+    127.0.0.1 hivemetastore
+    127.0.0.1 kafkabroker
+    127.0.0.1 sparkmaster
+    127.0.0.1 zookeeper
+    ```
+  * Java : Java SE Development Kit 8.
+  * Maven : A build automation tool for Java projects.
+  * jq : A lightweight and flexible command-line JSON processor. Use `brew install jq` to install jq.
+  
+Also, this has not been tested on some environments like Docker on Windows.
+
+
+## Setting up Docker Cluster
+
+
+### Build Hudi
+
+The first step is to build hudi. **Note**: This step builds hudi with the default supported Scala version - 2.11.
+```java
+cd <HUDI_WORKSPACE>
+mvn package -DskipTests
+```
+
+### Bringing up Demo Cluster
+
+The next step is to run the docker compose script and set up configs for bringing up the cluster.
+This should pull the docker images from Docker Hub and set up the docker cluster.
+
+```java
+cd docker
+./setup_demo.sh
+....
+....
+....
+[+] Running 10/13
+⠿ Container zookeeper             Removed                 8.6s
+⠿ Container datanode1             Removed                18.3s
+⠿ Container trino-worker-1        Removed                50.7s
+⠿ Container spark-worker-1        Removed                16.7s
+⠿ Container adhoc-2               Removed                16.9s
+⠿ Container graphite              Removed                16.9s
+⠿ Container kafkabroker           Removed                14.1s
+⠿ Container adhoc-1               Removed                14.1s
+⠿ Container presto-worker-1       Removed                11.9s
+⠿ Container presto-coordinator-1  Removed                34.6s
+.......
+......
+[+] Running 17/17
+⠿ adhoc-1 Pulled                                          2.9s
+⠿ graphite Pulled                                         2.8s
+⠿ spark-worker-1 Pulled                                   3.0s
+⠿ kafka Pulled                                            2.9s
+⠿ datanode1 Pulled                                        2.9s
+⠿ hivemetastore Pulled                                    2.9s
+⠿ hiveserver Pulled                                       3.0s
+⠿ hive-metastore-postgresql Pulled                        2.8s
+⠿ presto-coordinator-1 Pulled                             2.9s
+⠿ namenode Pulled                                         2.9s
+⠿ trino-worker-1 Pulled                                   2.9s
+⠿ sparkmaster Pulled                                      2.9s
+⠿ presto-worker-1 Pulled                                  2.9s
+⠿ zookeeper Pulled                                        2.8s
+⠿ adhoc-2 Pulled                                          2.9s
+⠿ historyserver Pulled                                    2.9s
+⠿ trino-coordinator-1 Pulled                              2.9s
+[+] Running 17/17
+⠿ Container zookeeper                  Started           41.0s
+⠿ Container kafkabroker                Started           41.7s
+⠿ Container graphite                   Started           41.5s
+⠿ Container hive-metastore-postgresql  Running            0.0s
+⠿ Container namenode                   Running            0.0s
+⠿ Container hivemetastore              Running            0.0s
+⠿ Container trino-coordinator-1        Runni...           0.0s
+⠿ Container presto-coordinator-1       Star...           42.1s
+⠿ Container historyserver              Started           41.0s
+⠿ Container datanode1                  Started           49.9s
+⠿ Container hiveserver                 Running            0.0s
+⠿ Container trino-worker-1             Started           42.1s
+⠿ Container sparkmaster                Started           41.9s
+⠿ Container spark-worker-1             Started           50.2s
+⠿ Container adhoc-2                    Started           38.5s
+⠿ Container adhoc-1                    Started           38.5s
+⠿ Container presto-worker-1            Started           38.4s
+Copying spark default config and setting up configs
+Copying spark default config and setting up configs
+$ docker ps
+```
+
+At this point, the Docker cluster will be up and running. The demo cluster brings up the following services (a quick verification sketch follows the list):
+
+   * HDFS Services (NameNode, DataNode)
+   * Spark Master and Worker
+   * Hive Services (Metastore, HiveServer2 along with PostgresDB)
+   * Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source for the demo)
+   * Containers for Presto setup (Presto coordinator and worker)
+   * Containers for Trino setup (Trino coordinator and worker)
+   * Adhoc containers to run Hudi/Hive CLI commands
+
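+As a quick sanity check that all services came up, you can list the containers. This is a minimal sketch using standard Docker commands:
+```java
+# List the demo containers and their status
+docker ps --format "table {{.Names}}\t{{.Status}}"
+```
+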
+## Demo
+
+Stock Tracker data will be used to showcase different Hudi query types and the effects of Compaction.
+
+Take a look at the directory `docker/demo/data`. There are 2 batches of stock data, each at 1-minute granularity.
+The first batch contains stock tracker data for some stock symbols during the first hour of the trading window
+(9:30 a.m. to 10:30 a.m.). The second batch contains tracker data for the next 30 minutes (10:30 to 11 a.m.). Hudi will
+be used to ingest these batches into a table which will contain the latest stock tracker data at hour-level granularity.
+The batches are windowed intentionally so that the second batch contains updates to some of the rows in the first batch.
+
+### Step 1 : Publish the first batch to Kafka
+
+Upload the first batch to the Kafka topic `stock_ticks`: `cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
+
+To check if the new topic shows up, use
+```java
+kcat -b kafkabroker -L -J | jq .
+{
+  "originating_broker": {
+    "id": 1001,
+    "name": "kafkabroker:9092/1001"
+  },
+  "query": {
+    "topic": "*"
+  },
+  "brokers": [
+    {
+      "id": 1001,
+      "name": "kafkabroker:9092"
+    }
+  ],
+  "topics": [
+    {
+      "topic": "stock_ticks",
+      "partitions": [
+        {
+          "partition": 0,
+          "leader": 1001,
+          "replicas": [
+            {
+              "id": 1001
+            }
+          ],
+          "isrs": [
+            {
+              "id": 1001
+            }
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
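+
+Optionally, you can also read back a couple of records from the topic to verify the payload. This is a minimal sketch using standard kcat consumer flags:
+```java
+# -C: consumer mode, -o beginning: start from the earliest offset, -c 2: stop after 2 messages,
+# -e: exit at end of partition, -q: quiet
+kcat -b kafkabroker -t stock_ticks -C -o beginning -c 2 -e -q
+```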
+
+### Step 2: Incrementally ingest data from Kafka topic
+
+Hudi comes with a tool named DeltaStreamer. This tool can connect to a variety of data sources (including Kafka) to
+pull changes and apply them to a Hudi table using upsert/insert primitives. Here, we will use the tool to read
+JSON data from the Kafka topic and ingest it into both the COW and MOR tables. The tool
+automatically initializes the tables in the file system if they do not exist yet.
+
+```java
+docker exec -it adhoc-2 /bin/bash
+
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
+  --table-type COPY_ON_WRITE \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts  \
+  --target-base-path /user/hive/warehouse/stock_ticks_cow \
+  --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
+  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
+
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor table in HDFS
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
+  --table-type MERGE_ON_READ \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts \
+  --target-base-path /user/hive/warehouse/stock_ticks_mor \
+  --target-table stock_ticks_mor \
+  --props /var/demo/config/kafka-source.properties \
+  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+  --disable-compaction
+
+# As part of the setup (look at setup_demo.sh), the configs needed for DeltaStreamer are uploaded to HDFS. The configs
+# contain mostly Kafka connectivity settings and the Avro schema to be used for ingestion, along with the key and partitioning fields.
+
+exit
+```
+
+You can use the HDFS web UI to look at the tables:
+`http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_cow`.
+
+You can explore the new partition folder created in the table, along with a "commit" / "deltacommit"
+file under `.hoodie`, which signals a successful commit.
+
+There will be a similar setup when you browse the MOR table:
+`http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_mor`
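+
+If you prefer the command line to the web UI, the same can be checked with plain HDFS commands. A minimal sketch, assuming the adhoc containers have an HDFS client configured (as they do in this demo setup):
+```java
+# List the commit metadata under .hoodie for both tables
+docker exec -it adhoc-1 hdfs dfs -ls /user/hive/warehouse/stock_ticks_cow/.hoodie
+docker exec -it adhoc-1 hdfs dfs -ls /user/hive/warehouse/stock_ticks_mor/.hoodie
+```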
+
+
+### Step 3: Sync with Hive
+
+At this step, the tables are available in HDFS. We need to sync with Hive to create new Hive tables and add partitions
+in order to run Hive queries against those tables.
+
+```java
+docker exec -it adhoc-2 /bin/bash
+
+# This command takes in the HiveServer URL and the COW Hudi table location in HDFS and syncs the HDFS state to Hive
+/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
+  --jdbc-url jdbc:hive2://hiveserver:10000 \
+  --user hive \
+  --pass hive \
+  --partitioned-by dt \
+  --base-path /user/hive/warehouse/stock_ticks_cow \
+  --database default \
+  --table stock_ticks_cow
+.....
+2020-01-25 19:51:28,953 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_cow
+.....
+
+# Now run hive-sync for the second data-set in HDFS using Merge-On-Read (MOR table type)
+/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
+  --jdbc-url jdbc:hive2://hiveserver:10000 \
+  --user hive \
+  --pass hive \
+  --partitioned-by dt \
+  --base-path /user/hive/warehouse/stock_ticks_mor \
+  --database default \
+  --table stock_ticks_mor
+...
+2020-01-25 19:51:51,066 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_ro
+...
+2020-01-25 19:51:51,569 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_rt
+....
+
+exit
+```
+After executing the above command, you will notice:
+
+1. A Hive table named `stock_ticks_cow` created, which supports Snapshot and Incremental queries on the Copy-On-Write table.
+2. Two new tables `stock_ticks_mor_rt` and `stock_ticks_mor_ro` created for the Merge-On-Read table. The former
+supports Snapshot and Incremental queries (providing near-real-time data) while the latter supports ReadOptimized queries.
+
+
+### Step 4 (a): Run Hive Queries
+
+Run a Hive query to find the latest timestamp ingested for the stock symbol 'GOOG'. You will notice that both snapshot
+(for both the COW and MOR _rt tables) and read-optimized queries (for the MOR _ro table) give the same value, "10:29 a.m.", as Hudi creates a
+parquet file for the first batch of data.
+
+```java
+docker exec -it adhoc-2 /bin/bash
+beeline -u jdbc:hive2://hiveserver:10000 \
+  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
+  --hiveconf hive.stats.autogather=false
+
+# List Tables
+0: jdbc:hive2://hiveserver:10000> show tables;
++---------------------+--+
+|      tab_name       |
++---------------------+--+
+| stock_ticks_cow     |
+| stock_ticks_mor_ro  |
+| stock_ticks_mor_rt  |
++---------------------+--+
+3 rows selected (1.199 seconds)
+0: jdbc:hive2://hiveserver:10000>
+
+
+# Look at partitions that were added
+0: jdbc:hive2://hiveserver:10000> show partitions stock_ticks_mor_rt;
++----------------+--+
+|   partition    |
++----------------+--+
+| dt=2018-08-31  |
++----------------+--+
+1 row selected (0.24 seconds)
+
+
+# COPY-ON-WRITE Queries:
+=========================
+
+
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:29:00  |
++---------+----------------------+--+
+
+Now, run a projection query:
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924221953       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924221953       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+
+# Merge-On-Read Queries:
+==========================
+
+Lets run similar queries against M-O-R table. Lets look at both 
+ReadOptimized and Snapshot(realtime data) queries supported by M-O-R table
+
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:29:00  |
++---------+----------------------+--+
+1 row selected (6.326 seconds)
+
+
+# Run Snapshot Query. Notice that the latest timestamp is again 10:29
+
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:29:00  |
++---------+----------------------+--+
+1 row selected (1.606 seconds)
+
+
+# Run Read Optimized and Snapshot project queries
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+exit
+```
+
+### Step 4 (b): Run Spark-SQL Queries
+Hudi supports Spark as a query processor, just like Hive. Here are the same Hive queries
+running in spark-sql:
+
+```java
+docker exec -it adhoc-1 /bin/bash
+$SPARK_INSTALL/bin/spark-shell \
+  --jars $HUDI_SPARK_BUNDLE \
+  --master local[2] \
+  --driver-class-path $HADOOP_CONF_DIR \
+  --conf spark.sql.hive.convertMetastoreParquet=false \
+  --deploy-mode client \
+  --driver-memory 1G \
+  --executor-memory 3G \
+  --num-executors 1
+...
+
+Welcome to
+      ____              __
+     / __/__  ___ _____/ /__
+    _\ \/ _ \/ _ `/ __/  '_/
+   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
+      /_/
+
+Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
+Type in expressions to have them evaluated.
+Type :help for more information.
+
+scala> spark.sql("show tables").show(100, false)
++--------+------------------+-----------+
+|database|tableName         |isTemporary|
++--------+------------------+-----------+
+|default |stock_ticks_cow   |false      |
+|default |stock_ticks_mor_ro|false      |
+|default |stock_ticks_mor_rt|false      |
++--------+------------------+-----------+
+
+# Copy-On-Write Table
+
+## Run max timestamp query against COW table
+
+scala> spark.sql("select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG'").show(100, false)
+[Stage 0:>                                                          (0 + 1) / 1]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
+SLF4J: Defaulting to no-operation (NOP) logger implementation
+SLF4J: See http://www.slf4j.org/codes#StaticLoggerBinder for further details.
++------+-------------------+
+|symbol|max(ts)            |
++------+-------------------+
+|GOOG  |2018-08-31 10:29:00|
++------+-------------------+
+
+## Projection Query
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG'").show(100, false)
++-------------------+------+-------------------+------+---------+--------+
+|_hoodie_commit_time|symbol|ts                 |volume|open     |close   |
++-------------------+------+-------------------+------+---------+--------+
+|20180924221953     |GOOG  |2018-08-31 09:59:00|6330  |1230.5   |1230.02 |
+|20180924221953     |GOOG  |2018-08-31 10:29:00|3391  |1230.1899|1230.085|
++-------------------+------+-------------------+------+---------+--------+
+
+# Merge-On-Read Queries:
+==========================
+
+Lets run similar queries against M-O-R table. Lets look at both
+ReadOptimized and Snapshot queries supported by M-O-R table
+
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
++------+-------------------+
+|symbol|max(ts)            |
++------+-------------------+
+|GOOG  |2018-08-31 10:29:00|
++------+-------------------+
+
+
+# Run Snapshot Query. Notice that the latest timestamp is again 10:29
+
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
++------+-------------------+
+|symbol|max(ts)            |
++------+-------------------+
+|GOOG  |2018-08-31 10:29:00|
++------+-------------------+
+
+# Run Read Optimized and Snapshot project queries
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
++-------------------+------+-------------------+------+---------+--------+
+|_hoodie_commit_time|symbol|ts                 |volume|open     |close   |
++-------------------+------+-------------------+------+---------+--------+
+|20180924222155     |GOOG  |2018-08-31 09:59:00|6330  |1230.5   |1230.02 |
+|20180924222155     |GOOG  |2018-08-31 10:29:00|3391  |1230.1899|1230.085|
++-------------------+------+-------------------+------+---------+--------+
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG'").show(100, false)
++-------------------+------+-------------------+------+---------+--------+
+|_hoodie_commit_time|symbol|ts                 |volume|open     |close   |
++-------------------+------+-------------------+------+---------+--------+
+|20180924222155     |GOOG  |2018-08-31 09:59:00|6330  |1230.5   |1230.02 |
+|20180924222155     |GOOG  |2018-08-31 10:29:00|3391  |1230.1899|1230.085|
++-------------------+------+-------------------+------+---------+--------+
+```
+
+### Step 4 (c): Run Presto Queries
+
+Here are Presto queries similar to the Hive and Spark queries above. Currently, Presto does not support snapshot or incremental queries on Hudi tables.
+
+```java
+docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
+presto> show catalogs;
+  Catalog
+-----------
+ hive
+ jmx
+ localfile
+ system
+(4 rows)
+
+Query 20190817_134851_00000_j8rcz, FINISHED, 1 node
+Splits: 19 total, 19 done (100.00%)
+0:04 [0 rows, 0B] [0 rows/s, 0B/s]
+
+presto> use hive.default;
+USE
+presto:default> show tables;
+       Table
+--------------------
+ stock_ticks_cow
+ stock_ticks_mor_ro
+ stock_ticks_mor_rt
+(3 rows)
+
+Query 20190822_181000_00001_segyw, FINISHED, 2 nodes
+Splits: 19 total, 19 done (100.00%)
+0:05 [3 rows, 99B] [0 rows/s, 18B/s]
+
+
+# COPY-ON-WRITE Queries:
+=========================
+
+
+presto:default> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00
+(1 row)
+
+Query 20190822_181011_00002_segyw, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0:12 [197 rows, 613B] [16 rows/s, 50B/s]
+
+presto:default> select "_hoodie_commit_time", symbol, ts, volume, open, close from stock_ticks_cow where symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
+---------------------+--------+---------------------+--------+-----------+----------
+ 20190822180221      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
+ 20190822180221      | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085
+(2 rows)
+
+Query 20190822_181141_00003_segyw, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0:02 [197 rows, 613B] [109 rows/s, 341B/s]
+
+
+# Merge-On-Read Queries:
+==========================
+
+Lets run similar queries against M-O-R table. 
+
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+    presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00
+(1 row)
+
+Query 20190822_181158_00004_segyw, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0:02 [197 rows, 613B] [110 rows/s, 343B/s]
+
+
+presto:default>  select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
+---------------------+--------+---------------------+--------+-----------+----------
+ 20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
+ 20190822180250      | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085
+(2 rows)
+
+Query 20190822_181256_00006_segyw, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0:02 [197 rows, 613B] [92 rows/s, 286B/s]
+
+presto:default> exit
+```
+
+### Step 4 (d): Run Trino Queries
+
+Here are similar queries with Trino. Currently, Trino does not support snapshot or incremental queries on Hudi tables.
+```java
+docker exec -it adhoc-2 trino --server trino-coordinator-1:8091
+trino> show catalogs;
+ Catalog 
+---------
+ hive    
+ system  
+(2 rows)
+
+Query 20220112_055038_00000_sac73, FINISHED, 1 node
+Splits: 19 total, 19 done (100.00%)
+3.74 [0 rows, 0B] [0 rows/s, 0B/s]
+
+trino> use hive.default;
+USE
+trino:default> show tables;
+       Table        
+--------------------
+ stock_ticks_cow    
+ stock_ticks_mor_ro 
+ stock_ticks_mor_rt 
+(3 rows)
+
+Query 20220112_055050_00003_sac73, FINISHED, 2 nodes
+Splits: 19 total, 19 done (100.00%)
+1.84 [3 rows, 102B] [1 rows/s, 55B/s]
+
+# COPY-ON-WRITE Queries:
+=========================
+    
+trino:default> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1        
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00 
+(1 row)
+
+Query 20220112_055101_00005_sac73, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+4.08 [197 rows, 442KB] [48 rows/s, 108KB/s]
+
+trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close from stock_ticks_cow where symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
+---------------------+--------+---------------------+--------+-----------+----------
+ 20220112054822108   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
+ 20220112054822108   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
+(2 rows)
+
+Query 20220112_055113_00006_sac73, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0.40 [197 rows, 450KB] [487 rows/s, 1.09MB/s]
+
+# Merge-On-Read Queries:
+==========================
+
+Lets run similar queries against MOR table.
+
+# Run ReadOptimized Query. Notice that the latest timestamp is 10:29
+    
+trino:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1        
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00 
+(1 row)
+
+Query 20220112_055125_00007_sac73, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0.50 [197 rows, 442KB] [395 rows/s, 888KB/s]
+
+trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
+---------------------+--------+---------------------+--------+-----------+----------
+ 20220112054844841   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
+ 20220112054844841   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
+(2 rows)
+
+Query 20220112_055136_00008_sac73, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0.49 [197 rows, 450KB] [404 rows/s, 924KB/s]
+
+trino:default> exit
+```
+
+### Step 5: Upload second batch to Kafka and run DeltaStreamer to ingest
+
+Upload the second batch of data and ingest this batch using DeltaStreamer. As this batch does not bring in any new
+partitions, there is no need to run hive-sync.
+
+```java
+cat docker/demo/data/batch_2.json | kcat -b kafkabroker -t stock_ticks -P
+
+# Within Docker container, run the ingestion command
+docker exec -it adhoc-2 /bin/bash
+
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
+  --table-type COPY_ON_WRITE \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts \
+  --target-base-path /user/hive/warehouse/stock_ticks_cow \
+  --target-table stock_ticks_cow \
+  --props /var/demo/config/kafka-source.properties \
+  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
+
+# Run the following spark-submit command to execute the delta-streamer and ingest to stock_ticks_mor table in HDFS
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
+  --table-type MERGE_ON_READ \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts \
+  --target-base-path /user/hive/warehouse/stock_ticks_mor \
+  --target-table stock_ticks_mor \
+  --props /var/demo/config/kafka-source.properties \
+  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+  --disable-compaction
+
+exit
+```
+
+With the Copy-On-Write table, the second ingestion by DeltaStreamer results in a new version of the Parquet file being created.
+See `http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_cow/2018/08/31`
+
+With the Merge-On-Read table, the second ingestion merely appends the batch to an unmerged delta (log) file.
+Take a look at the HDFS filesystem to get an idea: `http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_mor/2018/08/31`
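+
+From the command line, the same can be seen with plain HDFS commands. A minimal sketch, assuming the adhoc containers have an HDFS client configured:
+```java
+# The MOR partition path should now contain a .log file next to the base parquet file
+docker exec -it adhoc-1 hdfs dfs -ls /user/hive/warehouse/stock_ticks_mor/2018/08/31
+```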
+
+### Step 6 (a): Run Hive Queries
+
+With the Copy-On-Write table, the Snapshot query immediately sees the changes from the second batch once the batch
+is committed, as each ingestion creates newer versions of Parquet files.
+
+With the Merge-On-Read table, the second ingestion merely appended the batch to an unmerged delta (log) file.
+This is when ReadOptimized and Snapshot queries will provide different results. The ReadOptimized query will still
+return "10:29 a.m." as it will only read from the Parquet file. The Snapshot query will do an on-the-fly merge and return the
+latest committed data, which is "10:59 a.m.".
+
+```java
+docker exec -it adhoc-2 /bin/bash
+beeline -u jdbc:hive2://hiveserver:10000 \
+  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
+  --hiveconf hive.stats.autogather=false
+
+# Copy On Write Table:
+
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+--+
+1 row selected (1.932 seconds)
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924221953       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924224524       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+As you can notice, the above queries now reflect the changes that came as part of ingesting second batch.
+
+
+# Merge On Read Table:
+
+# Read Optimized Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:29:00  |
++---------+----------------------+--+
+1 row selected (1.6 seconds)
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+# Snapshot Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+--+
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924224537       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+exit
+```
+
+### Step 6 (b): Run Spark SQL Queries
+
+Running the same queries in Spark-SQL:
+
+```java
+docker exec -it adhoc-1 /bin/bash
+$SPARK_INSTALL/bin/spark-shell \
+  --jars $HUDI_SPARK_BUNDLE \
+  --driver-class-path $HADOOP_CONF_DIR \
+  --conf spark.sql.hive.convertMetastoreParquet=false \
+  --deploy-mode client \
+  --driver-memory 1G \
+  --master local[2] \
+  --executor-memory 3G \
+  --num-executors 1
+
+# Copy On Write Table:
+
+scala> spark.sql("select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG'").show(100, false)
++------+-------------------+
+|symbol|max(ts)            |
++------+-------------------+
+|GOOG  |2018-08-31 10:59:00|
++------+-------------------+
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG'").show(100, false)
+
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924221953       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924224524       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+As you can notice, the above queries now reflect the changes that came as part of ingesting second batch.
+
+
+# Merge On Read Table:
+
+# Read Optimized Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
++---------+----------------------+
+| symbol  |         _c1          |
++---------+----------------------+
+| GOOG    | 2018-08-31 10:29:00  |
++---------+----------------------+
+1 row selected (1.6 seconds)
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
++----------------------+---------+----------------------+---------+------------+-----------+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924222155       | GOOG    | 2018-08-31 10:29:00  | 3391    | 1230.1899  | 1230.085  |
++----------------------+---------+----------------------+---------+------------+-----------+
+
+# Snapshot Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
++---------+----------------------+
+| symbol  |         _c1          |
++---------+----------------------+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG'").show(100, false)
++----------------------+---------+----------------------+---------+------------+-----------+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+
+| 20180924222155       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924224537       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+
+
+exit
+```
+
+### Step 6 (c): Run Presto Queries
+
+Running the same ReadOptimized queries on Presto:
+
+```java
+docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
+presto> use hive.default;
+USE
+
+# Copy On Write Table:
+
+presto:default>select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1
+--------+---------------------
+ GOOG   | 2018-08-31 10:59:00
+(1 row)
+
+Query 20190822_181530_00007_segyw, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0:02 [197 rows, 613B] [125 rows/s, 389B/s]
+
+presto:default>select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
+---------------------+--------+---------------------+--------+-----------+----------
+ 20190822180221      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
+ 20190822181433      | GOOG   | 2018-08-31 10:59:00 |   9021 | 1227.1993 | 1227.215
+(2 rows)
+
+Query 20190822_181545_00008_segyw, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0:02 [197 rows, 613B] [106 rows/s, 332B/s]
+
+As you can notice, the above queries now reflect the changes that came as part of ingesting second batch.
+
+
+# Merge On Read Table:
+
+# Read Optimized Query
+presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00
+(1 row)
+
+Query 20190822_181602_00009_segyw, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0:01 [197 rows, 613B] [139 rows/s, 435B/s]
+
+presto:default>select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
+---------------------+--------+---------------------+--------+-----------+----------
+ 20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
+ 20190822180250      | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085
+(2 rows)
+
+Query 20190822_181615_00010_segyw, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0:01 [197 rows, 613B] [154 rows/s, 480B/s]
+
+presto:default> exit
+```
+
+### Step 6 (d): Run Trino Queries
+
+Running the same Read-Optimized queries on Trino:
+
+```java
+docker exec -it adhoc-2 trino --server trino-coordinator-1:8091
+trino> use hive.default;
+USE
+    
+# Copy On Write Table:
+
+trino:default> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1        
+--------+---------------------
+ GOOG   | 2018-08-31 10:59:00 
+(1 row)
+
+Query 20220112_055443_00012_sac73, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0.63 [197 rows, 442KB] [310 rows/s, 697KB/s]
+
+trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
+---------------------+--------+---------------------+--------+-----------+----------
+ 20220112054822108   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
+ 20220112055352654   | GOOG   | 2018-08-31 10:59:00 |   9021 | 1227.1993 | 1227.215 
+(2 rows)
+
+Query 20220112_055450_00013_sac73, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0.65 [197 rows, 450KB] [303 rows/s, 692KB/s]
+
+As you can notice, the above queries now reflect the changes that came as part of ingesting second batch.
+
+# Merge On Read Table:
+# Read Optimized Query
+    
+trino:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1        
+--------+---------------------
+ GOOG   | 2018-08-31 10:29:00 
+(1 row)
+
+Query 20220112_055500_00014_sac73, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0.59 [197 rows, 442KB] [336 rows/s, 756KB/s]
+
+trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
+---------------------+--------+---------------------+--------+-----------+----------
+ 20220112054844841   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
+ 20220112054844841   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
+(2 rows)
+
+Query 20220112_055506_00015_sac73, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0.35 [197 rows, 450KB] [556 rows/s, 1.24MB/s]
+
+trino:default> exit
+```
+
+### Step 7 (a): Incremental Query for COPY-ON-WRITE Table
+
+With 2 batches of data ingested, let's showcase the support for incremental queries in Hudi Copy-On-Write tables.
+
+Let's take the same projection query example:
+
+```java
+docker exec -it adhoc-2 /bin/bash
+beeline -u jdbc:hive2://hiveserver:10000 \
+  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
+  --hiveconf hive.stats.autogather=false
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924064621       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924065039       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+```
+
+As you can notice from the above queries, there are 2 commits, 20180924064621 and 20180924065039, in timeline order.
+When you follow the steps yourself, you will get different timestamps for the commits. Substitute them
+in place of the above timestamps.
+
+To show the effect of an incremental query, let us assume that a reader has already seen the changes from
+ingesting the first batch. Now, for the reader to see the effect of the second batch, they have to set the start timestamp to
+the commit time of the first batch (20180924064621) and run an incremental query.
+
+Hudi's incremental mode provides efficient scanning for incremental queries by filtering out files that do not have any
+candidate rows, using Hudi-managed metadata.
+
+```java
+docker exec -it adhoc-2 /bin/bash
+beeline -u jdbc:hive2://hiveserver:10000 \
+  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
+  --hiveconf hive.stats.autogather=false
+
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.mode=INCREMENTAL;
+No rows affected (0.009 seconds)
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.max.commits=3;
+No rows affected (0.009 seconds)
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.start.timestamp=20180924064621;
+```
+
+With the above settings, file ids that do not have any updates from the commit 20180924065039 are filtered out without scanning.
+Here is the incremental query:
+
+```java
+0: jdbc:hive2://hiveserver:10000>
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064621';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924065039       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+1 row selected (0.83 seconds)
+0: jdbc:hive2://hiveserver:10000>
+```
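+
+When you are done with incremental mode, the session can be switched back to regular reads. A minimal sketch, assuming SNAPSHOT is the default consume mode in Hudi's Hive integration:
+```java
+# Optional: switch the session back to regular (snapshot) reads when done
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_cow.consume.mode=SNAPSHOT;
+```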
+
+### Step 7 (b): Incremental Query with Spark SQL:
+
+```java
+docker exec -it adhoc-1 /bin/bash
+$SPARK_INSTALL/bin/spark-shell \
+  --jars $HUDI_SPARK_BUNDLE \
+  --driver-class-path $HADOOP_CONF_DIR \
+  --conf spark.sql.hive.convertMetastoreParquet=false \
+  --deploy-mode client \
+  --driver-memory 1G \
+  --master local[2] \
+  --executor-memory 3G \
+  --num-executors 1
+
+Welcome to
+      ____              __
+     / __/__  ___ _____/ /__
+    _\ \/ _ \/ _ `/ __/  '_/
+   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
+      /_/
+
+Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
+Type in expressions to have them evaluated.
+Type :help for more information.
+
+scala> import org.apache.hudi.DataSourceReadOptions
+import org.apache.hudi.DataSourceReadOptions
+
+# In the below query, 20180924064621 is the first commit's timestamp
+scala> val hoodieIncViewDF =  spark.read.format("org.apache.hudi").option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "20180924064621").load("/user/hive/warehouse/stock_ticks_cow")
+SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
+SLF4J: Defaulting to no-operation (NOP) logger implementation
+SLF4J: See http://www.slf4j.org/codes#StaticLoggerBinder for further details.
+hoodieIncViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 15 more fields]
+
+scala> hoodieIncViewDF.registerTempTable("stock_ticks_cow_incr_tmp1")
+warning: there was one deprecation warning; re-run with -deprecation for details
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow_incr_tmp1 where  symbol = 'GOOG'").show(100, false);
++----------------------+---------+----------------------+---------+------------+-----------+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+
+| 20180924065039       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+
+```
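+
+Equivalently, the same incremental read can be expressed with plain string config keys instead of the `DataSourceReadOptions` constants. A minimal sketch, assuming the standard Hudi datasource option names `hoodie.datasource.query.type` and `hoodie.datasource.read.begin.instanttime`, pasted into the same spark-shell session:
+```java
+// Incremental read using string option keys (same begin instant as above)
+val incDF = spark.read.format("org.apache.hudi").
+  option("hoodie.datasource.query.type", "incremental").
+  option("hoodie.datasource.read.begin.instanttime", "20180924064621").
+  load("/user/hive/warehouse/stock_ticks_cow")
+
+// Only rows written after the first commit should show up
+incDF.select("_hoodie_commit_time", "symbol", "ts", "volume", "open", "close").
+  filter("symbol = 'GOOG'").
+  show(100, false)
+```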
+
+### Step 8: Schedule and Run Compaction for Merge-On-Read table
+
+Let's schedule and run a compaction to create a new version of the columnar file so that read-optimized readers will see fresher data.
+Again, you can use the Hudi CLI to manually schedule and run compaction.
+
+```java
+docker exec -it adhoc-1 /bin/bash
+root@adhoc-1:/opt# /var/hoodie/ws/hudi-cli/hudi-cli.sh
+...
+Table command getting loaded
+HoodieSplashScreen loaded
+===================================================================
+*         ___                          ___                        *
+*        /\__\          ___           /\  \           ___         *
+*       / /  /         /\__\         /  \  \         /\  \        *
+*      / /__/         / /  /        / /\ \  \        \ \  \       *
+*     /  \  \ ___    / /  /        / /  \ \__\       /  \__\      *
+*    / /\ \  /\__\  / /__/  ___   / /__/ \ |__|     / /\/__/      *
+*    \/  \ \/ /  /  \ \  \ /\__\  \ \  \ / /  /  /\/ /  /         *
+*         \  /  /    \ \  / /  /   \ \  / /  /   \  /__/          *
+*         / /  /      \ \/ /  /     \ \/ /  /     \ \__\          *
+*        / /  /        \  /  /       \  /  /       \/__/          *
+*        \/__/          \/__/         \/__/    Apache Hudi CLI    *
+*                                                                 *
+===================================================================
+
+Welcome to Apache Hudi CLI. Please type help if you are looking for help.
+hudi->connect --path /user/hive/warehouse/stock_ticks_mor
+18/09/24 06:59:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+18/09/24 06:59:35 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
+18/09/24 06:59:35 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1261652683_11, ugi=root (auth:SIMPLE)]]]
+18/09/24 06:59:35 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 06:59:36 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
+Metadata for table stock_ticks_mor loaded
+hoodie:stock_ticks_mor->compactions show all
+20/02/10 03:41:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20200210015059__clean__COMPLETED], [20200210015059__deltacommit__COMPLETED], [20200210022758__clean__COMPLETED], [20200210022758__deltacommit__COMPLETED], [==>20200210023843__compaction__REQUESTED]]
+___________________________________________________________________
+| Compaction Instant Time| State    | Total FileIds to be Compacted|
+|==================================================================|
+
+# Schedule a compaction. This will use Spark Launcher to schedule compaction
+hoodie:stock_ticks_mor->compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
+....
+Compaction successfully completed for 20180924070031
+
+# Now refresh and check again. You will see that there is a new compaction requested
+
+hoodie:stock_ticks->refresh
+18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
+18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
+Metadata for table stock_ticks_mor loaded
+
+hoodie:stock_ticks_mor->compactions show all
+18/09/24 06:34:12 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924041125__clean__COMPLETED], [20180924041125__deltacommit__COMPLETED], [20180924042735__clean__COMPLETED], [20180924042735__deltacommit__COMPLETED], [==>20180924063245__compaction__REQUESTED]]
+___________________________________________________________________
+| Compaction Instant Time| State    | Total FileIds to be Compacted|
+|==================================================================|
+| 20180924070031         | REQUESTED| 1                            |
+
+# Execute the compaction. The compaction instant value passed below must be the one displayed in the above "compactions show all" query
+hoodie:stock_ticks_mor->compaction run --compactionInstant  20180924070031 --parallelism 2 --sparkMemory 1G  --schemaFilePath /var/demo/config/schema.avsc --retry 1  
+....
+Compaction successfully completed for 20180924070031
+
+## Now check if compaction is completed
+
+hoodie:stock_ticks_mor->refresh
+18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
+18/09/24 07:03:00 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
+18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
+Metadata for table stock_ticks_mor loaded
+
+hoodie:stock_ticks->compactions show all
+18/09/24 07:03:15 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924064636__clean__COMPLETED], [20180924064636__deltacommit__COMPLETED], [20180924065057__clean__COMPLETED], [20180924065057__deltacommit__COMPLETED], [20180924070031__commit__COMPLETED]]
+___________________________________________________________________
+| Compaction Instant Time| State    | Total FileIds to be Compacted|
+|==================================================================|
+| 20180924070031         | COMPLETED| 1                            |
+
+```
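+
+Optionally, you can also confirm that compaction wrote a new commit instant to the timeline. A minimal sketch using standard HDFS commands, assuming the adhoc containers have an HDFS client configured:
+```java
+# Look for the new commit file produced by compaction on the MOR table's timeline
+docker exec -it adhoc-1 hdfs dfs -ls /user/hive/warehouse/stock_ticks_mor/.hoodie | grep commit
+```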
+
+### Step 9: Run Hive Queries including incremental queries
+
+You will see that both ReadOptimized and Snapshot queries show the latest committed data.
+Let's also run the incremental query for the MOR table.
+From the below query output, it will be clear that the first commit time for the MOR table is 20180924064636
+and the second commit time is 20180924070031.
+
+```java
+docker exec -it adhoc-2 /bin/bash
+beeline -u jdbc:hive2://hiveserver:10000 \
+  --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
+  --hiveconf hive.stats.autogather=false
+
+# Read Optimized Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+--+
+1 row selected (1.6 seconds)
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924064636       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+# Snapshot Query
+0: jdbc:hive2://hiveserver:10000> select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
+WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
++---------+----------------------+--+
+| symbol  |         _c1          |
++---------+----------------------+--+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+--+
+
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924064636       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+# Incremental Query:
+
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_mor.consume.mode=INCREMENTAL;
+No rows affected (0.008 seconds)
+# Max-Commits covers both second batch and compaction commit
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_mor.consume.max.commits=3;
+No rows affected (0.007 seconds)
+0: jdbc:hive2://hiveserver:10000> set hoodie.stock_ticks_mor.consume.start.timestamp=20180924064636;
+No rows affected (0.013 seconds)
+# Query:
+0: jdbc:hive2://hiveserver:10000> select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064636';
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+| 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+--+
+
+exit
+```
+
+### Step 10: Read Optimized and Snapshot queries for MOR with Spark-SQL after compaction
+
+```java
+docker exec -it adhoc-1 /bin/bash
+$SPARK_INSTALL/bin/spark-shell \
+  --jars $HUDI_SPARK_BUNDLE \
+  --driver-class-path $HADOOP_CONF_DIR \
+  --conf spark.sql.hive.convertMetastoreParquet=false \
+  --deploy-mode client \
+  --driver-memory 1G \
+  --master local[2] \
+  --executor-memory 3G \
+  --num-executors 1
+
+# Read Optimized Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
++---------+----------------------+
+| symbol  |        max(ts)       |
++---------+----------------------+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+
+1 row selected (1.6 seconds)
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
++----------------------+---------+----------------------+---------+------------+-----------+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+
+| 20180924064636       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+
+
+# Snapshot Query
+scala> spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
++---------+----------------------+
+| symbol  |     max(ts)          |
++---------+----------------------+
+| GOOG    | 2018-08-31 10:59:00  |
++---------+----------------------+
+
+scala> spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG'").show(100, false)
++----------------------+---------+----------------------+---------+------------+-----------+
+| _hoodie_commit_time  | symbol  |          ts          | volume  |    open    |   close   |
++----------------------+---------+----------------------+---------+------------+-----------+
+| 20180924064636       | GOOG    | 2018-08-31 09:59:00  | 6330    | 1230.5     | 1230.02   |
+| 20180924070031       | GOOG    | 2018-08-31 10:59:00  | 9021    | 1227.1993  | 1227.215  |
++----------------------+---------+----------------------+---------+------------+-----------+
+```
+
+### Step 11:  Presto Read Optimized queries on MOR table after compaction
+
+```java
+docker exec -it presto-worker-1 presto --server presto-coordinator-1:8090
+presto> use hive.default;
+USE
+
+# Read Optimized Query
+presto:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
+ symbol |        _col1
+--------+---------------------
+ GOOG   | 2018-08-31 10:59:00
+(1 row)
+
+Query 20190822_182319_00011_segyw, FINISHED, 1 node
+Splits: 49 total, 49 done (100.00%)
+0:01 [197 rows, 613B] [133 rows/s, 414B/s]
+
+presto:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
+ _hoodie_commit_time | symbol |         ts          | volume |   open    |  close
+---------------------+--------+---------------------+--------+-----------+----------
+ 20190822180250      | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02
+ 20190822181944      | GOOG   | 2018-08-31 10:59:00 |   9021 | 1227.1993 | 1227.215
+(2 rows)
+
+Query 20190822_182333_00012_segyw, FINISHED, 1 node
+Splits: 17 total, 17 done (100.00%)
+0:02 [197 rows, 613B] [98 rows/s, 307B/s]
+
+presto:default>
+```
+
+
+This brings the demo to an end.
+
+## Testing Hudi in Local Docker environment
+
+You can bring up a Hadoop Docker environment containing Hadoop, Hive and Spark services with support for Hudi.
+```java
+$ mvn pre-integration-test -DskipTests
+```
+The above command builds docker images for all the services with
+current Hudi source installed at /var/hoodie/ws and also brings up the services using a compose file. We
+currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.4.4) in docker images.
+
+To bring down the containers
+```java
+$ cd hudi-integ-test
+$ mvn docker-compose:down
+```
+
+If you want to bring up the docker containers again, use
+```java
+$ cd hudi-integ-test
+$ mvn docker-compose:up -DdetachedMode=true
+```
+
+Hudi is a library that is operated in a broader data analytics/ingestion environment
+involving Hadoop, Hive and Spark. Interoperability with all these systems is a key objective for us. We are
+actively adding integration tests under __hudi-integ-test/src/test/java__ that make use of this
+docker environment (see __hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieSanity.java__).
+
+
+### Building Local Docker Containers:
+
+The docker images required for the demo and for running integration tests are already in docker-hub. The docker images
+and compose scripts are carefully implemented so that they serve a dual purpose:
+
+1. The docker images have inbuilt hudi jar files with environment variables pointing to those jars (HUDI_HADOOP_BUNDLE, ...)
+2. For running integration tests, we need the jars generated locally to be used for running services within docker. The
+   docker-compose scripts (see `docker/compose/docker-compose_hadoop284_hive233_spark244.yml`) ensure that local jars override
+   the inbuilt jars by mounting the local HUDI workspace over the docker location
+3. As these docker containers have the local HUDI workspace mounted, any changes that happen in the workspace are automatically
+   reflected in the containers. This is a convenient way of developing and verifying Hudi for
+   developers who do not own a distributed environment. Note that this is how integration tests are run.
+
+This helps avoid maintaining separate docker images and skips the costly step of building Hudi docker images locally.
+However, if users want to test Hudi from locations with lower network bandwidth, they can still build local images by
+running the script `docker/build_local_docker_images.sh` before running `docker/setup_demo.sh`.
+
+Here are the commands:
+
+```java
+cd docker
+./build_local_docker_images.sh
+.....
+
+[INFO] Reactor Summary:
+[INFO]
+[INFO] Hudi ............................................... SUCCESS [  2.507 s]
+[INFO] hudi-common ........................................ SUCCESS [ 15.181 s]
+[INFO] hudi-aws ........................................... SUCCESS [  2.621 s]
+[INFO] hudi-timeline-service .............................. SUCCESS [  1.811 s]
+[INFO] hudi-client ........................................ SUCCESS [  0.065 s]
+[INFO] hudi-client-common ................................. SUCCESS [  8.308 s]
+[INFO] hudi-hadoop-mr ..................................... SUCCESS [  3.733 s]
+[INFO] hudi-spark-client .................................. SUCCESS [ 18.567 s]
+[INFO] hudi-sync-common ................................... SUCCESS [  0.794 s]
+[INFO] hudi-hive-sync ..................................... SUCCESS [  3.691 s]
+[INFO] hudi-spark-datasource .............................. SUCCESS [  0.121 s]
+[INFO] hudi-spark-common_2.11 ............................. SUCCESS [ 12.979 s]
+[INFO] hudi-spark2_2.11 ................................... SUCCESS [ 12.516 s]
+[INFO] hudi-spark_2.11 .................................... SUCCESS [ 35.649 s]
+[INFO] hudi-utilities_2.11 ................................ SUCCESS [  5.881 s]
+[INFO] hudi-utilities-bundle_2.11 ......................... SUCCESS [ 12.661 s]
+[INFO] hudi-cli ........................................... SUCCESS [ 19.858 s]
+[INFO] hudi-java-client ................................... SUCCESS [  3.221 s]
+[INFO] hudi-flink-client .................................. SUCCESS [  5.731 s]
+[INFO] hudi-spark3_2.12 ................................... SUCCESS [  8.627 s]
+[INFO] hudi-dla-sync ...................................... SUCCESS [  1.459 s]
+[INFO] hudi-sync .......................................... SUCCESS [  0.053 s]
+[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  5.652 s]
+[INFO] hudi-hive-sync-bundle .............................. SUCCESS [  1.623 s]
+[INFO] hudi-spark-bundle_2.11 ............................. SUCCESS [ 10.930 s]
+[INFO] hudi-presto-bundle ................................. SUCCESS [  3.652 s]
+[INFO] hudi-timeline-server-bundle ........................ SUCCESS [  4.804 s]
+[INFO] hudi-trino-bundle .................................. SUCCESS [  5.991 s]
+[INFO] hudi-hadoop-docker ................................. SUCCESS [  2.061 s]
+[INFO] hudi-hadoop-base-docker ............................ SUCCESS [ 53.372 s]
+[INFO] hudi-hadoop-base-java11-docker ..................... SUCCESS [ 48.545 s]
+[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  6.098 s]
+[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  4.825 s]
+[INFO] hudi-hadoop-history-docker ......................... SUCCESS [  3.829 s]
+[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [ 52.660 s]
+[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [01:02 min]
+[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [ 12.661 s]
+[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  4.350 s]
+[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [ 59.083 s]
+[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [01:31 min]
+[INFO] hudi-hadoop-trinobase-docker ....................... SUCCESS [02:40 min]
+[INFO] hudi-hadoop-trinocoordinator-docker ................ SUCCESS [ 14.003 s]
+[INFO] hudi-hadoop-trinoworker-docker ..................... SUCCESS [ 12.100 s]
+[INFO] hudi-integ-test .................................... SUCCESS [ 13.581 s]
+[INFO] hudi-integ-test-bundle ............................. SUCCESS [ 27.212 s]
+[INFO] hudi-examples ...................................... SUCCESS [  8.090 s]
+[INFO] hudi-flink_2.11 .................................... SUCCESS [  4.217 s]
+[INFO] hudi-kafka-connect ................................. SUCCESS [  2.966 s]
+[INFO] hudi-flink-bundle_2.11 ............................. SUCCESS [ 11.155 s]
+[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 12.369 s]
+[INFO] ------------------------------------------------------------------------
+[INFO] BUILD SUCCESS
+[INFO] ------------------------------------------------------------------------
+[INFO] Total time:  14:35 min
+[INFO] Finished at: 2022-01-12T18:41:27-08:00
+[INFO] ------------------------------------------------------------------------
+```
diff --git a/website/versioned_docs/version-0.11.1/encryption.md b/website/versioned_docs/version-0.11.1/encryption.md
new file mode 100644
index 0000000000..f6483420aa
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/encryption.md
@@ -0,0 +1,73 @@
+---
+title: Encryption
+keywords: [ hudi, security ]
+summary: This section offers an overview of encryption feature in Hudi
+toc: true
+last_modified_at: 2022-02-14T15:59:57-04:00
+---
+
+Since Hudi 0.11.0, Spark 3.2 support has been added and, along with it, Parquet 1.12, which brings the encryption feature to Hudi. In this section, we show how to enable encryption in Hudi tables.
+
+## Encrypt Copy-on-Write tables
+
+First, make sure Hudi Spark 3.2 bundle jar is used.
+
+Then, set the following Parquet configurations to make data written to Hudi COW tables encrypted.
+
+```java
+// Activate Parquet encryption, driven by Hadoop properties
+jsc.hadoopConfiguration().set("parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
+// Explicit master keys (base64 encoded) - required only for mock InMemoryKMS
+jsc.hadoopConfiguration().set("parquet.encryption.kms.client.class" , "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
+jsc.hadoopConfiguration().set("parquet.encryption.key.list", "k1:AAECAwQFBgcICQoLDA0ODw==, k2:AAECAAECAAECAAECAAECAA==")
+// Write encrypted dataframe files. 
+// Column "rider" will be protected with master key "key2".
+// Parquet file footers will be protected with master key "key1"
+jsc.hadoopConfiguration().set("parquet.encryption.footer.key", "k1")
+jsc.hadoopConfiguration().set("parquet.encryption.column.keys", "k2:rider")
+    
+spark.read().format("org.apache.hudi").load("path").show();
+```
+
+Here is an example.
+
+```java
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+jsc.hadoopConfiguration().set("parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory");
+jsc.hadoopConfiguration().set("parquet.encryption.kms.client.class" , "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS");
+jsc.hadoopConfiguration().set("parquet.encryption.footer.key", "k1");
+jsc.hadoopConfiguration().set("parquet.encryption.column.keys", "k2:rider");
+jsc.hadoopConfiguration().set("parquet.encryption.key.list", "k1:AAECAwQFBgcICQoLDA0ODw==, k2:AAECAAECAAECAAECAAECAA==");
+
+QuickstartUtils.DataGenerator dataGen = new QuickstartUtils.DataGenerator();
+List<String> inserts = convertToStringList(dataGen.generateInserts(3));
+Dataset<Row> inputDF1 = spark.read().json(jsc.parallelize(inserts, 1));
+inputDF1.write().format("org.apache.hudi")
+	.option("hoodie.table.name", "encryption_table")
+    .option("hoodie.upsert.shuffle.parallelism","2")
+    .option("hoodie.insert.shuffle.parallelism","2")
+    .option("hoodie.delete.shuffle.parallelism","2")
+    .option("hoodie.bulkinsert.shuffle.parallelism","2")
+    .mode(SaveMode.Overwrite)
+    .save("path");
+
+spark.read().format("org.apache.hudi").load("path").select("rider").show();
+```
+
+If everything is configured correctly, reading the table back returns the decrypted values:
+
+```
++---------+
+|rider    |
++---------+
+|rider-213|
+|rider-213|
+|rider-213|
++---------+
+```
+
+Read more from [Spark docs](https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#columnar-encryption) and [Parquet docs](https://github.com/apache/parquet-format/blob/master/Encryption.md).
+
+### Note
+
+This feature is currently only available for COW tables, since only Parquet base files are present there.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/faq.md b/website/versioned_docs/version-0.11.1/faq.md
new file mode 100644
index 0000000000..52137d1be5
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/faq.md
@@ -0,0 +1,520 @@
+---
+title: FAQs
+keywords: [hudi, writing, reading]
+last_modified_at: 2021-08-18T15:59:57-04:00
+---
+# FAQs
+
+## General
+
+### When is Hudi useful for me or my organization?
+   
+If you are looking to quickly ingest data onto HDFS or cloud storage, Hudi can provide you tools to [help](https://hudi.apache.org/docs/writing_data/). Also, if you have ETL/hive/spark jobs which are slow/taking up a lot of resources, Hudi can potentially help by providing an incremental approach to reading and writing data.
+
+As an organization, Hudi can help you build an [efficient data lake](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p), solving some of the most complex, low-level storage management problems, while putting data into hands of your data analysts, engineers and scientists much quicker.
+
+### What are some non-goals for Hudi?
+
+Hudi is not designed for any OLTP use-cases, where typically you are using existing NoSQL/RDBMS data stores. Hudi cannot replace your in-memory analytical database (at least not yet!). Hudi supports near-real-time ingestion in the order of a few minutes, trading off latency for efficient batching. If you truly desire sub-minute processing delays, then stick with your favorite stream processing solution.
+
+### What is incremental processing? Why do Hudi docs/talks keep talking about it?
+
+Incremental processing was first introduced by Vinoth Chandar, in the O'reilly [blog](https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/), that set off most of this effort. In purely technical terms, incremental processing merely refers to writing mini-batch programs in streaming processing style. Typical batch jobs consume **all input** and recompute **all output**, every few hours. Typical stream processing jobs consume some **new input** and recompute **n [...]
+
+While we can merely refer to this as stream processing, we call it *incremental processing*, to distinguish from purely stream processing pipelines built using Apache Flink, Apache Apex or Apache Kafka Streams.
+
+### What is the difference between copy-on-write (COW) vs merge-on-read (MOR) storage types?
+
+**Copy On Write** - This storage type enables clients to ingest data on columnar file formats, currently parquet. Any new data that is written to the Hudi dataset using COW storage type, will write new parquet files. Updating an existing set of rows will result in a rewrite of the entire parquet files that collectively contain the affected rows being updated. Hence, all writes to such datasets are limited by parquet writing performance, the larger the parquet file, the higher is the time [...]
+
+**Merge On Read** - This storage type enables clients to ingest data quickly onto a row-based data format such as Avro. Any new data that is written to the Hudi dataset using the MOR table type will write new log/delta files that internally store the data as Avro-encoded bytes. A compaction process (configured as inline or asynchronous) will convert the log file format to the columnar file format (parquet). Two different InputFormats expose 2 different views of this data: the Read Optimized view exposes [...]
+
+More details can be found [here](https://hudi.apache.org/docs/concepts/) and also [Design And Architecture](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture).
+
+### How do I choose a storage type for my workload?
+
+A key goal of Hudi is to provide **upsert functionality** that is orders of magnitude faster than rewriting entire tables or partitions.
+
+Choose Copy-on-write storage if :
+
+ - You are looking for a simple alternative, that replaces your existing parquet tables without any need for real-time data.
+ - Your current job is rewriting entire table/partition to deal with updates, while only a few files actually change in each partition.
+ - You are happy keeping things operationally simpler (no compaction etc), with the ingestion/write performance bound by the [parquet file size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and the number of such files affected/dirtied by updates
+ - Your workload is fairly well-understood and does not have sudden bursts of large amounts of updates or inserts to older partitions. COW absorbs all the merging cost on the writer side and thus these sudden changes can clog up your ingestion and interfere with meeting normal mode ingest latency targets.
+
+Choose merge-on-read storage if :
+
+ - You want the data to be ingested and made queryable as quickly as possible.
+ - Your workload can have sudden spikes/changes in pattern (e.g bulk updates to older transactions in upstream database causing lots of updates to old partitions on DFS). Asynchronous compaction helps amortize the write amplification caused by such scenarios, while normal ingestion keeps up with incoming stream of changes.
+
+Immaterial of what you choose, Hudi provides
+
+ - Snapshot isolation and atomic write of batch of records
+ - Incremental pulls
+ - Ability to de-duplicate data
+
+Find more [here](https://hudi.apache.org/docs/concepts/).
+
+### Is Hudi an analytical database?
+
+A typical database has a bunch of long-running storage servers always up, which take writes and serve reads. Hudi's architecture is very different, and for good reasons. It's highly decoupled, where writes and queries/reads can be scaled independently to be able to handle the scale challenges. So, it may not always seem like a database.
+
+Nonetheless, Hudi is designed very much like a database and provides similar functionality (upserts, change capture) and semantics (transactional writes, snapshot isolated reads).
+
+### How do I model the data stored in Hudi?
+
+When writing data into Hudi, you model the records like how you would on a key-value store - specify a key field (unique for a single partition/across dataset), a partition field (denotes partition to place key into) and preCombine/combine logic that specifies how to handle duplicates in a batch of records written. This model enables Hudi to enforce primary key constraints like you would get on a database table. See [here](https://hudi.apache.org/docs/writing_data/) for an example.
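+
+For instance, here is a minimal sketch (in Scala) of how these three pieces are wired up via the Spark datasource; the table name, field names, DataFrame and base path are purely illustrative:
+
+```scala
+// Assumes an existing DataFrame `inputDF` with columns "uuid", "partitionpath" and "ts"
+inputDF.write.format("org.apache.hudi").
+  option("hoodie.table.name", "my_hudi_table").
+  option("hoodie.datasource.write.recordkey.field", "uuid").               // key field
+  option("hoodie.datasource.write.partitionpath.field", "partitionpath").  // partition field
+  option("hoodie.datasource.write.precombine.field", "ts").                // used to pick the latest among duplicates
+  mode("append").
+  save("/tmp/my_hudi_table")
+```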
+
+When querying/reading data, Hudi just presents itself as a json-like hierarchical table that everyone is used to querying using Hive/Spark/Presto over Parquet/Json/Avro.
+
+### Why does Hudi require a key field to be configured?
+Hudi was designed to support fast record level Upserts and thus requires a key to identify whether an incoming record is 
+an insert or update or delete, and process accordingly. Additionally, Hudi automatically maintains indexes on this primary 
+key and for many use-cases like CDC, ensuring such primary key constraints is crucial to ensure data quality. In this context, 
+pre combine key helps reconcile multiple records with same key in a single batch of input records. Even for append-only data 
+streams, Hudi supports key-based de-duplication before inserting records. For e.g., you may have at-least-once data integration
+systems like Kafka MirrorMaker that can introduce duplicates during failures. Even for plain old batch pipelines, keys 
+help eliminate duplication that could be caused by backfill pipelines, where commonly it's unclear what set of records 
+need to be re-written. We are actively working on making keys easier, by only requiring them for upsert and/or automatically
+generating the key internally (much like RDBMS row_ids).
+
+### Does Hudi support cloud storage/object stores?
+
+Yes. Generally speaking, Hudi is able to provide its functionality on any Hadoop FileSystem implementation and thus can read and write datasets on [Cloud stores](https://hudi.apache.org/docs/cloud) (Amazon S3 or Microsoft Azure or Google Cloud Storage). Over time, Hudi has also incorporated specific design aspects that make building Hudi datasets on the cloud easy, such as [consistency checks for s3](https://hudi.apache.org/docs/configurations#hoodieconsistencycheckenabled), Zero moves/r [...]
+
+### What versions of Hive/Spark/Hadoop are supported by Hudi?
+
+As of September 2019, Hudi can support Spark 2.1+, Hive 2.x, Hadoop 2.7+ (not Hadoop 3).
+
+### How does Hudi actually store data inside a dataset?
+
+At a high level, Hudi is based on MVCC design that writes data to versioned parquet/base files and log files that contain changes to the base file. All the files are stored under a partitioning scheme for the dataset, which closely resembles how Apache Hive tables are laid out on DFS. Please refer [here](https://hudi.apache.org/docs/concepts/) for more details.
+
+## Using Hudi
+
+### What are some ways to write a Hudi dataset?
+
+Typically, you obtain a set of partial updates/inserts from your source and issue [write operations](https://hudi.apache.org/docs/write_operations/) against a Hudi dataset. If you are ingesting data from any of the standard sources like Kafka, or tailing DFS, the [delta streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#deltastreamer) tool is invaluable and provides an easy, self-managed solution to getting data written into Hudi. You can also write your own code to capture data fr [...]
+
+### How is a Hudi job deployed?
+
+The nice thing about Hudi writing is that it just runs like any other spark job would on a YARN/Mesos or even a K8S cluster. So you could simply use the Spark UI to get visibility into write operations.
+
+### How can I now query the Hudi dataset I just wrote?
+
+Unless Hive sync is enabled, the dataset written by Hudi using one of the methods above can simply be queried via the Spark datasource like any other source.
+
+```scala
+val hoodieROView = spark.read.format("org.apache.hudi").load(basePath + "/path/to/partitions/*")
+val hoodieIncViewDF = spark.read.format("org.apache.hudi")
+     .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(), DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
+     .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
+     .load(basePath);
+```
+
+**Limitations:** Currently, reading the realtime view natively via the Spark datasource is not supported. Please use the Hive path below.
+
+If Hive Sync is enabled in the [deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50) tool or [datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable), the dataset is available in Hive as a couple of tables that can now be read using HiveQL, Presto or SparkSQL. See [here](https://hudi.apache.org/docs/querying_data/) for more.
+
+### How does Hudi handle duplicate record keys in an input?
+
+When issuing an `upsert` operation on a dataset and the batch of records provided contains multiple entries for a given key, then all of them are reduced into a single final value by repeatedly calling payload class's [preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40) method . By default, we pick the record with the greatest value (determined by calling .compareTo [...]
+
+For an insert or bulk_insert operation, no such pre-combining is performed. Thus, if your input contains duplicates, the dataset would also contain duplicates. If you don't want duplicate records either issue an upsert or consider specifying option to de-duplicate input in either [datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) or [deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/ [...]
+
+### Can I implement my own logic for how input records are merged with record on storage?
+
+Here is the payload interface that is used in Hudi to represent any hudi record. 
+
+```java
+public interface HoodieRecordPayload<T extends HoodieRecordPayload> extends Serializable {
+ /**
+   * When more than one HoodieRecord have the same HoodieKey, this function combines them before attempting to insert/upsert by taking in a property map.
+   * Implementation can leverage the property to decide their business logic to do preCombine.
+   * @param another instance of another {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @return the combined value
+   */
+  T preCombine(T another, Properties properties);
+ 
+/**
+   * This methods lets you write custom merging/combining logic to produce new values as a function of current value on storage and whats contained
+   * in this object. Implementations can leverage properties if required.
+   * <p>
+   * eg:
+   * 1) You are updating counters, you may want to add counts to currentValue and write back updated counts
+   * 2) You may be reading DB redo logs, and merge them with current image for a database row on storage
+   * </p>
+   *
+   * @param currentValue Current value in storage, to merge/combine this payload with
+   * @param schema Schema used for record
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @return new combined/merged value to be written back to storage. EMPTY to skip writing this record.
+   */
+  Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException;
+   
+/**
+   * Generates an avro record out of the given HoodieRecordPayload, to be written out to storage. Called when writing a new value for the given
+   * HoodieKey, wherein there is no existing record in storage to be combined against. (i.e insert) Return EMPTY to skip writing this record.
+   * Implementations can leverage properties if required.
+   * @param schema Schema used for record
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @return the {@link IndexedRecord} to be inserted.
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException;
+ 
+/**
+   * This method can be used to extract some metadata from HoodieRecordPayload. The metadata is passed to {@code WriteStatus.markSuccess()} and
+   * {@code WriteStatus.markFailure()} in order to compute some aggregate metrics using the metadata in the context of a write success or failure.
+   * @return the metadata in the form of Map<String, String> if any.
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default Option<Map<String, String>> getMetadata() {
+    return Option.empty();
+  }
+ 
+}
+```
+
+As you can see, there are two methods ([combineAndGetUpdateValue() and getInsertValue()](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java)) that control how the record on storage is combined with the incoming update/insert to generate the final value to be written back to storage, while preCombine() is used to merge records within the same incoming batch.
+
+### How do I delete records in the dataset using Hudi?
+
+GDPR has made deletes a must-have tool in everyone's data management toolbox. Hudi supports both soft and hard deletes. For details on how to actually perform them, see [here](https://hudi.apache.org/docs/writing_data/#deletes).
+
+### Do deleted records appear in Hudi's incremental query results?
+
+Soft Deletes (unlike hard deletes) do appear in the incremental pull query results. So, if you need a mechanism to propagate deletes to downstream tables, you can use Soft deletes.
+
+### How do I migrate my data to Hudi?
+
+Hudi provides built-in support for rewriting your entire dataset into Hudi one time, using the HDFSParquetImporter tool available from the hudi-cli. You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using normal means discussed [here](https://hudi.apache.org/learn/faq#what-are-some-ways-to-write-a-hudi-dataset). This topic is discussed in detail [here](https://hudi.apache.org/docs/migration_guide/), i [...]
+
+### How can I pass hudi configurations to my spark job?
+
+Hudi configuration options covering the datasource and the low-level Hudi write client (which both deltastreamer & datasource internally call) are [here](https://hudi.apache.org/docs/configurations/). Invoking *--help* on any tool such as DeltaStreamer would print all the usage options. A lot of the options that control upsert and file sizing behavior are defined at the write client level, and below is how we pass them to the different options available for writing data.
+
+ - For Spark DataSource, you can use the "options" API of DataFrameWriter to pass in these configs. 
+
+```scala
+inputDF.write().format("org.apache.hudi")
+  .options(clientOpts) // any of the Hudi client opts can be passed in as well
+  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+  ...
+```
+
+ - When using `HoodieWriteClient` directly, you can simply construct a HoodieWriteConfig object with the configs linked above.
+
+ - When using the HoodieDeltaStreamer tool to ingest, you can set the configs in a properties file and pass the file via the command-line argument *--props* (see the sketch below).
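+
+For reference, the properties file passed via *--props* is just a list of Hudi configs, one per line. A minimal, hypothetical sketch (field names and values are placeholders) could look like:
+
+```properties
+hoodie.datasource.write.recordkey.field=uuid
+hoodie.datasource.write.partitionpath.field=partitionpath
+hoodie.datasource.write.precombine.field=ts
+hoodie.upsert.shuffle.parallelism=200
+```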
+
+### How to create Hive style partition folder structure?
+
+By default, Hudi creates partition folders with just the partition values. However, you may want to create partition folders similar to the way Hive generates them, with paths that contain key-value pairs, like country=us/… or datestr=2021-04-20. This is Hive-style (or format) partitioning. The paths include both the names of the partition keys and the values that each path represents.
+
+To enable hive style partitioning, you need to add this hoodie config when you write your data:
+```java
+hoodie.datasource.write.hive_style_partitioning: true
+```
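+
+If you are writing through the Spark datasource, the same config can be passed as a write option; a small sketch (assuming an existing `inputDF` and the usual required write options):
+
+```scala
+inputDF.write.format("org.apache.hudi").
+  option("hoodie.datasource.write.hive_style_partitioning", "true").
+  // ... record key, partition path, precombine and other required options ...
+  mode("append").
+  save("/tmp/my_hudi_table")
+```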
+
+### How do I pass hudi configurations to my beeline Hive queries?
+
+If Hudi's input format is not picked, the returned results may be incorrect. To ensure the correct input format is picked, please use `org.apache.hadoop.hive.ql.io.HiveInputFormat` or `org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat` for the `hive.input.format` config. This can be set as shown below:
+```java
+set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
+```
+
+or
+
+```java
+set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat
+```
+
+### Can I register my Hudi dataset with Apache Hive metastore?
+
+Yes. This can be performed either via the standalone [Hive Sync tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using options in [deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50) tool or [datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
+
+### How does the Hudi indexing work & what are its benefits? 
+
+The indexing component is a key part of Hudi writing, and it consistently maps a given recordKey to a fileGroup inside Hudi. This enables faster identification of the file groups that are affected/dirtied by a given write operation.
+
+Hudi supports a few options for indexing as below
+
+ - *HoodieBloomIndex (default)* : Uses a bloom filter and ranges information placed in the footer of parquet/base files (and soon log files as well)
+ - *HoodieGlobalBloomIndex* : The default indexing only enforces uniqueness of a key inside a single partition i.e the user is expected to know the partition under which a given record key is stored. This helps the indexing scale very well for even [very large datasets](https://eng.uber.com/uber-big-data-platform/). However, in some cases, it might be necessary instead to do the de-duping/enforce uniqueness across all partitions and the global bloom index does exactly that. If this is us [...]
+ - *HBaseIndex* : Apache HBase is a key value store, typically found in close proximity to HDFS. You can also store the index inside HBase, which could be handy if you are already operating HBase.
+
+You can implement your own index if you'd like, by subclassing the `HoodieIndex` class and configuring the index class name in configs. 
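+
+For example, a custom index implementation could be plugged in with a write config along these lines (the class name here is hypothetical):
+
+```properties
+# When set, hoodie.index.class takes precedence over hoodie.index.type
+hoodie.index.class=com.example.index.MyCustomHoodieIndex
+```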
+
+### What does the Hudi cleaner do?
+
+The Hudi cleaner process often runs right after a commit and deltacommit and goes about deleting old files that are no longer needed. If you are using the incremental pull feature, then ensure you configure the cleaner to [retain sufficient amount of last commits](https://hudi.apache.org/docs/configurations#hoodiecleanercommitsretained) to rewind. Another consideration is to provide sufficient time for your long running jobs to finish running. Otherwise, the cleaner could delete a file t [...]
+
+### What's Hudi's schema evolution story?
+
+Hudi uses Avro as the internal canonical representation for records, primarily due to its nice [schema compatibility & evolution](https://docs.confluent.io/platform/current/schema-registry/avro.html) properties. This is a key aspect of having reliability in your ingestion or ETL pipelines. As long as the schema passed to Hudi (either explicitly in DeltaStreamer schema provider configs or implicitly by Spark Datasource's Dataset schemas) is backwards compatible (e.g no field deletes, only [...]
+
+### How do I run compaction for a MOR dataset?
+
+Simplest way to run compaction on MOR dataset is to run the [compaction inline](https://hudi.apache.org/docs/configurations#hoodiecompactinline), at the cost of spending more time ingesting; This could be particularly useful, in common cases where you have small amount of late arriving data trickling into older partitions. In such a scenario, you may want to just aggressively compact the last N partitions while waiting for enough logs to accumulate for older partitions. The net effect is [...]
+
+That said, for obvious reasons of not blocking ingesting for compaction, you may want to run it asynchronously as well. This can be done either via a separate [compaction job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java) that is scheduled by your workflow scheduler/notebook independently. If you are using delta streamer, then you can run in [continuous mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9dec [...]
+
+### What performance/ingest latency can I expect for Hudi writing?
+
+The speed at which you can write into Hudi depends on the [write operation](https://hudi.apache.org/docs/write_operations) and some trade-offs you make along the way like file sizing. Just like how databases incur overhead over direct/raw file I/O on disks,  Hudi operations may have overhead from supporting  database like features compared to reading/writing raw DFS files. That said, Hudi implements advanced techniques from database literature to keep these minimal. User is encouraged to [...]
+
+| Storage Type | Type of workload | Performance | Tips |
+|-------|--------|--------|--------|
+| copy on write | bulk_insert | Should match vanilla spark writing + an additional sort to properly size files | properly size [bulk insert parallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism) to get right number of files. use insert if you want this auto tuned |
+| copy on write | insert | Similar to bulk insert, except the file sizes are auto tuned requiring input to be cached into memory and custom partitioned. | Performance would be bound by how parallel you can write the ingested data. Tune [this limit](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism) up, if you see that writes are happening from only a few executors. |
+| copy on write | upsert/ de-duplicate & insert | Both of these would involve index lookup.  Compared to naively using Spark (or similar framework)'s JOIN to identify the affected records, Hudi indexing is often 7-10x faster as long as you have ordered keys (discussed below) or <50% updates. Compared to naively overwriting entire partitions, Hudi write can be several magnitudes faster depending on how many files in a given partition is actually updated. For e.g, if a partition has 1000 f [...]
+| merge on read | bulk insert | Currently new data only goes to parquet files and thus performance here should be similar to copy_on_write bulk insert. This has the nice side-effect of getting data into parquet directly for query performance. [HUDI-86](https://issues.apache.org/jira/browse/HUDI-86) will add support for logging inserts directly and should improve this drastically. | |
+| merge on read | insert | Similar to above | |
+| merge on read | upsert/ de-duplicate & insert | Indexing performance would remain the same as copy-on-write, while updates (the costliest I/O operation in copy_on_write) are sent to log files, and thus with asynchronous compaction this provides very good ingest performance with low write amplification. | |
+
+Like with many typical systems that manage time-series data, Hudi performs much better if your keys have a timestamp prefix or are monotonically increasing/decreasing. You can almost always achieve this. Even if you have UUID keys, you can follow tricks like [this](https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/) to get keys that are ordered. See also [Tuning Guide](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide) for more tips on JVM and other configurations.
+
+### What performance can I expect for Hudi reading/queries?
+
+ - For ReadOptimized views, you can expect the same best in-class columnar query performance as a standard parquet table in Hive/Spark/Presto
+ - For incremental views, you can expect a speed up relative to how much data usually changes in a given time window and how much time your entire scan takes. For e.g., if only 100 files changed in the last hour in a partition of 1000 files, then you can expect a speed up of 10x using incremental pull in Hudi compared to fully scanning the partition to find new data.
+ - For real time views, you can expect performance similar to the same avro backed table in Hive/Spark/Presto 
+
+### How do I avoid creating tons of small files?
+
+A key design decision in Hudi was to avoid creating small files and always write properly sized files.
+
+There are 2 ways to avoid creating tons of small files in Hudi and both of them have different trade-offs:
+
+a) **Auto Size small files during ingestion**: This solution trades ingest/writing time to keep queries always efficient. Common approaches that write very small files and later stitch them together only solve the system scalability issues posed by small files, and still slow queries down by exposing small files to them in the meantime.
+
+Hudi has the ability to maintain a configured target file size, when performing **upsert/insert** operations. (Note: **bulk_insert** operation does not provide this functionality and is designed as a simpler replacement for normal `spark.write.parquet`  )
+
+For **copy-on-write**, this is as simple as configuring the [maximum size for a base/parquet file](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and the [soft limit](https://hudi.apache.org/docs/configurations#hoodieparquetsmallfilelimit) below which a file should be considered a small file. For the initial bootstrap to Hudi table, tuning record size estimate is also important to ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hu [...]
+
+For **merge-on-read**, there are few more configs to set. MergeOnRead works differently for different INDEX choices.
+
+ - Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can configure the [maximum log size](https://hudi.apache.org/docs/configurations#hoodielogfilemaxsize) and a [factor](https://hudi.apache.org/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in size when data moves from avro to parquet files.
+ - Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the same configurations as above for the COPY_ON_WRITE case applies.
+
+NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction has been p [...]
+
+b) **[Clustering](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro)** : This is a feature in Hudi to group small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, clustering  [...]
+
+*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will always create a newer version of the smaller file, resulting in 2 versions of the same file. The cleaner service will later kick in and delete the older version small file and keep the latest one.*
+
+### Why does Hudi retain at least one previous commit even after setting hoodie.cleaner.commits.retained to 1?
+
+Hudi runs the cleaner to remove old file versions as part of writing data, either inline or in asynchronous mode (0.6.0 onwards). The Hudi cleaner retains at least one previous commit when cleaning old file versions. This is to prevent the case where concurrently running queries that are reading the latest file versions suddenly see those files getting deleted by the cleaner because a new file version got added. In other words, retaining at least one previous commit is needed for ensuring snapsh [...]
+
+### How do I use DeltaStreamer or Spark DataSource API to write to a Non-partitioned Hudi dataset ?
+
+Hudi supports writing to non-partitioned datasets. For writing to a non-partitioned Hudi dataset and performing hive table syncing, you need to set the below configurations in the properties passed:
+
+```java
+hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
+hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
+```
+
+### Why do we have to set 2 different ways of configuring Spark to work with Hudi?
+
+Non-Hive engines tend to do their own listing of DFS to query datasets. For e.g., Spark starts reading the paths directly from the file system (HDFS or S3).
+
+From Spark the calls would be as below:
+- org.apache.spark.rdd.NewHadoopRDD.getPartitions
+- org.apache.parquet.hadoop.ParquetInputFormat.getSplits
+- org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits
+
+Without an understanding of Hudi's file layout, engines would just plainly read all the parquet files and display the data within them, with massive amounts of duplicates in the result.
+
+At a high level, there are two ways of configuring a query engine to properly read Hudi datasets
+
+A) Making them invoke methods in `HoodieParquetInputFormat#getSplits` and `HoodieParquetInputFormat#getRecordReader`
+
+- Hive does this natively, since the InputFormat is the abstraction in Hive to plug in new table formats. HoodieParquetInputFormat extends MapredParquetInputFormat, which is nothing but an input format for Hive, and we register Hudi tables in the Hive metastore backed by these input formats
+- Presto also falls back to calling the input format when it sees a `UseFileSplitsFromInputFormat` annotation, to just obtain splits, but then goes on to use its own optimized/vectorized parquet reader for queries on Copy-on-Write tables
+- Spark can be forced into falling back to the HoodieParquetInputFormat class, using --conf spark.sql.hive.convertMetastoreParquet=false
+
+
+B) Making the engine invoke a path filter or other means to directly call Hudi classes to filter the files on DFS and pick out the latest file slice
+
+- Even though we can force Spark to fall back to using the InputFormat class, we could lose the ability to use Spark's optimized parquet reader path by doing so.
+- To keep the benefits of native parquet read performance, we set the `HoodieROTablePathFilter` as a path filter, explicitly set in the Spark Hadoop Configuration (see the sketch below). The filter ensures that, for Hudi-related folders (paths) or files, the latest file slice is always selected. This filters out duplicate entries and shows the latest entries for each record.
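+
+Here is a minimal sketch of option B for Spark (the base path and glob are illustrative):
+
+```scala
+import org.apache.hadoop.fs.PathFilter
+import org.apache.hudi.hadoop.HoodieROTablePathFilter
+
+// Register Hudi's path filter so that plain parquet reads only see the latest file slice
+spark.sparkContext.hadoopConfiguration.setClass(
+  "mapreduce.input.pathFilter.class",
+  classOf[HoodieROTablePathFilter],
+  classOf[PathFilter])
+
+val df = spark.read.parquet("/path/to/hudi_table/*/*/*")
+```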
+
+### I have an existing dataset and want to evaluate Hudi using a portion of that data?
+
+You can bulk import a portion of that data to a new Hudi table. For example, if you want to try it on a month of data -
+
+```java
+spark.read.parquet("your_data_set/path/to/month")
+     .write.format("org.apache.hudi")
+     .option("hoodie.datasource.write.operation", "bulk_insert")
+     .option("hoodie.datasource.write.storage.type", "storage_type") // COPY_ON_WRITE or MERGE_ON_READ
+     .option(RECORDKEY_FIELD_OPT_KEY, "<your key>")
+     .option(PARTITIONPATH_FIELD_OPT_KEY, "<your_partition>")
+     ...
+     .mode(SaveMode.Append)
+     .save(basePath);
+```
+
+Once you have the initial copy, you can simply run upsert operations on this by selecting some sample of data every round
+
+```java
+spark.read.parquet("your_data_set/path/to/month").limit(n) // Limit n records
+     .write.format("org.apache.hudi")
+     .option("hoodie.datasource.write.operation", "upsert")
+     .option(RECORDKEY_FIELD_OPT_KEY, "<your key>")
+     .option(PARTITIONPATH_FIELD_OPT_KEY, "<your_partition>")
+     ...
+     .mode(SaveMode.Append)
+     .save(basePath);
+```
+
+For merge on read table, you may want to also try scheduling and running compaction jobs. You can run compaction directly using spark submit on org.apache.hudi.utilities.HoodieCompactor or by using [HUDI CLI](https://hudi.apache.org/docs/cli).
+
+### If I keep my file versions at 1, with this configuration will I be able to roll back (to the last commit) when a write fails?
+
+Yes. Commits happen before cleaning. Any failed commits will not cause any side effects and Hudi will guarantee snapshot isolation.
+
+### Does AWS Glue support Hudi?
+
+AWS Glue jobs can write, read and update the Glue Data Catalog for Hudi tables. In order to successfully integrate with the Glue Data Catalog, you need to subscribe to one of the AWS-provided Glue connectors named "AWS Glue Connector for Apache Hudi". The Glue job needs to have the "Use Glue data catalog as the Hive metastore" option ticked. Detailed steps with sample scripts are available in this article provided by AWS - https://aws.amazon.com/blogs/big-data/writing-to-apache-hudi-tables-using-aws-gl [...]
+
+If you are using either notebooks or Zeppelin through Glue dev-endpoints, your script might not be able to integrate with the Glue Data Catalog when writing to Hudi tables.
+
+### How to override Hudi jars in EMR?
+
+If you are looking to override Hudi jars in your EMR clusters, one way to achieve this is by providing the Hudi jars through a bootstrap script.
+Here are example steps for overriding the Hudi version with 0.7.0 on EMR 6.2.0.
+
+**Build Hudi Jars:**
+```shell
+# Git clone
+git clone https://github.com/apache/hudi.git && cd hudi   
+
+# Get version 0.7.0
+git checkout --track origin/release-0.7.0
+
+# Build jars with spark 3.0.0 and scala 2.12 (since emr 6.2.0 uses spark 3 which requires scala 2.12):
+mvn clean package -DskipTests -Dspark3  -Dscala-2.12 -T 30 
+```
+
+**Copy jars to s3:**
+These are the jars we are interested in after build completes. Copy them to a temp location first.
+
+```shell
+mkdir -p ~/Downloads/hudi-jars
+cp packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.7.0.jar ~/Downloads/hudi-jars/
+cp packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.7.0.jar ~/Downloads/hudi-jars/
+cp packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.12-0.7.0.jar ~/Downloads/hudi-jars/
+cp packaging/hudi-timeline-server-bundle/target/hudi-timeline-server-bundle-0.7.0.jar ~/Downloads/hudi-jars/
+cp packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.7.0.jar ~/Downloads/hudi-jars/
+```
+
+Upload  all jars from ~/Downloads/hudi-jars/ to the s3 location s3://xxx/yyy/hudi-jars
+
+**Include Hudi jars as part of the emr bootstrap script:**
+The below script downloads the Hudi jars from the above s3 location. Use this script as part of `bootstrap-actions` when launching the EMR cluster to install the jars on each node.
+
+```shell
+#!/bin/bash
+sudo mkdir -p /mnt1/hudi-jars
+
+sudo aws s3 cp s3://xxx/yyy/hudi-jars /mnt1/hudi-jars --recursive
+
+# create symlinks
+cd /mnt1/hudi-jars
+sudo ln -sf hudi-hadoop-mr-bundle-0.7.0.jar hudi-hadoop-mr-bundle.jar
+sudo ln -sf hudi-hive-sync-bundle-0.7.0.jar hudi-hive-sync-bundle.jar
+sudo ln -sf hudi-spark-bundle_2.12-0.7.0.jar hudi-spark-bundle.jar
+sudo ln -sf hudi-timeline-server-bundle-0.7.0.jar hudi-timeline-server-bundle.jar
+sudo ln -sf hudi-utilities-bundle_2.12-0.7.0.jar hudi-utilities-bundle.jar
+```
+
+**Using the overridden jar in DeltaStreamer:**
+When invoking DeltaStreamer, specify the above jar location as part of the spark-submit command (see the sketch below).
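+
+For illustration only (the actual DeltaStreamer options such as source, schema provider and target table are elided, and the props path is a placeholder), the submit command could reference the symlinked bundles like this:
+
+```shell
+spark-submit \
+  --jars /mnt1/hudi-jars/hudi-spark-bundle.jar \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+  /mnt1/hudi-jars/hudi-utilities-bundle.jar \
+  --props /path/to/deltastreamer.properties
+```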
+
+### Why are partition fields also stored in parquet files in addition to the partition path?
+
+Hudi supports customizable partition values which could be a derived value of another field. Also, storing the partition value only as part of the field results in losing type information when queried by various query engines.
+
+### I am seeing a lot of archive files. How do I control the number of archive commit files generated?
+
+Please note that in cloud stores that do not support log append operations, Hudi is forced to create new archive files to archive old metadata operations. You can increase `hoodie.commits.archival.batch` moving forward to increase the number of commits archived per archive file. In addition, you can increase the difference between the 2 watermark configurations: `hoodie.keep.max.commits` (default: 30) and `hoodie.keep.min.commits` (default: 20). This way, you can reduce the number of archi [...]
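+
+Putting those knobs together, an illustrative set of write configs (values here are examples, not recommendations) would be:
+
+```properties
+# Watermarks for archiving (defaults per the text above: 30 and 20)
+hoodie.keep.max.commits=30
+hoodie.keep.min.commits=20
+# Number of commits archived per archive file; increase to reduce the number of archive files
+hoodie.commits.archival.batch=10
+```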
+
+### How do I configure Bloom filter (when Bloom/Global_Bloom index is used)? 
+
+Bloom filters are used in bloom indexes to look up the location of record keys in the write path. Bloom filters are used only when the index type is chosen as “BLOOM” or “GLOBAL_BLOOM”. Hudi has a few config knobs that users can use to tune their bloom filters.
+
+On a high level, hudi has two types of blooms: Simple and Dynamic.
+
+Simple, as the name suggests, is simple. Its size is statically allocated based on a few configs.
+
+`hoodie.bloom.index.filter.type`: SIMPLE
+
+`hoodie.index.bloom.num_entries` refers to the total number of entries per bloom filter, which refers to one file slice. Default value is 60000.
+
+`hoodie.index.bloom.fpp` refers to the false positive probability with the bloom filter. Default value: 1*10^-9.
+
+Size of the bloom filter depends on these two values. This is statically allocated and here is the formula that determines the size of bloom. Until the total number of entries added to the bloom is within the configured `hoodie.index.bloom.num_entries` value, the fpp will be honored. i.e. with default values of 60k and 1*10^-9, bloom filter serialized size = 430kb. But if more entries are added, then the false positive probability will not be honored. Chances that more false positives co [...]
+
+Hudi suggests having files of roughly 100 to 120 MB in size for better query performance. So, based on the record size, one could determine how many records could fit into one data file.
+
+Let's say your data file max size is 128MB and the default avg record size is 1024 bytes. Roughly, this translates to ~130k entries per data file. For this config, you should set num_entries to ~130k.
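+
+In config terms, that simple bloom filter sizing example could be expressed as (values are illustrative):
+
+```properties
+hoodie.bloom.index.filter.type=SIMPLE
+# Sized for ~130k entries per data file, as computed above
+hoodie.index.bloom.num_entries=130000
+hoodie.index.bloom.fpp=0.000000001
+```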
+
+Dynamic bloom filter:
+
+`hoodie.bloom.index.filter.type` : DYNAMIC
+
+This is an advanced version of the bloom filter which grows dynamically as the number of entries grows. So, users are expected to set two values wrt num_entries. `hoodie.index.bloom.num_entries` will determine the starting size of the bloom. `hoodie.bloom.index.filter.dynamic.max.entries` will determine the max size to which the bloom can grow. And fpp needs to be set similar to the “Simple” bloom filter. Bloom size will be allotted based on the first config `hoodie.index.bloom.num_entr [...]
+
+### How to tune shuffle parallelism of Hudi jobs ?
+
+First, let's understand what the term parallelism means in the context of Hudi jobs. For any Hudi job using Spark, parallelism equals the number of spark partitions that should be generated for a particular stage in the DAG. To understand more about spark partitions, read this [article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297). In spark, each spark partition is mapped to a spark task that can be executed on an executor. Typicall [...]
+
+(*Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks*) on (*E executors with C cores*)
+
+A spark application can be given E executors to run on. Each executor might hold 1 or more spark cores. Every spark task will require at least 1 core to execute, so imagine T tasks to be done in Z time depending on C cores. The higher C is, the smaller Z will be.
+
+With this understanding, if you want your DAG stage to run faster, *bring T as close or higher to C*. Additionally, this parallelism finally controls the number of output files you write using a Hudi based job. Let's understand the different kinds of knobs available:
+
+[BulkInsertParallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism) → This is used to control the parallelism with which output files will be created by a Hudi job. The higher this parallelism, the more tasks are created and hence the more output files will eventually be created. Even if you define [parquet-max-file-size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) to be of a high value, if you make paralle [...]
+
+[Upsert](https://hudi.apache.org/docs/configurations#hoodieupsertshuffleparallelism) / [Insert Parallelism](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism) → This is used to control how fast the read process should be when reading data into the job. Find more details [here](https://hudi.apache.org/docs/configurations/).  
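+
+As a concrete sketch, these parallelism knobs are just write options; for example, with the Spark datasource (the value 200 and the DataFrame/path are illustrative and should be tuned to your executor/core count):
+
+```scala
+inputDF.write.format("org.apache.hudi").
+  option("hoodie.bulkinsert.shuffle.parallelism", "200").
+  option("hoodie.insert.shuffle.parallelism", "200").
+  option("hoodie.upsert.shuffle.parallelism", "200").
+  // ... record key, partition path, precombine and other required options ...
+  mode("append").
+  save("/tmp/my_hudi_table")
+```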
+
+### INT96, INT64 and timestamp compatibility
+
+https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncsupport_timestamp
+
+### How to convert an existing COW table to MOR? 
+
+All you need to do is edit the table type property in hoodie.properties (located at hudi_table_path/.hoodie/hoodie.properties).
+But manually changing it will result in checksum errors, so we have to go via hudi-cli.
+
+1. Copy the existing hoodie.properties to a new location.
+2. Edit the table type to MERGE_ON_READ.
+3. Launch hudi-cli and run the following commands (see the sketch below):
+   1. `connect --path hudi_table_path`
+   2. `repair overwrite-hoodie-props --new-props-file new_hoodie.properties`
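+
+Putting the steps above together, a hudi-cli session might look like the following sketch (paths are placeholders):
+
+```shell
+# inside hudi-cli
+connect --path /path/to/hudi_table
+repair overwrite-hoodie-props --new-props-file /path/to/new_hoodie.properties
+```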
+
+### Can I get notified when new commits happen in my Hudi table?
+
+Yes. Hudi provides the ability to post a callback notification about a write commit. You can use an HTTP hook, choose to
+be notified via a Kafka/Pulsar topic, or plug in your own implementation to get notified. Please refer [here](https://hudi.apache.org/docs/next/writing_data/#commit-notifications)
+for details.
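+
+For instance, the HTTP callback can be enabled with write configs along these lines (the URL is a placeholder; see the configurations page for the full list of callback options):
+
+```properties
+hoodie.write.commit.callback.on=true
+hoodie.write.commit.callback.http.url=https://example.com/hudi/commit-callback
+```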
+
+## Contributing to FAQ
+
+A good and usable FAQ should be community-driven and crowd-source questions/thoughts from everyone.
+
+You can improve the FAQ through the following process:
+
+- Raise a PR to spot inaccuracies, typos on this page and leave suggestions.
+- Raise a PR to propose new questions with answers.
+- Lean towards making it very understandable and simple, and heavily link to parts of documentation as needed
+- One committer on the project will review new questions and incorporate them upon review.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/file_layouts.md b/website/versioned_docs/version-0.11.1/file_layouts.md
new file mode 100644
index 0000000000..2be9bb9300
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/file_layouts.md
@@ -0,0 +1,16 @@
+---
+title: File Layouts
+toc: true
+---
+
+The following describes the general file layout structure for Apache Hudi
+* Hudi organizes data tables into a directory structure under a base path on a distributed file system
+* Tables are broken up into partitions
+* Within each partition, files are organized into file groups, uniquely identified by a file ID
+* Each file group contains several file slices 
+* Each file slice contains a base file (*.parquet) produced at a certain commit/compaction instant time, along with a set of log files (*.log.*) that contain inserts/updates to the base file since the base file was produced.
+
+Hudi adopts Multiversion Concurrency Control (MVCC), where [compaction](/docs/next/compaction) action merges logs and base files to produce new 
+file slices and [cleaning](/docs/next/hoodie_cleaner) action gets rid of unused/older file slices to reclaim space on the file system.
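+
+As a rough, schematic illustration (directory and file names below are simplified and not the exact naming convention):
+
+```
+/data/hudi_table/                        <-- base path
+├── .hoodie/                             <-- table metadata and timeline
+├── americas/brazil/sao_paulo/           <-- a partition
+│   ├── <fileId1>_<instant1>.parquet     <-- base file of a file slice (file group fileId1)
+│   ├── .<fileId1>_<instant1>.log.1      <-- log file with updates against that base file
+│   └── <fileId2>_<instant2>.parquet     <-- base file of another file group
+└── asia/india/chennai/                  <-- another partition
+```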
+
+![Partition On HDFS](/assets/images/hudi_partitions_HDFS.png)
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/file_sizing.md b/website/versioned_docs/version-0.11.1/file_sizing.md
new file mode 100644
index 0000000000..6f8baa90bf
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/file_sizing.md
@@ -0,0 +1,53 @@
+---
+title: "File Sizing"
+toc: true
+---
+
+This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
+avoid creating small files in the first place and always write properly sized files. 
+There are 2 ways to manage small files in Hudi; below we describe the advantages and trade-offs of each.
+
+## Auto-Size During ingestion
+
+You can automatically manage the size of files during ingestion. This solution adds a little latency during ingestion, but
+it ensures that read queries are always efficient as soon as a write is committed. If you don't 
+manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that clean-up is performed.
+ 
+(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
+
+### For Copy-On-Write 
+This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
+and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
+be considered a small file. For the initial bootstrap of a Hudi table, tuning the record size estimate is also important to 
+ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses the average 
+record size from the previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
+configured maximum limit. For example, with `compactionSmallFileSize=100MB` and `limitFileSize=120MB`, Hudi will pick all 
+files < 100MB and try to get them up to 120MB.
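+
+A minimal sketch of these settings (the values shown are illustrative, not recommendations):
+
+```properties
+# files smaller than this are considered "small" and are targeted for bin-packing
+hoodie.parquet.small.file.limit=104857600
+# maximum size a base/parquet file is allowed to grow to
+hoodie.parquet.max.file.size=125829120
+# estimate used for the first commits, before Hudi can derive the average record size
+hoodie.copyonwrite.record.size.estimate=1024
+```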
+
+### For Merge-On-Read 
+MergeOnRead works differently for different index choices, so there are a few more configs to set:
+
+- Indexes with **canIndexLogFiles = true**: Inserts of new data go directly to log files. In this case, you can 
+configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
+[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes the reduction in 
+size when data moves from avro to parquet files (see the sketch after this list).
+- Indexes with **canIndexLogFiles = false**: Inserts of new data go only to parquet files. In this case, the 
+same configurations as above for the COPY_ON_WRITE case apply.
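+
+For the log-file based case, the corresponding knobs look like this (illustrative values):
+
+```properties
+# maximum size a log file can grow to before a new one is rolled over
+hoodie.logfile.max.size=1073741824
+# expected size reduction when log (avro) data is compacted into parquet
+hoodie.logfile.to.parquet.compression.ratio=0.35
+```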
+
+NOTE: In either case, small files will be auto-sized only if there is no PENDING compaction or associated log file for 
+that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
+that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
+ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
+has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto-size that parquet file.
+
+## Auto-Size With Clustering
+**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
+small files into larger ones either synchronously or asynchronously. Since the first solution of auto-sizing small files has 
+a trade-off on ingestion speed (because the small files are sized during ingestion), clustering comes to the rescue if your use-case is very 
+sensitive to ingestion latency and you don't want to compromise on ingestion speed, which may end up creating a lot of small files. 
+Clustering can be scheduled through the ingestion job, and an asynchronous job can stitch 
+small files together in the background to generate larger files. Note that during this, ingestion can continue to run concurrently.
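+
+As a sketch, clustering can be scheduled and executed inline with configs like these (the values are illustrative):
+
+```properties
+hoodie.clustering.inline=true
+# schedule/execute clustering every 4 commits
+hoodie.clustering.inline.max.commits=4
+# files below this size are candidates for clustering
+hoodie.clustering.plan.strategy.small.file.limit=314572800
+# target size of the stitched output files
+hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
+```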
+
+*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
+always create a newer version of the smaller file, resulting in 2 versions of the same file. 
+The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/flink-quick-start-guide.md b/website/versioned_docs/version-0.11.1/flink-quick-start-guide.md
new file mode 100644
index 0000000000..e9ca0c3df5
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/flink-quick-start-guide.md
@@ -0,0 +1,191 @@
+---
+title: "Flink Guide"
+toc: true
+last_modified_at: 2020-08-12T15:19:57+08:00
+---
+
+This page introduces the Flink-Hudi integration and how Flink brings the power of streaming into Hudi.
+This guide helps you quickly start using Flink on Hudi, and learn the different modes for reading/writing Hudi from Flink:
+
+- **Quick Start** : Read [Quick Start](#quick-start) to quickly start using the Flink SQL client to write to (and read from) Hudi.
+- **Configuration** : For [Global Configuration](flink_configuration#global-configurations), set it up through `$FLINK_HOME/conf/flink-conf.yaml`. For per-job configuration, set it up through [Table Options](flink_configuration#table-options).
+- **Writing Data** : Flink supports different modes for writing, such as [CDC Ingestion](hoodie_deltastreamer#cdc-ingestion), [Bulk Insert](hoodie_deltastreamer#bulk-insert), [Index Bootstrap](hoodie_deltastreamer#index-bootstrap), [Changelog Mode](hoodie_deltastreamer#changelog-mode) and [Append Mode](hoodie_deltastreamer#append-mode).
+- **Querying Data** : Flink supports different modes for reading, such as [Streaming Query](hoodie_deltastreamer#streaming-query) and [Incremental Query](hoodie_deltastreamer#incremental-query).
+- **Tuning** : For write/read tasks, this guide gives some tuning suggestions, such as [Memory Optimization](flink_configuration#memory-optimization) and [Write Rate Limit](flink_configuration#write-rate-limit).
+- **Optimization**: Offline compaction is supported; see [Offline Compaction](compaction#flink-offline-compaction).
+- **Query Engines**: Besides Flink, many other engines are integrated: [Hive Query](syncing_metastore#flink-setup), [Presto Query](query_engine_setup#prestodb).
+
+## Quick Start
+
+### Setup
+
+We use the [Flink Sql Client](https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sqlclient/) because it's a good
+quick start tool for SQL users.
+
+#### Step.1 download Flink jar
+
+Hudi works with both Flink 1.13 and Flink 1.14. You can follow the
+instructions [here](https://flink.apache.org/downloads) for setting up Flink. Then choose the desired Hudi-Flink bundle
+jar to work with different Flink and Scala versions:
+
+- `hudi-flink1.13-bundle_2.11`
+- `hudi-flink1.13-bundle_2.12`
+- `hudi-flink1.14-bundle_2.11`
+- `hudi-flink1.14-bundle_2.12`
+
+#### Step.2 start Flink cluster
+Start a standalone Flink cluster within your Hadoop environment.
+Before you start up the cluster, we suggest configuring it as follows:
+
+- in `$FLINK_HOME/conf/flink-conf.yaml`, add config option `taskmanager.numberOfTaskSlots: 4`
+- in `$FLINK_HOME/conf/flink-conf.yaml`, [add other global configurations according to the characteristics of your task](flink_configuration#global-configurations)
+- in `$FLINK_HOME/conf/workers`, add the item `localhost` four times so that there are 4 workers on the local cluster
+
+Now start the cluster:
+
+```bash
+# HADOOP_HOME is your hadoop root directory after unpacking the binary package.
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+
+# Start the Flink standalone cluster
+./bin/start-cluster.sh
+```
+#### Step.3 start Flink SQL client
+
+Hudi has a prepared bundle jar for Flink, which should be loaded in the Flink SQL Client when it starts up.
+You can build the jar manually under path `hudi-source-dir/packaging/hudi-flink-bundle`, or download it from the
+[Apache Official Repository](https://repo.maven.apache.org/maven2/org/apache/hudi/hudi-flink-bundle_2.11/).
+
+Now start the SQL CLI:
+
+```bash
+# HADOOP_HOME is your hadoop root directory after unpacking the binary package.
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+
+./bin/sql-client.sh embedded -j .../hudi-flink-bundle_2.1?-*.*.*.jar shell
+```
+
+<div className="notice--info">
+  <h4>Please note the following: </h4>
+<ul>
+  <li>We suggest hadoop 2.9.x+ because filesystem implementations for some object storages are only available from that version onward</li>
+  <li>The flink-parquet and flink-avro formats are already packaged into the hudi-flink-bundle jar</li>
+</ul>
+</div>
+
+Set up the table name and base path, and operate using SQL for this guide.
+The SQL CLI only executes SQL statements line by line.
+
+### Insert Data
+
+First create a Flink Hudi table, then insert data into the Hudi table using SQL `VALUES` as below.
+
+```sql
+-- sets up the result mode to tableau to show the results directly in the CLI
+set execution.result-mode=tableau;
+
+CREATE TABLE t1(
+  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+  name VARCHAR(10),
+  age INT,
+  ts TIMESTAMP(3),
+  `partition` VARCHAR(20)
+)
+PARTITIONED BY (`partition`)
+WITH (
+  'connector' = 'hudi',
+  'path' = '${path}',
+  'table.type' = 'MERGE_ON_READ' -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE
+);
+
+-- insert data using values
+INSERT INTO t1 VALUES
+  ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
+  ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
+  ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
+  ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
+  ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
+  ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
+  ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
+  ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
+```
+
+### Query Data
+
+```sql
+-- query from the Hudi table
+select * from t1;
+```
+
+This query provides snapshot querying of the ingested data. 
+Refer to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported.
+
+### Update Data
+
+This is similar to inserting new data.
+
+```sql
+-- this would update the record with key 'id1'
+insert into t1 values
+  ('id1','Danny',27,TIMESTAMP '1970-01-01 00:00:01','par1');
+```
+
+Notice that this is again an append-style write; in general, use append mode unless you are trying to create the table for the first time.
+[Querying](#query-data) the data again will now show updated records. Each write operation generates a new [commit](/docs/concepts) 
+denoted by the timestamp. Look for changes in the `_hoodie_commit_time` and `age` fields for the same `_hoodie_record_key`s compared to the previous commit.
+
+### Streaming Query
+
+Hudi Flink also provides the capability to obtain a stream of records that changed since a given commit timestamp. 
+This can be achieved using Hudi's streaming querying and providing a start time from which changes need to be streamed. 
+We do not need to specify an end time if we want all changes after the given commit (as is the common case). 
+
+```sql
+CREATE TABLE t1(
+  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+  name VARCHAR(10),
+  age INT,
+  ts TIMESTAMP(3),
+  `partition` VARCHAR(20)
+)
+PARTITIONED BY (`partition`)
+WITH (
+  'connector' = 'hudi',
+  'path' = '${path}',
+  'table.type' = 'MERGE_ON_READ',
+  'read.streaming.enabled' = 'true',  -- this option enables the streaming read
+  'read.start-commit' = '20210316134557', -- specifies the start commit instant time
+  'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s
+);
+
+-- Then query the table in stream mode
+select * from t1;
+``` 
+
+This will give all changes that happened after the `read.start-commit` commit. The unique thing about this
+feature is that it now lets you author streaming pipelines on streaming or batch data sources.
+
+### Delete Data {#deletes}
+
+When consuming data in a streaming query, the Hudi Flink source can also accept the change logs from the underlying data source,
+and then apply UPDATEs and DELETEs at the per-row level. You can then sync a NEAR-REAL-TIME snapshot on Hudi for all kinds
+of RDBMS sources.
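+
+A rough sketch, assuming a hypothetical upstream changelog/CDC source table `orders_cdc` registered elsewhere (e.g. via a CDC connector); the Hudi sink applies the UPDATE/DELETE rows that source emits:
+
+```sql
+CREATE TABLE hudi_orders(
+  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+  name VARCHAR(10),
+  ts TIMESTAMP(3)
+)
+WITH (
+  'connector' = 'hudi',
+  'path' = '${path}',
+  'table.type' = 'MERGE_ON_READ'
+);
+
+-- rows updated or deleted in the upstream source are updated/deleted in Hudi as well
+INSERT INTO hudi_orders SELECT uuid, name, ts FROM orders_cdc;
+```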
+
+## Where To Go From Here?
+Check out the [Flink Setup](/docs/next/flink_configuration) how-to page for a deeper dive into configuration settings. 
+
+If you are relatively new to Apache Hudi, it is important to be familiar with a few core concepts:
+  - [Hudi Timeline](/docs/next/timeline) – How Hudi manages transactions and other table services
+  - [Hudi File Layout](/docs/next/file_layouts) - How the files are laid out on storage
+  - [Hudi Table Types](/docs/next/table_types) – `COPY_ON_WRITE` and `MERGE_ON_READ`
+  - [Hudi Query Types](/docs/next/table_types#query-types) – Snapshot Queries, Incremental Queries, Read-Optimized Queries
+
+See more in the "Concepts" section of the docs.
+
+Take a look at recent [blog posts](/blog) that go in depth on certain topics or use cases.
+
+Hudi tables can be queried from query engines like Hive, Spark, Flink, Presto and much more. We have put together a 
+[demo video](https://www.youtube.com/watch?v=VhNgUsxdrD0) that showcases all of this on a Docker-based setup with all 
+dependent systems running locally. We recommend you replicate the same setup and run the demo yourself, by following the 
+steps [here](/docs/docker_demo) to get a taste for it. Also, if you are looking for ways to migrate your existing data 
+to Hudi, refer to the [migration guide](/docs/migration_guide). 
diff --git a/website/versioned_docs/version-0.11.1/flink_configuration.md b/website/versioned_docs/version-0.11.1/flink_configuration.md
new file mode 100644
index 0000000000..d615281a6b
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/flink_configuration.md
@@ -0,0 +1,117 @@
+---
+title: Flink Setup
+toc: true
+---
+
+## Global Configurations
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism
+
+|  Option Name  | Default | Type | Description |
+|  -----------  | -------  | ------- | ------- |
+| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
+| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+
+### Memory
+
+|  Option Name  | Default | Type | Description |
+|  -----------  | -------  | ------- | ------- |
+| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
+| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
+| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
+
+### Checkpoint
+
+|  Option Name  | Default | Type | Description |
+|  -----------  | -------  | ------- | ------- |
+| `execution.checkpointing.interval` | `(none)` | `Duration` | Set this value to enable checkpointing, e.g. `execution.checkpointing.interval: 150000ms` (150000ms = 2.5min). Configuring this parameter is equivalent to enabling checkpointing |
+| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
+| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
+| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
+| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
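+
+As a sketch, a `flink-conf.yaml` combining the options above might look like this (the sizes, intervals and paths are placeholders to adapt to your workload):
+
+```yaml
+taskmanager.numberOfTaskSlots: 4
+parallelism.default: 4
+
+execution.checkpointing.interval: 150000ms
+state.backend: rocksdb
+state.backend.incremental: true
+state.backend.rocksdb.localdir: /tmp/flink/rocksdb
+state.checkpoints.dir: hdfs:///flink/checkpoints
+```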
+
+## Table Options
+
+Flink SQL jobs can be configured through options in the `WITH` clause.
+The actual datasource level configs are listed below.
+
+### Memory
+
+:::note
+When optimizing memory, first pay attention to the memory configuration,
+the number of TaskManagers, and the parallelism of write tasks (`write.tasks`, default 4). After confirming that each write task is
+allocated enough memory, we can try to set these memory options.
+:::
+
+|  Option Name  | Description | Default | Remarks |
+|  -----------  | -------  | ------- | ------- |
+| `write.task.max.size` | Maximum memory in MB for a write task; when the threshold is hit, it flushes the largest data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for the write buffer is `write.task.max.size` - `compaction.max_memory`. When the total buffer of write tasks reaches the threshold, the largest buffer in memory will be flushed |
+| `write.batch.size`  | In order to improve the efficiency of writing, the Flink write task caches data in a buffer per write bucket until the memory reaches the threshold. When the threshold is reached, the data buffer is flushed out. Default `64MB` | `64D` |  Recommend using the default settings  |
+| `write.log_block.size` | The Hudi log writer does not flush data immediately after receiving it. The writer flushes data to disk in units of `LogBlock`. Until the `LogBlock` reaches the threshold, records are buffered in the writer as serialized bytes. Default `128MB`  | `128` |  Recommend using the default settings  |
+| `write.merge.max_memory` | If the write type is `COPY_ON_WRITE`, Hudi merges the incremental data with the base file data. The incremental data is cached and spilled to disk; this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend using the default settings |
+| `compaction.max_memory` | Same as `write.merge.max_memory`, but applied during compaction. Default `100MB` | `100` | For online compaction, it can be turned up when resources are sufficient, e.g. set to `1024MB` |
+
+### Parallelism
+
+|  Option Name  | Description | Default | Remarks |
+|  -----------  | -------  | ------- | ------- |
+| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increasing the parallelism has no effect on the number of small files |
+| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value; uses Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increasing the parallelism also increases the number of buckets, and thus the number of small files (small buckets)  |
+| `write.index_bootstrap.tasks` |  The parallelism of index bootstrap. Increasing the parallelism can speed up the bootstrap stage. The bootstrap stage blocks checkpointing, so it is necessary to allow more checkpoint failure tolerance. Defaults to Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only takes effect when `index.bootstrap.enabled` is `true` |
+| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
+| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
+
+### Compaction
+
+:::note
+These are options only for `online compaction`.
+:::
+
+:::note
+Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enabled` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
+:::
+
+|  Option Name  | Description | Default | Remarks |
+|  -----------  | -------  | ------- | ------- |
+| `compaction.schedule.enabled` | Whether to generate compaction plans periodically | `true` | Recommended to keep it on, even if `compaction.async.enabled` = `false` |
+| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
+| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
+| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
+| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
+| `compaction.max_memory` | Max memory in MB for the compaction spillable map, default `100MB` | `100` | If you have sufficient resources, we recommend adjusting it to `1024MB` |
+| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
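+
+Putting a few of the above options together, a sketch of table options for a MOR write job (the values are illustrative):
+
+```sql
+CREATE TABLE hudi_table(
+  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+  name VARCHAR(10),
+  ts TIMESTAMP(3)
+)
+WITH (
+  'connector' = 'hudi',
+  'path' = '${path}',
+  'table.type' = 'MERGE_ON_READ',
+  'write.tasks' = '4',
+  'write.task.max.size' = '1024',
+  'compaction.max_memory' = '100',
+  -- schedule compaction plans from the writer, execute them offline
+  'compaction.schedule.enabled' = 'true',
+  'compaction.async.enabled' = 'false',
+  'compaction.delta_commits' = '5'
+);
+```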
+
+## Memory Optimization
+
+### MOR
+
+1. [Set the Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
+2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjusted to `1024MB`).
+3. Pay attention to the memory allocated to each write task by the TaskManager to ensure that each write task can be allocated the
+   desired memory size `write.task.max.size`. For example, if a TaskManager has `4GB` of memory and runs two `StreamWriteFunction` instances, each write task
+   can be allocated `2GB` of memory. Please reserve some buffer because the network buffer and other types of tasks on the TaskManager (such as `BucketAssignFunction`) will also consume memory.
+4. Pay attention to the memory used by compaction. `compaction.max_memory` controls the maximum memory each task can use when compaction tasks read
+   logs. `compaction.tasks` controls the parallelism of compaction tasks.
+
+### COW
+
+1. [Set the Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
+2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default; adjust to e.g. `2048MB` and `1024MB`).
+3. Pay attention to the memory allocated to each write task by the TaskManager to ensure that each write task can be allocated the
+   desired memory size `write.task.max.size`. For example, if a TaskManager has `4GB` of memory and runs two write tasks, each write task
+   can be allocated `2GB` of memory. Please reserve some buffer because the network buffer and other types of tasks on the TaskManager (such as `BucketAssignFunction`) will also consume memory.
+
+
+## Write Rate Limit
+
+In existing data synchronization pipelines, `snapshot data` and `incremental data` are sent to Kafka first, and then written in streaming fashion
+to Hudi by Flink. Direct consumption of the `snapshot data` can lead to problems such as high throughput and severe
+disorder (writing to partitions randomly), which in turn leads to write performance degradation and throughput glitches. In this case,
+the `write.rate.limit` option can be turned on to ensure smooth writing.
+
+### Options
+
+|  Option Name  | Required | Default | Remarks |
+|  -----------  | -------  | ------- | ------- |
+| `write.rate.limit` | `false` | `0` | Disabled by default (`0` means no limit) |
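+
+A sketch of enabling the limit in the table options (the value is illustrative):
+
+```sql
+CREATE TABLE hudi_table(
+  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+  name VARCHAR(10),
+  ts TIMESTAMP(3)
+)
+WITH (
+  'connector' = 'hudi',
+  'path' = '${path}',
+  'write.rate.limit' = '30000' -- records per second
+);
+```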
\ No newline at end of file
diff --git a/website/docs/gcp_bigquery.md b/website/versioned_docs/version-0.11.1/gcp_bigquery.md
similarity index 98%
copy from website/docs/gcp_bigquery.md
copy to website/versioned_docs/version-0.11.1/gcp_bigquery.md
index 8583182042..3651f01d15 100644
--- a/website/docs/gcp_bigquery.md
+++ b/website/versioned_docs/version-0.11.1/gcp_bigquery.md
@@ -38,9 +38,9 @@ Below shows an example for running `BigQuerySyncTool` with `HoodieDeltaStreamer`
 ```shell
 spark-submit --master yarn \
 --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
---jars /opt/hudi-gcp-bundle-0.11.0.jar \
+--jars /opt/hudi-gcp-bundle-0.11.1.jar \
 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
-/opt/hudi-utilities-bundle_2.12-0.11.0.jar \
+/opt/hudi-utilities-bundle_2.12-0.11.1.jar \
 --target-base-path gs://my-hoodie-table/path \
 --target-table mytable \
 --table-type COPY_ON_WRITE \
diff --git a/website/versioned_docs/version-0.11.1/gcs_hoodie.md b/website/versioned_docs/version-0.11.1/gcs_hoodie.md
new file mode 100644
index 0000000000..f0171aff16
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/gcs_hoodie.md
@@ -0,0 +1,60 @@
+---
+title: Google Cloud
+keywords: [ hudi, hive, google cloud, storage, spark, presto]
+summary: In this page, we go over how to configure hudi with Google Cloud Storage.
+last_modified_at: 2019-12-30T15:59:57-04:00
+---
+For Hudi storage on GCS, **regional** buckets provide a DFS API with strong consistency.
+
+## GCS Configs
+
+There are two configurations required for Hudi GCS compatibility:
+
+- Adding GCS Credentials for Hudi
+- Adding required jars to classpath
+
+### GCS Credentials
+
+Add the required configs in your core-site.xml from where Hudi can fetch them. Replace the `fs.defaultFS` with your GCS bucket name and Hudi should be able to read/write from the bucket.
+
+```xml
+  <property>
+    <name>fs.defaultFS</name>
+    <value>gs://hudi-bucket</value>
+  </property>
+
+  <property>
+    <name>fs.gs.impl</name>
+    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
+    <description>The FileSystem for gs: (GCS) uris.</description>
+  </property>
+
+  <property>
+    <name>fs.AbstractFileSystem.gs.impl</name>
+    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
+    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
+  </property>
+
+  <property>
+    <name>fs.gs.project.id</name>
+    <value>GCS_PROJECT_ID</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.enable</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.email</name>
+    <value>GCS_SERVICE_ACCOUNT_EMAIL</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.keyfile</name>
+    <value>GCS_SERVICE_ACCOUNT_KEYFILE</value>
+  </property>
+```
+
+### GCS Libs
+
+GCS Hadoop libraries to add to your classpath:
+
+- com.google.cloud.bigdataoss:gcs-connector:1.6.0-hadoop2
diff --git a/website/versioned_docs/version-0.11.1/hoodie_cleaner.md b/website/versioned_docs/version-0.11.1/hoodie_cleaner.md
new file mode 100644
index 0000000000..10f1aa2450
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/hoodie_cleaner.md
@@ -0,0 +1,57 @@
+---
+title: Cleaning
+toc: true
+---
+
+Hoodie Cleaner is a utility that helps you reclaim space and keep your storage costs in check. Apache Hudi provides 
+snapshot isolation between writers and readers by managing multiple files with MVCC concurrency. These file versions 
+provide history and enable time travel and rollbacks, but it is important to manage how much history you keep to balance your costs.
+
+[Automatic Hudi cleaning](/docs/configurations/#hoodiecleanautomatic) is enabled by default. Cleaning is invoked immediately after
+each commit, to delete older file slices. It's recommended to leave this enabled to ensure metadata and data storage growth is bounded. 
+
+### Cleaning Retention Policies 
+When cleaning old files, you should be careful not to remove files that are being actively used by long running queries.
+Hudi cleaner currently supports the below cleaning policies to keep a certain number of commits or file versions (see the configuration sketch after this list):
+
+- **KEEP_LATEST_COMMITS**: This is the default policy. It is a temporal cleaning policy that ensures you can 
+look back at all the changes that happened in the last X commits. Suppose a writer is ingesting data 
+into a Hudi dataset every 30 minutes and the longest running query can take 5 hours to finish; then the user should 
+retain at least the last 10 commits. With such a configuration, we ensure that the oldest version of a file is kept on 
+disk for at least 5 hours, thereby preventing the longest running query from failing at any point in time. Incremental cleaning is also possible using this policy.
+- **KEEP_LATEST_FILE_VERSIONS**: This policy has the effect of keeping N file versions irrespective of time. 
+This policy is useful when you know the maximum number of file versions you want to keep at any given time. 
+To achieve the same behaviour as before of preventing long running queries from failing, one should do the calculations 
+based on data patterns. Alternatively, this policy is also useful if a user just wants to maintain 1 latest version of each file.
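+
+A minimal configuration sketch for the two policies (the values are illustrative and match the 30-minute/5-hour example above):
+
+```properties
+# keep the file slices written by the last 10 commits
+hoodie.cleaner.policy=KEEP_LATEST_COMMITS
+hoodie.cleaner.commits.retained=10
+
+# or: keep only the latest N versions of each file
+# hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
+# hoodie.cleaner.fileversions.retained=1
+```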
+
+### Configurations
+For details about all possible configurations and their default values see the [configuration docs](https://hudi.apache.org/docs/configurations#Compaction-Configs).
+
+### Run Independently
+Hoodie Cleaner can be run as a separate process or along with your data ingestion. In case you want to run it along with 
+ingesting data, configs are available which enable you to run it [synchronously or asynchronously](https://hudi.apache.org/docs/configurations#hoodiecleanasync).
+
+You can use this command for running the cleaner independently:
+```bash
+[hoodie]$ spark-submit --class org.apache.hudi.utilities.HoodieCleaner \
+  --props s3:///temp/hudi-ingestion-config/kafka-source.properties \
+  --target-base-path s3:///temp/hudi \
+  --spark-master yarn-cluster
+```
+
+### Run Asynchronously
+In case you wish to run the cleaner service asynchronously with writing, please configure the below:
+```properties
+hoodie.clean.automatic=true
+hoodie.clean.async=true
+```
+
+### CLI
+You can also use [Hudi CLI](/docs/cli) to run Hoodie Cleaner.
+
+CLI provides the below commands for cleaner service:
+- `cleans show`
+- `clean showpartitions`
+- `cleans run`
+
+You can find more details and the relevant code for these commands in [`org.apache.hudi.cli.commands.CleansCommand`](https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java) class. 
diff --git a/website/docs/hoodie_deltastreamer.md b/website/versioned_docs/version-0.11.1/hoodie_deltastreamer.md
similarity index 99%
copy from website/docs/hoodie_deltastreamer.md
copy to website/versioned_docs/version-0.11.1/hoodie_deltastreamer.md
index 531c412860..938127da31 100644
--- a/website/docs/hoodie_deltastreamer.md
+++ b/website/versioned_docs/version-0.11.1/hoodie_deltastreamer.md
@@ -161,7 +161,7 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.
 From 0.11.0 release, we start to provide a new `hudi-utilities-slim-bundle` which aims to exclude dependencies that can
 cause conflicts and compatibility issues with different versions of Spark.  The `hudi-utilities-slim-bundle` should be
 used along with a Hudi Spark bundle corresponding the Spark version used to make utilities work with Spark, e.g.,
-`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0`,
+`--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1`,
 if using `hudi-utilities-bundle` solely to run `HoodieDeltaStreamer` in Spark encounters compatibility issues.
 
 ### MultiTableDeltaStreamer
diff --git a/website/versioned_docs/version-0.11.1/ibm_cos_hoodie.md b/website/versioned_docs/version-0.11.1/ibm_cos_hoodie.md
new file mode 100644
index 0000000000..5ac743394f
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/ibm_cos_hoodie.md
@@ -0,0 +1,77 @@
+---
+title: IBM Cloud
+keywords: [ hudi, hive, ibm, cos, spark, presto]
+summary: In this page, we go over how to configure Hudi with IBM Cloud Object Storage filesystem.
+last_modified_at: 2020-10-01T11:38:24-10:00
+---
+In this page, we explain how to get your Hudi spark job to store into IBM Cloud Object Storage.
+
+## IBM COS configs
+
+There are two configurations required for Hudi-IBM Cloud Object Storage compatibility:
+
+- Adding IBM COS Credentials for Hudi
+- Adding required Jars to classpath
+
+### IBM Cloud Object Storage Credentials
+
+The simplest way to use Hudi with IBM Cloud Object Storage is to configure your `SparkSession` or `SparkContext` with IBM Cloud Object Storage credentials using the [Stocator](https://github.com/CODAIT/stocator) storage connector for Spark. Hudi will automatically pick this up and talk to IBM Cloud Object Storage.
+
+Alternatively, add the required configs in your `core-site.xml` from where Hudi can fetch them. Replace the `fs.defaultFS` with your IBM Cloud Object Storage bucket name and Hudi should be able to read/write from the bucket.
+
+For example, using HMAC keys and service name `myCOS`:
+```xml
+  <property>
+      <name>fs.defaultFS</name>
+      <value>cos://myBucket.myCOS</value>
+  </property>
+
+  <property>
+      <name>fs.cos.flat.list</name>
+      <value>true</value>
+  </property>
+
+  <property>
+	  <name>fs.stocator.scheme.list</name>
+	  <value>cos</value>
+  </property>
+
+  <property>
+	  <name>fs.cos.impl</name>
+	  <value>com.ibm.stocator.fs.ObjectStoreFileSystem</value>
+  </property>
+
+  <property>
+	  <name>fs.stocator.cos.impl</name>
+	  <value>com.ibm.stocator.fs.cos.COSAPIClient</value>
+  </property>
+
+  <property>
+	  <name>fs.stocator.cos.scheme</name>
+	  <value>cos</value>
+  </property>
+
+  <property>
+	  <name>fs.cos.myCos.access.key</name>
+	  <value>ACCESS KEY</value>
+  </property>
+
+  <property>
+	  <name>fs.cos.myCos.endpoint</name>
+	  <value>http://s3-api.us-geo.objectstorage.softlayer.net</value>
+  </property>
+
+  <property>
+	  <name>fs.cos.myCos.secret.key</name>
+	  <value>SECRET KEY</value>
+  </property>
+
+```
+
+For more options see Stocator [documentation](https://github.com/CODAIT/stocator/blob/master/README.md).
+
+### IBM Cloud Object Storage Libs
+
+IBM Cloud Object Storage Hadoop libraries to add to your classpath:
+
+ - com.ibm.stocator:stocator:1.1.3
diff --git a/website/versioned_docs/version-0.11.1/indexing.md b/website/versioned_docs/version-0.11.1/indexing.md
new file mode 100644
index 0000000000..224025ca03
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/indexing.md
@@ -0,0 +1,95 @@
+---
+title: Indexing
+toc: true
+---
+
+Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism.
+This mapping between record key and file group/file id, never changes once the first version of a record has been written to a file. In short, the
+mapped file group contains all versions of a group of records.
+
+For [Copy-On-Write tables](/docs/next/table_types#copy-on-write-table), this enables fast upsert/delete operations, by 
+avoiding the need to join against the entire dataset to determine which files to rewrite.
+For [Merge-On-Read tables](/docs/next/table_types#merge-on-read-table), this design allows Hudi to bound the amount of 
+records any given base file needs to be merged against.
+Specifically, a given base file needs to be merged only against updates for records that are part of that base file. In contrast,
+designs without an indexing component (e.g: [Apache Hive ACID](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions)),
+could end up having to merge all the base files against all incoming updates/delete records:
+
+![Fact table](/assets/images/blog/hudi-indexes/with-and-without-index.png)
+_Figure: Comparison of merge cost for updates (yellow blocks) against base files (white blocks)_
+
+## Index Types in Hudi
+
+Currently, Hudi supports the following indexing options.
+
+- **Bloom Index (default):** Employs bloom filters built out of the record keys, optionally also pruning candidate files using record key ranges.
+- **Simple Index:** Performs a lean join of the incoming update/delete records against keys extracted from the table on storage.
+- **HBase Index:** Manages the index mapping in an external Apache HBase table.
+- **Bring your own implementation:** You can extend this [public API](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java) 
+to implement custom indexing.
+
+Writers can pick one of these options using the `hoodie.index.type` config option. Additionally, a custom index implementation can also be employed
+using `hoodie.index.class` and supplying a subclass of `SparkHoodieIndex` (for Apache Spark writers), as sketched below.
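+
+For example (a sketch; the custom class name is hypothetical):
+
+```properties
+# one of BLOOM (default), GLOBAL_BLOOM, SIMPLE, GLOBAL_SIMPLE, HBASE, INMEMORY
+hoodie.index.type=GLOBAL_BLOOM
+
+# or plug in your own implementation instead
+# hoodie.index.class=com.example.MyCustomSparkIndex
+```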
+
+Another key aspect worth understanding is the difference between global and non-global indexes. Both bloom and simple index have
+global options - `hoodie.index.type=GLOBAL_BLOOM` and `hoodie.index.type=GLOBAL_SIMPLE` - respectively. HBase index is by nature a global index.
+
+- **Global index:**  Global indexes enforce uniqueness of keys across all partitions of a table, i.e., they guarantee that exactly
+  one record exists in the table for a given record key. Global indexes offer stronger guarantees, but the update/delete cost grows
+  with the size of the table `O(size of table)`, which might still be acceptable for smaller tables.
+
+- **Non Global index:** On the other hand, the default index implementations enforce this constraint only within a specific partition.
+  As one might imagine, non global indexes depend on the writer to provide the same consistent partition path for a given record key during update/delete,
+  but can deliver much better performance since the index lookup operation becomes `O(number of records updated/deleted)` and
+  scales well with write volume.
+
+Since data comes in at different volumes, velocity and has different access patterns, different indices could be used for different workload types.
+Let’s walk through some typical workload types and see how to leverage the right Hudi index for such use-cases. 
+This is based on our experience and you should diligently decide if the same strategies are best for your workloads.
+
+## Indexing Strategies
+### Workload 1: Late arriving updates to fact tables
+Many companies store large volumes of transactional data in NoSQL data stores. For example, trip tables in ride-sharing, buying and selling of shares,
+orders on an e-commerce site. These tables are usually ever-growing, with random updates on the most recent data and long-tail updates going to older data, either
+due to transactions settling at a later date or data corrections. In other words, most updates go into the latest partitions, with few updates going to older ones.
+
+![Fact table](/assets/images/blog/hudi-indexes/Fact20tables.gif)
+_Figure: Typical update pattern for Fact tables_
+
+For such workloads, the `BLOOM` index performs well, since index look-up will prune a lot of data files based on a well-sized bloom filter.
+Additionally, if the keys can be constructed such that they have a certain ordering, the number of files to be compared is further reduced by range pruning.
+Hudi constructs an interval tree with all the file key ranges and efficiently filters out the files that don't match any key ranges in the updates/deleted records.
+
+In order to efficiently compare incoming record keys against bloom filters i.e with minimal number of bloom filter reads and uniform distribution of work across
+the executors, Hudi leverages caching of input records and employs a custom partitioner that can iron out data skews using statistics. At times, if the bloom filter
+false positive ratio is high, it could increase the amount of data shuffled to perform the lookup. Hudi supports dynamic bloom filters
+(enabled using `hoodie.bloom.index.filter.type=DYNAMIC_V0`), which adjusts its size based on the number of records stored in a given file to deliver the
+configured false positive ratio.
+
+### Workload 2: De-Duplication in event tables
+Event streaming is everywhere. Events coming from Apache Kafka or a similar message bus are typically 10-100x the size of fact tables and often treat "time" (event's arrival time/processing
+time) as a first class citizen. For example, IoT event streams, click stream data, ad impressions, etc. Inserts and updates only span the last few partitions as these are mostly append-only data.
+Given that duplicate events can be introduced anywhere in the end-to-end pipeline, de-duplication before storing on the data lake is a common requirement.
+
+![Event table](/assets/images/blog/hudi-indexes/Event20tables.gif)
+_Figure showing the spread of updates for Event table._
+
+In general, this is a very challenging problem to solve at low cost. Although we could employ a key-value store to perform this de-duplication with the HBASE index, the index storage
+costs would grow linearly with the number of events and thus can be prohibitively expensive. In fact, the `BLOOM` index with range pruning is the optimal solution here. One can leverage the fact
+that time is often a first class citizen and construct a key such as `event_ts + event_id` such that the inserted records have monotonically increasing keys. This yields great returns
+by pruning large amounts of files even within the latest table partitions.
+
+### Workload 3: Random updates/deletes to a dimension table
+These types of tables usually contain high dimensional data and hold reference data, e.g. user profiles, merchant information. These are high fidelity tables where the updates are often small but also spread
+across a lot of partitions and data files, ranging across the dataset from old to new. Oftentimes, these tables are also un-partitioned, since there is also not a good way to partition them.
+
+![Dimensions table](/assets/images/blog/hudi-indexes/Dimension20tables.gif)
+_Figure showing the spread of updates for Dimensions table._
+
+As discussed before, the `BLOOM` index may not yield benefits if a good number of files cannot be pruned out by comparing ranges/filters. In such a random write workload, updates end up touching
+most files within the table and thus bloom filters will typically indicate a true positive for all files based on some incoming update. Consequently, we would end up comparing ranges/filters, only
+to finally check the incoming updates against all files. The `SIMPLE` index will be a better fit as it does not do any upfront pruning, but instead directly joins with the interested fields from every data file.
+The `HBASE` index can be employed if the operational overhead is acceptable, and would provide much better lookup times for these tables.
+
+When using a global index, users should also consider setting `hoodie.bloom.index.update.partition.path=true` or `hoodie.simple.index.update.partition.path=true` to deal with cases where the
+partition path value could change due to an update e.g users table partitioned by home city; user relocates to a different city. These tables are also excellent candidates for the Merge-On-Read table type.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.1/jfs_hoodie.md b/website/versioned_docs/version-0.11.1/jfs_hoodie.md
new file mode 100644
index 0000000000..94bf6e6ea2
--- /dev/null
+++ b/website/versioned_docs/version-0.11.1/jfs_hoodie.md
@@ -0,0 +1,96 @@
+---
+title: JuiceFS
+keywords: [ hudi, hive, juicefs, jfs, spark, flink ]
+summary: In this page, we go over how to configure Hudi with JuiceFS file system.
+last_modified_at: 2021-10-12T10:50:00+08:00
+---
+
+In this page, we explain how to use Hudi with JuiceFS.
+
+## JuiceFS configs
+
+[JuiceFS](https://github.com/juicedata/juicefs) is a high-performance distributed file system. Data stored in JuiceFS is persisted in object storage (e.g. Amazon S3), and the corresponding metadata can be persisted in various database engines such as Redis, MySQL, and TiKV according to the needs of the scenario.
+
+There are three configurations required for Hudi-JuiceFS compatibility:
+
+1. Creating JuiceFS file system
+2. Adding JuiceFS configuration for Hudi
... 4680 lines suppressed ...