Posted to commits@hudi.apache.org by vi...@apache.org on 2022/05/02 17:01:02 UTC

[hudi] branch asf-site updated: [DOCS] Changes for 0.11.0 release docs (#5483)

This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 5ba4f094be [DOCS] Changes for 0.11.0 release docs (#5483)
5ba4f094be is described below

commit 5ba4f094be3a6473093fd2ff74a83f88bc10e20c
Author: vinoth chandar <vi...@users.noreply.github.com>
AuthorDate: Mon May 2 10:00:55 2022 -0700

    [DOCS] Changes for 0.11.0 release docs (#5483)
    
    - Moved BigQuery under catalog syncing
    - Renamed async_meta_indexing to metadata_indexing
    - Edits to clarify data skipping docs, added link to RFC
    - Edits on release notes
---
 ...async_meta_indexing.md => metadata_indexing.md} | 22 +++++-----
 website/docs/performance.md                        | 37 +++++++----------
 website/docs/syncing_aws_glue_data_catalog.md      |  2 +-
 website/docs/syncing_datahub.md                    |  2 +-
 website/docs/syncing_metastore.md                  |  2 +-
 website/releases/release-0.11.0.md                 |  4 +-
 website/sidebars.js                                |  8 ++--
 ...async_meta_indexing.md => metadata_indexing.md} | 20 +++++----
 .../versioned_docs/version-0.11.0/performance.md   | 48 +++++++++-------------
 .../syncing_aws_glue_data_catalog.md               |  2 +-
 .../version-0.11.0/syncing_datahub.md              |  2 +-
 .../version-0.11.0/syncing_metastore.md            |  2 +-
 .../version-0.11.0-sidebars.json                   |  8 ++--
 13 files changed, 72 insertions(+), 87 deletions(-)

diff --git a/website/docs/async_meta_indexing.md b/website/docs/metadata_indexing.md
similarity index 93%
rename from website/docs/async_meta_indexing.md
rename to website/docs/metadata_indexing.md
index 406a5978c3..73c091a09f 100644
--- a/website/docs/async_meta_indexing.md
+++ b/website/docs/metadata_indexing.md
@@ -1,15 +1,17 @@
 ---
-title: Async Metadata Indexing
+title: Metadata Indexing
 summary: "In this page, we describe how to run metadata indexing asynchronously."
 toc: true
 last_modified_at:
 ---
 
-We can now create different metadata indexes, including files, bloom filters and column stats, 
-asynchronously in Hudi. Being able to index without blocking ingestion has two benefits, 
-improved ingestion latency (and hence even lesser gap between event time and arrival time), 
-and reduced point of failure on the ingestion path. To learn more about the design of this 
-feature, please check out [RFC-45](https://github.com/apache/hudi/blob/master/rfc/rfc-45/rfc-45.md).
+We can now create different metadata indexes, including files, bloom filters and column stats,
+asynchronously in Hudi, which are then used by both queries and writers to improve performance.
+Being able to index without blocking writing has two benefits:
+- improved write latency
+- reduced resource wastage due to contention between writing and indexing.
+
+To learn more about the design of this feature, please check out [RFC-45](https://github.com/apache/hudi/blob/master/rfc/rfc-45/rfc-45.md).
 
 ## Setup Async Indexing
 
@@ -19,7 +21,7 @@ from raw parquet to Hudi table. We used the widely available [NY Taxi dataset](h
   <summary>Ingestion write config</summary>
 <p>
 
-```
+```bash
 hoodie.datasource.write.recordkey.field=VendorID
 hoodie.datasource.write.partitionpath.field=tpep_dropoff_datetime
 hoodie.datasource.write.precombine.field=tpep_dropoff_datetime
@@ -41,7 +43,7 @@ hoodie.write.lock.zookeeper.base_path=<zk_base_path>
   <summary>Run deltastreamer</summary>
 <p>
 
-```
+```bash
 spark-submit \
 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls /Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.12.0-SNAPSHOT.jar` \
 --props `ls /Users/home/path/to/write/config.properties` \
@@ -184,8 +186,8 @@ Asynchronous indexing feature is still evolving. Few points to note from deploym
 - If an index is enabled via the async HoodieIndexer, then ensure that index is also enabled in the configs of the regular ingestion writers. Otherwise, the metadata writer will
   assume that the particular index was disabled and clean up the metadata partition.
 - In the case of multiple writers, enable async indexing and the specific index configs for all writers.
-- Unlike other table services like compaction and clustering, where we have a separate configuration to run inline, there is no such inline config here. 
+- Unlike other table services like compaction and clustering, where we have a separate configuration to run inline, there is no such inline config here.
   For example, if async indexing is disabled and metadata is enabled along with column stats index type, then both files and column stats index will be created synchronously with ingestion.
 
-Some of these limitations will be overcome in the upcoming releases. Please
+Some of these limitations will be removed in the upcoming releases. Please
 follow [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488) for developments on this feature.
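
For reference, the deployment notes above mention building indexes via the async HoodieIndexer. A minimal sketch of launching it, assuming the `org.apache.hudi.utilities.HoodieIndexer` entry point from the utilities bundle described in the guide this commit renames; the jar path, base path, table name, and resource settings below are placeholders:

```bash
spark-submit \
--class org.apache.hudi.utilities.HoodieIndexer /path/to/hudi-utilities-bundle.jar \
--props /path/to/indexer.properties \
--mode scheduleAndExecute \
--base-path /tmp/hudi-ny-taxi \
--table-name ny_hudi_tbl \
--index-types COLUMN_STATS \
--parallelism 1 \
--spark-memory 4g
```

The `--mode` flag lets the indexing action be scheduled and executed separately or in one shot, mirroring how other async table services are operated.
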
diff --git a/website/docs/performance.md b/website/docs/performance.md
index 162b7cc85e..e64b0e551f 100644
--- a/website/docs/performance.md
+++ b/website/docs/performance.md
@@ -20,7 +20,7 @@ Here are some ways to efficiently manage the storage of your Hudi tables.
 - Users can also tune the size of the [base/parquet file](/docs/configurations#hoodieparquetmaxfilesize), [log files](/docs/configurations#hoodielogfilemaxsize) & expected [compression ratio](/docs/configurations#hoodieparquetcompressionratio),
   such that a sufficient number of inserts are grouped into the same file group, resulting in well-sized base files ultimately.
 - Intelligently tuning the [bulk insert parallelism](/docs/configurations#hoodiebulkinsertshuffleparallelism) can again result in nicely sized initial file groups. It is in fact critical to get this right, since the file groups
-  once created cannot be deleted, but simply expanded as explained before.
+  once created cannot be changed without re-clustering the table. Writes will simply expand given file groups with new updates/inserts as explained before.
 - For workloads with heavy updates, the [merge-on-read table](/docs/concepts#merge-on-read-table) provides a nice mechanism for ingesting quickly into smaller files and then later merging them into larger base files via compaction.
 
 ## Performance Optimizations
@@ -67,45 +67,36 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event
 
 #### Data Skipping
  
-Data Skipping is a technique (originally introduced in Hudi 0.10) that leverages files metadata to very effectively prune the search space, by
-avoiding reading (even footers of) the files that are known (based on the metadata) to only contain the data that _does not match_ the query's filters.
 
-Data Skipping is leveraging Metadata Table's Column Stats Index bearing column-level statistics (such as min-value, max-value, count of null-values in the column, etc)
+Data Skipping is a technique (originally introduced in Hudi 0.10) that leverages metadata to very effectively prune the search space of a query,
+by eliminating files that cannot possibly contain data matching the query's filters. By maintaining this metadata in the internal Hudi metadata table,
+Hudi avoids reading file footers to obtain this information, which can be costly for queries spanning tens of thousands of files.
+
+Data Skipping leverages the metadata table's `col_stats` partition, which bears column-level statistics (such as min-value, max-value, count of null values, etc.)
 for every file of the Hudi table. For every incoming query, instead of enumerating every file in the table and reading its corresponding metadata
 (for example, Parquet footers) to analyze whether it could contain any data matching the query filters, Hudi can simply query the Column Stats Index
 in the Metadata Table (which is in turn a Hudi table itself) and, within seconds even for TB-scale tables with tens of thousands of files, obtain the list
 of _all the files that might potentially contain the data_ matching the query's filters, with the crucial property that files that can be ruled out as not containing such data
-(based on their column-level statistics) will be stripped out.
+(based on their column-level statistics) are stripped out. See [RFC-27](https://github.com/apache/hudi/blob/master/rfc/rfc-27/rfc-27.md) for the detailed design.
 
-In spirit, Data Skipping is very similar to Partition Pruning for tables using Physical Partitioning where records in the dataset are partitioned on disk
-into a folder structure based on some column's value or its derivative (clumping records together based on some intrinsic measure), but instead
-of on-disk folder structure, Data Skipping leverages index maintaining a mapping "file &rarr; columns' statistics" for all of the columns persisted 
-within that file.
+Partitioning can be considered a coarse form of indexing, and data skipping using the `col_stats` partition can be thought of as a range index, which databases use to identify potential
+blocks of data relevant to a query. Unlike partition pruning for tables using physical partitioning, where records in the dataset are organized into a folder structure based
+on some column's value, data skipping using `col_stats` delivers a logical/virtual partitioning.
 
 For very large tables (1TB+, tens of thousands of files), Data Skipping could
 
-1. Substantially improve query execution runtime (by avoiding fruitless Compute churn) in excess of **10x** as compared to the same query on the same dataset but w/o Data Skipping enabled.
+1. Substantially improve query execution runtime by **10x** or more, as compared to the same query on the same dataset without Data Skipping enabled.
 2. Help avoid hitting cloud storage throttling limits from issuing too many requests (for example, AWS limits the number of requests per second based on the object's prefix, which considerably complicates things for partitioned tables)
 
-If you're interested in learning more details around how Data Skipping is working internally please watch out for a blog-post coming out on this soon!
-
 To unlock the power of Data Skipping, you will need to:
 
-1. Enable Metadata Table along with Column Stats Index on the _write path_ (See [Async Meta Indexing](/docs/async_meta_indexing)).
-2. Enable Data Skipping in your queries
-
-To enable Metadata Table along with Column Stats Index on the write path, make sure 
-the following configurations are set to `true`:
-
-  - `hoodie.metadata.enable` (to enable Metadata Table on the write path, enabled by default)
-  - `hoodie.metadata.index.column.stats.enable` (to enable Column Stats Index being populated on the write path, disabled by default)
+1. Enable the Metadata Table along with the Column Stats Index on the _write path_ (see [Metadata Indexing](/docs/metadata_indexing)), using `hoodie.metadata.enable=true` (enables the Metadata Table on the write path, on by default) and `hoodie.metadata.index.column.stats.enable=true` (enables the Column Stats Index being populated on the write path, off by default)
+2. Enable Data Skipping in your queries, using the read-path properties listed below
 
 :::note
-If you're planning on enabling Column Stats Index for already existing table, please check out the [Async Meta Indexing](/docs/async_meta_indexing) guide
-on how to build Metadata Table Indices (such as Column Stats Index) for existing tables.
+If you're planning on enabling the Column Stats Index for an already existing table, please check out the [Metadata Indexing](/docs/metadata_indexing) guide on how to build Metadata Table indices (such as the Column Stats Index) for existing tables.
 :::
 
-
 To enable Data Skipping in your queries, make sure to set the following properties to `true` (on the read path):
 
   - `hoodie.enable.data.skipping` (to enable Data Skipping)
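
Putting the two halves of the data skipping setup together, a minimal sketch using only the property keys named in the doc above (values shown are the ones the doc calls out; comments note the defaults it states):

```bash
# Write path: populate the metadata table and its column stats index
# (hoodie.metadata.enable is on by default; the column stats index is off by default)
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true

# Read path: set these on the query side to activate data skipping
hoodie.enable.data.skipping=true
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true
```
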
diff --git a/website/docs/syncing_aws_glue_data_catalog.md b/website/docs/syncing_aws_glue_data_catalog.md
index 6cb724b551..0d9075993e 100644
--- a/website/docs/syncing_aws_glue_data_catalog.md
+++ b/website/docs/syncing_aws_glue_data_catalog.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to AWS Glue Data Catalog
+title: AWS Glue Data Catalog
 keywords: [hudi, aws, glue, sync]
 ---
 
diff --git a/website/docs/syncing_datahub.md b/website/docs/syncing_datahub.md
index a294f339f3..75f4ba10bc 100644
--- a/website/docs/syncing_datahub.md
+++ b/website/docs/syncing_datahub.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to DataHub
+title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
diff --git a/website/docs/syncing_metastore.md b/website/docs/syncing_metastore.md
index 1b2baa0f24..f5204c15c4 100644
--- a/website/docs/syncing_metastore.md
+++ b/website/docs/syncing_metastore.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to Hive Metastore
+title: Hive Metastore
 keywords: [hudi, hive, sync]
 ---
 
diff --git a/website/releases/release-0.11.0.md b/website/releases/release-0.11.0.md
index 6f35c99ded..18eca96dd8 100644
--- a/website/releases/release-0.11.0.md
+++ b/website/releases/release-0.11.0.md
@@ -58,14 +58,14 @@ ingestion. The indexer adds a new action `indexing` on the timeline. While the i
 and non-blocking to writers, a lock provider needs to be configured to safely co-ordinate the process with the inflight
 writers.
 
-*See the [async indexing guide](/docs/async_meta_indexing) for more details.*
+*See the [indexing guide](/docs/metadata_indexing) for more details.*
 
 ### Spark DataSource Improvements
 
 Hudi's Spark low-level integration got considerable overhaul consolidating common flows to share the infrastructure and
 bring both compute and data throughput efficiencies when querying the data.
 
-- Both COW and MOR (except for incremental queries) tables are now leveraging Vectorized Parquet reader while reading
+- MOR queries with no log files (except for incremental queries) now leverage the Vectorized Parquet reader while reading
   the data, meaning that the Parquet reader is now able to leverage modern processors' vectorized instructions to further
   speed up decoding of the data. Enabled by default.
 - When standard Record Payload implementation is used (e.g., `OverwriteWithLatestAvroPayload`), MOR table will only
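
For the lock provider the async indexer note above calls for, a minimal sketch assuming the ZooKeeper-based provider, matching the `hoodie.write.lock.zookeeper.*` keys shown in the ingestion config earlier in this diff; all connection values are placeholders:

```bash
# Coordinate the async indexer with inflight writers via optimistic concurrency
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
# ZooKeeper connection details (placeholders)
hoodie.write.lock.zookeeper.url=<zk_url>
hoodie.write.lock.zookeeper.port=<zk_port>
hoodie.write.lock.zookeeper.lock_key=<lock_key>
hoodie.write.lock.zookeeper.base_path=<zk_base_path>
```
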
diff --git a/website/sidebars.js b/website/sidebars.js
index 05398de483..5747460b5c 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -48,15 +48,15 @@ module.exports = {
                 'writing_data',
                 'hoodie_deltastreamer',
                 'querying_data',
-                'gcp_bigquery',
                 'flink_configuration',
                 {
                     type: 'category',
-                    label: 'Sync to Metastore',
+                    label: 'Syncing to Catalogs',
                     items: [
                         'syncing_aws_glue_data_catalog',
                         'syncing_datahub',
-                        'syncing_metastore'
+                        'syncing_metastore',
+                        'gcp_bigquery'
                     ],
                 }
             ],
@@ -68,7 +68,7 @@ module.exports = {
                 'migration_guide',
                 'compaction',
                 'clustering',
-                'async_meta_indexing',
+                'metadata_indexing',
                 'hoodie_cleaner',
                 'transforms',
                 'markers',
diff --git a/website/versioned_docs/version-0.11.0/async_meta_indexing.md b/website/versioned_docs/version-0.11.0/metadata_indexing.md
similarity index 93%
rename from website/versioned_docs/version-0.11.0/async_meta_indexing.md
rename to website/versioned_docs/version-0.11.0/metadata_indexing.md
index 406a5978c3..585eb29000 100644
--- a/website/versioned_docs/version-0.11.0/async_meta_indexing.md
+++ b/website/versioned_docs/version-0.11.0/metadata_indexing.md
@@ -1,15 +1,17 @@
 ---
-title: Async Metadata Indexing
-summary: "In this page, we describe how to run metadata indexing asynchronously."
+title: Metadata Indexing
+summary: "In this page, we describe how to build metadata indexes asynchronously."
 toc: true
 last_modified_at:
 ---
 
 We can now create different metadata indexes, including files, bloom filters and column stats, 
-asynchronously in Hudi. Being able to index without blocking ingestion has two benefits, 
-improved ingestion latency (and hence even lesser gap between event time and arrival time), 
-and reduced point of failure on the ingestion path. To learn more about the design of this 
-feature, please check out [RFC-45](https://github.com/apache/hudi/blob/master/rfc/rfc-45/rfc-45.md).
+asynchronously in Hudi, which are then used by both queries and writers to improve performance.
+Being able to index without blocking writing has two benefits:
+ - improved write latency
+ - reduced resource wastage due to contention between writing and indexing.
+
+To learn more about the design of this feature, please check out [RFC-45](https://github.com/apache/hudi/blob/master/rfc/rfc-45/rfc-45.md).
 
 ## Setup Async Indexing
 
@@ -19,7 +21,7 @@ from raw parquet to Hudi table. We used the widely available [NY Taxi dataset](h
   <summary>Ingestion write config</summary>
 <p>
 
-```
+```bash
 hoodie.datasource.write.recordkey.field=VendorID
 hoodie.datasource.write.partitionpath.field=tpep_dropoff_datetime
 hoodie.datasource.write.precombine.field=tpep_dropoff_datetime
@@ -41,7 +43,7 @@ hoodie.write.lock.zookeeper.base_path=<zk_base_path>
   <summary>Run deltastreamer</summary>
 <p>
 
-```
+```bash
 spark-submit \
 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls /Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.12.0-SNAPSHOT.jar` \
 --props `ls /Users/home/path/to/write/config.properties` \
@@ -187,5 +189,5 @@ Asynchronous indexing feature is still evolving. Few points to note from deploym
 - Unlike other table services like compaction and clustering, where we have a separate configuration to run inline, there is no such inline config here. 
   For example, if async indexing is disabled and metadata is enabled along with column stats index type, then both files and column stats index will be created synchronously with ingestion.
 
-Some of these limitations will be overcome in the upcoming releases. Please
+Some of these limitations will be removed in the upcoming releases. Please
 follow [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488) for developments on this feature.
diff --git a/website/versioned_docs/version-0.11.0/performance.md b/website/versioned_docs/version-0.11.0/performance.md
index 162b7cc85e..bcca5e014d 100644
--- a/website/versioned_docs/version-0.11.0/performance.md
+++ b/website/versioned_docs/version-0.11.0/performance.md
@@ -20,7 +20,7 @@ Here are some ways to efficiently manage the storage of your Hudi tables.
 - Users can also tune the size of the [base/parquet file](/docs/configurations#hoodieparquetmaxfilesize), [log files](/docs/configurations#hoodielogfilemaxsize) & expected [compression ratio](/docs/configurations#hoodieparquetcompressionratio),
   such that a sufficient number of inserts are grouped into the same file group, resulting in well-sized base files ultimately.
 - Intelligently tuning the [bulk insert parallelism](/docs/configurations#hoodiebulkinsertshuffleparallelism) can again result in nicely sized initial file groups. It is in fact critical to get this right, since the file groups
-  once created cannot be deleted, but simply expanded as explained before.
+  once created cannot be changed without re-clustering the table. Writes will simply expand given file groups with new updates/inserts as explained before.
 - For workloads with heavy updates, the [merge-on-read table](/docs/concepts#merge-on-read-table) provides a nice mechanism for ingesting quickly into smaller files and then later merging them into larger base files via compaction.
 
 ## Performance Optimizations
@@ -46,7 +46,7 @@ significant savings on the overall compute cost.
     <img className="docimage" src={require("/assets/images/hudi_upsert_perf2.png").default} alt="hudi_upsert_perf2.png"  />
 </figure>
 
-Hudi upserts have been stress tested upto 4TB in a single commit across the t1 table. 
+Hudi upserts have been stress tested up to 4TB in a single commit across the t1 table.
 See [here](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide) for some tuning tips.
 
 #### Indexing
@@ -66,48 +66,38 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event
 ### Read Path
 
 #### Data Skipping
- 
-Data Skipping is a technique (originally introduced in Hudi 0.10) that leverages files metadata to very effectively prune the search space, by
-avoiding reading (even footers of) the files that are known (based on the metadata) to only contain the data that _does not match_ the query's filters.
 
-Data Skipping is leveraging Metadata Table's Column Stats Index bearing column-level statistics (such as min-value, max-value, count of null-values in the column, etc)
+Data Skipping is a technique (originally introduced in Hudi 0.10) that leverages metadata to very effectively prune the search space of a query,
+by eliminating files that cannot possibly contain data matching the query's filters. By maintaining this metadata in the internal Hudi metadata table,
+Hudi avoids reading file footers to obtain this information, which can be costly for queries spanning tens of thousands of files.
+
+Data Skipping leverages the metadata table's `col_stats` partition, which bears column-level statistics (such as min-value, max-value, count of null values, etc.)
 for every file of the Hudi table. For every incoming query, instead of enumerating every file in the table and reading its corresponding metadata
 (for example, Parquet footers) to analyze whether it could contain any data matching the query filters, Hudi can simply query the Column Stats Index
 in the Metadata Table (which is in turn a Hudi table itself) and, within seconds even for TB-scale tables with tens of thousands of files, obtain the list
 of _all the files that might potentially contain the data_ matching the query's filters, with the crucial property that files that can be ruled out as not containing such data
-(based on their column-level statistics) will be stripped out.
+(based on their column-level statistics) are stripped out. See [RFC-27](https://github.com/apache/hudi/blob/master/rfc/rfc-27/rfc-27.md) for the detailed design.
 
-In spirit, Data Skipping is very similar to Partition Pruning for tables using Physical Partitioning where records in the dataset are partitioned on disk
-into a folder structure based on some column's value or its derivative (clumping records together based on some intrinsic measure), but instead
-of on-disk folder structure, Data Skipping leverages index maintaining a mapping "file &rarr; columns' statistics" for all of the columns persisted 
-within that file.
+Partitioning can be considered a coarse form of indexing, and data skipping using the `col_stats` partition can be thought of as a range index, which databases use to identify potential
+blocks of data relevant to a query. Unlike partition pruning for tables using physical partitioning, where records in the dataset are organized into a folder structure based
+on some column's value, data skipping using `col_stats` delivers a logical/virtual partitioning.
 
 For very large tables (1TB+, tens of thousands of files), Data Skipping could
 
-1. Substantially improve query execution runtime (by avoiding fruitless Compute churn) in excess of **10x** as compared to the same query on the same dataset but w/o Data Skipping enabled.
+1. Substantially improve query execution runtime by **10x** or more, as compared to the same query on the same dataset without Data Skipping enabled.
 2. Help avoid hitting cloud storage throttling limits from issuing too many requests (for example, AWS limits the number of requests per second based on the object's prefix, which considerably complicates things for partitioned tables)
 
-If you're interested in learning more details around how Data Skipping is working internally please watch out for a blog-post coming out on this soon!
-
 To unlock the power of Data Skipping, you will need to:
 
-1. Enable Metadata Table along with Column Stats Index on the _write path_ (See [Async Meta Indexing](/docs/async_meta_indexing)).
-2. Enable Data Skipping in your queries
-
-To enable Metadata Table along with Column Stats Index on the write path, make sure 
-the following configurations are set to `true`:
-
-  - `hoodie.metadata.enable` (to enable Metadata Table on the write path, enabled by default)
-  - `hoodie.metadata.index.column.stats.enable` (to enable Column Stats Index being populated on the write path, disabled by default)
+1. Enable the Metadata Table along with the Column Stats Index on the _write path_ (see [Metadata Indexing](/docs/metadata_indexing)), using `hoodie.metadata.enable=true` (enables the Metadata Table on the write path, on by default) and `hoodie.metadata.index.column.stats.enable=true` (enables the Column Stats Index being populated on the write path, off by default)
+2. Enable Data Skipping in your queries, using the read-path properties listed below
 
 :::note
-If you're planning on enabling Column Stats Index for already existing table, please check out the [Async Meta Indexing](/docs/async_meta_indexing) guide
-on how to build Metadata Table Indices (such as Column Stats Index) for existing tables.
+If you're planning on enabling the Column Stats Index for an already existing table, please check out the [Metadata Indexing](/docs/metadata_indexing) guide on how to build Metadata Table indices (such as the Column Stats Index) for existing tables.
 :::
 
+To enable Data Skipping in your queries, make sure to set the following properties to `true` (on the read path):
 
-To enable Data Skipping in your queries make sure to set following properties to `true` (on the read path): 
-
-  - `hoodie.enable.data.skipping` (to enable Data Skipping)
-  - `hoodie.metadata.enable` (to enable Metadata Table use on the read path)
-  - `hoodie.metadata.index.column.stats.enable` (to enable Column Stats Index use on the read path)
+- `hoodie.enable.data.skipping` (to enable Data Skipping)
+- `hoodie.metadata.enable` (to enable Metadata Table use on the read path)
+- `hoodie.metadata.index.column.stats.enable` (to enable Column Stats Index use on the read path)
diff --git a/website/versioned_docs/version-0.11.0/syncing_aws_glue_data_catalog.md b/website/versioned_docs/version-0.11.0/syncing_aws_glue_data_catalog.md
index 6cb724b551..0d9075993e 100644
--- a/website/versioned_docs/version-0.11.0/syncing_aws_glue_data_catalog.md
+++ b/website/versioned_docs/version-0.11.0/syncing_aws_glue_data_catalog.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to AWS Glue Data Catalog
+title: AWS Glue Data Catalog
 keywords: [hudi, aws, glue, sync]
 ---
 
diff --git a/website/versioned_docs/version-0.11.0/syncing_datahub.md b/website/versioned_docs/version-0.11.0/syncing_datahub.md
index a294f339f3..75f4ba10bc 100644
--- a/website/versioned_docs/version-0.11.0/syncing_datahub.md
+++ b/website/versioned_docs/version-0.11.0/syncing_datahub.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to DataHub
+title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
diff --git a/website/versioned_docs/version-0.11.0/syncing_metastore.md b/website/versioned_docs/version-0.11.0/syncing_metastore.md
index 1b2baa0f24..f5204c15c4 100644
--- a/website/versioned_docs/version-0.11.0/syncing_metastore.md
+++ b/website/versioned_docs/version-0.11.0/syncing_metastore.md
@@ -1,5 +1,5 @@
 ---
-title: Sync to Hive Metastore
+title: Hive Metastore
 keywords: [hudi, hive, sync]
 ---
 
diff --git a/website/versioned_sidebars/version-0.11.0-sidebars.json b/website/versioned_sidebars/version-0.11.0-sidebars.json
index d18a962fd5..cbcc5f2dc5 100644
--- a/website/versioned_sidebars/version-0.11.0-sidebars.json
+++ b/website/versioned_sidebars/version-0.11.0-sidebars.json
@@ -41,15 +41,15 @@
         "writing_data",
         "hoodie_deltastreamer",
         "querying_data",
-        "gcp_bigquery",
         "flink_configuration",
         {
           "type": "category",
-          "label": "Sync to Metastore",
+          "label": "Syncing to Catalogs",
           "items": [
             "syncing_aws_glue_data_catalog",
             "syncing_datahub",
-            "syncing_metastore"
+            "syncing_metastore",
+            "gcp_bigquery"
           ]
         }
       ]
@@ -61,7 +61,7 @@
         "migration_guide",
         "compaction",
         "clustering",
-        "async_meta_indexing",
+        "metadata_indexing",
         "hoodie_cleaner",
         "transforms",
         "markers",