Posted to commits@hudi.apache.org by "vinothchandar (via GitHub)" <gi...@apache.org> on 2023/03/22 03:09:46 UTC

[GitHub] [hudi] vinothchandar commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

vinothchandar commented on code in PR #8093:
URL: https://github.com/apache/hudi/pull/8093#discussion_r1144168263


##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:

Review Comment:
   ```suggestion
   Solving the [small file problem](https://hudi.apache.org/blog/2021/03/01/hudi-file-sizing/) is fundamental to ensuring a great experience on the data lake. If you don’t size the files appropriately, you can slow down query and write performance. Some of the issues you may encounter with small files include the following:
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.

Review Comment:
   ```suggestion
   - **Queries slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering rate-limiting and adds fixed costs for opening/closing them. All of this causes queries to slow down.
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion

Review Comment:
   ```suggestion
   - Auto-size during write
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 

Review Comment:
   ```suggestion
   A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This page will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is supported only for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note
+The bulk_insert write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.

Review Comment:
   ```suggestion
   If you need to control the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read tables.
   ```
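
   For illustration only (not part of this PR): a minimal Scala sketch of how the file-sizing knobs discussed here could be passed through the Spark datasource writer. The `hoodie.parquet.*` and `hoodie.copyonwrite.record.size.estimate` keys are the ones documented on this page; the table name, record key, precombine field, path, and the `df` DataFrame are made-up placeholders.

   ```scala
   import org.apache.spark.sql.SaveMode

   // Upsert into a COW table while overriding the file-sizing defaults.
   // At write time, Hudi expands base files below hoodie.parquet.small.file.limit
   // (100MB here) up toward hoodie.parquet.max.file.size (120MB here).
   df.write.format("hudi").
     option("hoodie.table.name", "trips_cow").                       // placeholder table name
     option("hoodie.datasource.write.recordkey.field", "uuid").      // placeholder record key
     option("hoodie.datasource.write.precombine.field", "ts").       // placeholder precombine field
     option("hoodie.datasource.write.operation", "upsert").
     option("hoodie.parquet.small.file.limit", "104857600").         // 100MB small-file threshold
     option("hoodie.parquet.max.file.size", "125829120").            // 120MB target base file size
     option("hoodie.copyonwrite.record.size.estimate", "1024").      // helps bin-packing on the first commit
     mode(SaveMode.Append).
     save("/tmp/hudi/trips_cow")                                     // placeholder base path
   ```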



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​

Review Comment:
   to fix heading



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is supported only for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note
+The bulk_insert write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+| Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` |1024 (1024B) | This config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files. | Write COW  | 0.4.0 |
+
+### Merge-On-Read (MOR)
+Since a MOR table aims to reduce write amplification compared to a COW table, Hudi limits the number of Parquet base files to one for auto file sizing during insert and upsert operations when writing to a MOR table. This limits the number of rewritten files and can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to sizing the Parquet base files in a MOR table, you can also tune the log file size with `hoodie.logfile.max.size`.
+
+:::note
+For the BloomFilter index: small files in file groups that are included in a requested or inflight compaction or clustering on the active timeline, or small files with associated log files, are not auto-sized with incoming inserts until the compaction or clustering completes. For example:
+:::
+
+- In case 1: If you had a log file and a compaction, C1, was scheduled to convert that log file to Parquet, no more inserts can go into the same file slice. 
+
+- In case 2: If the Hudi table has a file group with a Parquet base file and an associated log file from updates, or this file group is under a requested or inflight compaction, no more inserts can go into this file group to automatically size the Parquet file. Only after the compaction has been performed, and there are NO log files associated with the base Parquet file, can new inserts be sent to auto-size that parquet file.
+
+Here are the essential configurations:
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 | 
+| `hoodie.logfile.max.size` | 1073741824 (1GB) | This is the log file max size in bytes. This is the maximum size allowed for a log file before it is rolled over to the next version. | Write MOR  | 0.4.0 | 
+| `hoodie.merge.small.file.group.candidates.limit` | 1 | This limits the number of file groups, whose base file satisfies the small-file limit to be considered for appending records during an upsert operation. This is only applicable for MOR tables. | Write MOR | 0.4.0 |
+
+
+## Auto-Sizing With Clustering​
+Clustering is a service that allows you to combine small files into larger ones while at the same time (optionally) changing the data layout by sorting or applying space-filling curves like Z-order or Hilbert curve. We won’t go into all the details about clustering here, but please refer to the [clustering section](https://hudi.apache.org/docs/clustering) for more details. 
+
+Clustering is one way to achieve file sizing so you can have faster queries. When you ingest data, you may still have a lot of small files (depending on your configurations and the size of the ingested data, i.e., the input batch). In this case, you will want to cluster the small files into larger files to improve query performance. Clustering can be performed in different ways. Please check out the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+An example where clustering might be very useful is when a user has a Hudi table with many small files. Then, instead of waiting for multiple ingestion batches to gradually auto-size files, a user can use the clustering service to fix all the file sizes without ingesting any new data.

Review Comment:
   the "waiting for multiple .." contradicts that we do auto file sizing during ingest. May be clarify the scenario to be - user needing a different layout, or when using a write operation like bulk_insert



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can use your storage inefficiently. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte- and exabyte-scale data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is supported only for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note
+The bulk_insert write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+| Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` |1024 (1024B) | This config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files. | Write COW  | 0.4.0 |
+
+### Merge-On-Read (MOR)
+Since a MOR table aims to reduce write amplification compared to a COW table, Hudi limits the number of Parquet base files to one for auto file sizing during insert and upsert operations when writing to a MOR table. This limits the number of rewritten files and can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to sizing the Parquet base files in a MOR table, you can also tune the log file size with `hoodie.logfile.max.size`.
+
+:::note
+For the BloomFilter index: small files in file groups that are part of a requested or inflight compaction or clustering on the active timeline, or small files with associated log files, are not auto-sized with incoming inserts until the compaction or clustering completes. For example:
+:::
+
+- In case 1: If you have a log file and a compaction, C1, is scheduled to convert that log file to Parquet, no more inserts can go into the same file slice. 
+
+- In case 2: If the Hudi table has a file group with a Parquet base file and an associated log file from updates, or this file group is under a requested or inflight compaction, no more inserts can go into this file group for automatic sizing of the Parquet file. Only after the compaction has been performed, and there are NO log files associated with the base Parquet file, can new inserts be sent to auto-size that Parquet file.
+
+Here are the essential configurations; a short usage sketch follows the table:
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 | 
+| `hoodie.logfile.max.size` | 1073741824 (1GB) | This is the log file max size in bytes. This is the maximum size allowed for a log file before it is rolled over to the next version. | Write MOR  | 0.4.0 | 
+| `hoodie.merge.small.file.group.candidates.limit` | 1 | This limits the number of file groups, whose base file satisfies the small-file limit to be considered for appending records during an upsert operation. This is only applicable for MOR tables. | Write MOR | 0.4.0 |
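+
+As an illustration, and assuming the same Spark (Java) setup as the COW sketch above, a MOR write could set these options as follows; all names, paths, and values are placeholders:
+
+```java
+// Assumes `df`, `spark`, and the imports from the earlier COW sketch; values are illustrative only.
+df.write().format("hudi")
+    .option("hoodie.table.name", "trips_mor")                              // placeholder table name
+    .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
+    .option("hoodie.datasource.write.recordkey.field", "uuid")             // placeholder record key
+    .option("hoodie.datasource.write.precombine.field", "ts")              // placeholder ordering field
+    .option("hoodie.parquet.small.file.limit", "104857600")                // 100MB small-file threshold
+    .option("hoodie.parquet.max.file.size", "125829120")                   // 120MB target base file size
+    .option("hoodie.logfile.max.size", "1073741824")                       // roll log files over at ~1GB
+    .option("hoodie.merge.small.file.group.candidates.limit", "1")         // expand at most one small file group per write
+    .mode(SaveMode.Append)
+    .save("/tmp/hudi/trips_mor");                                          // placeholder table path
+```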
+
+
+## Auto-Sizing With Clustering​
+Clustering is a service that allows you to combine small files into larger ones while, at the same time, (optionally) changing the data layout by sorting or applying space-filling curves like Z-order or Hilbert curves. We won’t go into all the details about clustering here; please refer to the [clustering section](https://hudi.apache.org/docs/clustering) for an in-depth treatment. A sketch of the typical write options for inline clustering follows.
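+
+For illustration, inline clustering could be enabled on a Spark (Java) write along these lines, again assuming the setup from the earlier sketches; the table name, paths, and thresholds are placeholders rather than recommendations:
+
+```java
+// Assumes `df`, `spark`, and the imports from the earlier sketches; values are illustrative only.
+df.write().format("hudi")
+    .option("hoodie.table.name", "trips")                                            // placeholder table name
+    .option("hoodie.datasource.write.recordkey.field", "uuid")                       // placeholder record key
+    .option("hoodie.datasource.write.precombine.field", "ts")                        // placeholder ordering field
+    .option("hoodie.clustering.inline", "true")                                      // run clustering as part of the write
+    .option("hoodie.clustering.inline.max.commits", "4")                             // schedule clustering every 4 commits
+    .option("hoodie.clustering.plan.strategy.small.file.limit", "314572800")         // files below ~300MB are candidates
+    .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824")   // aim for ~1GB files after clustering
+    .mode(SaveMode.Append)
+    .save("/tmp/hudi/trips");                                                        // placeholder table path
+```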

Review Comment:
   link to space filling curve blogs?



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of hitting that rate limit, which slows the reader down.
+
+- **Processes slow down**: Your processing jobs, e.g., Spark or Hive jobs, slow down; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.

Review Comment:
   ```suggestion
   - **Storage inefficiencies**: When working with many small files, you can be inefficient in using your storage. For example, many small files can yield a lower compression ratio, increasing storage costs. If you’re indexing the data, that also takes up more storage space to store additional metadata, such as column statistics. If you’re working with a smaller amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
   ```



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at high throughput with low latency. Together with Hudi, you can use streaming ingestion and consumption with sources like Kafka, and also run batch workloads like bulk ingest, snapshot queries, and incremental queries. 
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+There are three execution modes a user can configure for Flink, and within each execution mode, users can use Flink SQL to configure their job options in the `WITH` clause. The following sections describe the necessary configs for different job conditions.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`. 
+- **BATCH**: Batch-style execution on the DataStream API
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+
+```
+
+Separately, you can programmatically create and configure the `StreamExecutionEnvironment`, the entry point of the Flink DataStream API. This execution environment is how all data pipelines are created and executed.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+import org.apache.flink.api.common.RuntimeExecutionMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
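+
+Within a job, Hudi-specific options are typically passed through the `WITH` clause of Flink SQL. The following is a minimal Java sketch, assuming the Hudi Flink bundle is on the classpath; the table schema, path, and option values are placeholders, not recommendations:
+
+```java
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+public class HudiFlinkSqlExample {
+  public static void main(String[] args) {
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
+
+    // Placeholder schema and path; the WITH clause carries the Hudi job options.
+    tEnv.executeSql(
+        "CREATE TABLE hudi_trips ("
+      + "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,"
+      + "  rider VARCHAR(20),"
+      + "  ts TIMESTAMP(3),"
+      + "  `partition` VARCHAR(20)"
+      + ") PARTITIONED BY (`partition`) WITH ("
+      + "  'connector' = 'hudi',"
+      + "  'path' = 'file:///tmp/hudi_trips',"
+      + "  'table.type' = 'MERGE_ON_READ',"
+      + "  'write.tasks' = '4'"
+      + ")");
+  }
+}
+```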
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpoints (disaster recovery, i.e., avoiding data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support an auto-pilot mode that automatically scales resources up or down as the ingestion rate changes, so a poor initial choice can over-provision or underutilize resources. 
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operators or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere. For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used | n/a | 0.9.0 |
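+
+For illustration, these keys could be set in `$FLINK_HOME/conf/flink-conf.yaml` as follows; the values are placeholders to be tuned to your data volume and job resources:
+
+```yaml
+# Illustrative parallelism settings in flink-conf.yaml
+taskmanager.numberOfTaskSlots: 4
+parallelism.default: 4
+```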
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates all the instants. It keeps an in-memory filesystem view of all the file handles on the filesystem within its embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The config below allows you to allocate the necessary memory. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `jobmanager.memory.process.size` | -- | This is the total process memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes: Total Flink Memory, JVM Metaspace, and JVM Overhead | n/a | 0.9.0 |
+
+
+#### TaskManager
+The TaskManager is a container for the writing and table service tasks. For regular Parquet file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for  MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output Parquet file. Below are the configs you can set for the TaskManager to allocate enough memory for these services. 

Review Comment:
   ```suggestion
   The TaskManager is a container for the writing and table service tasks. For regular Parquet file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for  MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output base file. Below are the configs you can set for the TaskManager to allocate enough memory for these services. 
   ```



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at high throughput with low latency. Together with Hudi, you can use streaming ingestion and consumption with sources like Kafka, and also run batch workloads like bulk ingest, snapshot queries, and incremental queries. 
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+There are three execution modes a user can configure for Flink, and within each execution mode, users can use Flink SQL to configure their job options in the `WITH` clause. The following sections describe the necessary configs for different job conditions.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`. 
+- **BATCH**: Batch-style execution on the DataStream API
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+
+```
+
+Separately, you can programmatically create and configure the `StreamExecutionEnvironment`, the entry point of the Flink DataStream API. This execution environment is how all data pipelines are created and executed.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+import org.apache.flink.api.common.RuntimeExecutionMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpoints (disaster recovery, i.e., avoiding data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support an auto-pilot mode that automatically scales resources up or down as the ingestion rate changes, so a poor initial choice can over-provision or underutilize resources. 
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operators or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere. For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used | n/a | 0.9.0 |
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates all the instants. It keeps an in-memory filesystem view of all the file handles on the filesystem within its embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The config below allows you to allocate the necessary memory. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `jobmanager.memory.process.size` | -- | This is the total process memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes: Total Flink Memory, JVM Metaspace, and JVM Overhead | n/a | 0.9.0 |
+
+
+#### TaskManager
+The TaskManager is a container for the writing and table service tasks. For regular Parquet file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for  MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output Parquet file. Below are the configs you can set for the TaskManager to allocate enough memory for these services. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.memory.task.heap.size` | -- | This is the task heap memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache. | n/a | 0.9.0 |
+| `taskmanager.memory.managed.size` | -- | This is the managed memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory | n/a | 0.9.0 |
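+
+As a rough illustration, the memory settings above could appear in `flink-conf.yaml` like this; the sizes are placeholders, not recommendations:
+
+```yaml
+# Illustrative memory settings in flink-conf.yaml
+jobmanager.memory.process.size: 1024m
+taskmanager.memory.task.heap.size: 2048m
+taskmanager.memory.managed.size: 1024m
+```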
+
+#### Checkpoint​
+Checkpoint is a disaster recovery mechanism for Flink. When a job fails, users can choose to recover the job from the latest checkpoint to keep the latest data correctness. To keep the transaction integrity, we flush the memory buffer into a Hudi table for persistence during the checkpointing lifecycle. It’s important to note the Hudi transaction cannot be committed without enabling the checkpoint config.
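+
+As a minimal sketch, checkpointing could be enabled in `flink-conf.yaml` along these lines; the interval and checkpoint directory are placeholders:
+
+```yaml
+# Illustrative checkpoint settings in flink-conf.yaml
+# 150000 ms = 2.5 minutes
+execution.checkpointing.interval: 150000ms
+state.backend: rocksdb
+state.backend.incremental: true
+state.checkpoints.dir: hdfs:///flink/checkpoints
+```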

Review Comment:
   ```suggestion
   Checkpointing is a disaster recovery mechanism for Flink. When a job fails, users can choose to recover the job from the latest checkpoint to keep the latest data correct. To keep the transaction integrity, we flush the memory buffer into a Hudi table for persistence during the checkpointing lifecycle. It’s important to note the Hudi transaction cannot be committed without enabling the checkpoint configuration for Flink.
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of hitting that rate limit, which slows the reader down.

Review Comment:
   lets use "queries" vs "reads" since its resonates easier with users?



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of hitting that rate limit, which slows the reader down.
+
+- **Processes slow down**: Your processing jobs, e.g., Spark or Hive jobs, slow down; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default target file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 

Review Comment:
   can we use append and mutable - consistently when talking about data models? ideally we define it upfront somewhere. upserts - in this context is not quite correct, since even delete operations go over the same path.



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 
+
+:::note 
+the bulk_insert write operation does not have auto-sizing capabilities during ingestion
+:::
+
+If you need to customize the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.

Review Comment:
   this is the same for both CoW and MoR
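
   For reference, a minimal sketch (Spark Scala) of how the sizing knobs from the quoted docs could be passed on a COW write; the table name, base path, and key/precombine fields below are hypothetical, and the values just mirror the defaults discussed above:
   ```scala
   // Assumes a running Spark session and an existing DataFrame `df`.
   // Hypothetical COW write tuning file sizing: files under ~100MB are treated as
   // small and padded with incoming records up to the ~120MB target.
   val fileSizingOpts = Map(
     "hoodie.table.name" -> "trips_cow",                                     // hypothetical table
     "hoodie.datasource.write.recordkey.field" -> "uuid",
     "hoodie.datasource.write.precombine.field" -> "ts",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100MB
     "hoodie.parquet.max.file.size" -> String.valueOf(120 * 1024 * 1024),    // 120MB
     "hoodie.copyonwrite.record.size.estimate" -> "1024"                     // avg record size in bytes, used until commit metadata exists
   )

   df.write.format("hudi").
     options(fileSizingOpts).
     mode("append").
     save("s3://bucket/warehouse/trips_cow")
   ```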



##########
website/docs/timeline.md:
##########
@@ -3,40 +3,386 @@ title: Timeline
 toc: true
 ---
 
-## Timeline
-At its core, Hudi maintains a `timeline` of all actions performed on the table at different `instants` of time that helps provide instantaneous views of the table,
-while also efficiently supporting retrieval of data in the order of arrival. A Hudi instant consists of the following components
+A Hudi table maintains all operations happened to the table in a single timeline comprised of two parts, an active timeline and an archived timeline. The active timeline stores all the recent instants, while the archived timeline stores the older instants. An instant is a transaction where all respective partitions within a base path have been successfully updated by either a writer or a table service. Instants that get older in the active timeline are moved to archived timeline at various times.

Review Comment:
   an instant is not a transaction, just an id for a transaction. This file still needs a deeper review
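
   As a rough, hedged illustration of that point (not part of the PR): instants are just small metadata files under `.hoodie/` named by timestamp and action, and the ones that age out are moved under `.hoodie/archived`. A Spark Scala sketch that lists them for a hypothetical table path:
   ```scala
   import org.apache.hadoop.fs.Path

   // Assumes a Spark shell (`spark` in scope). Lists the active timeline files,
   // e.g. <timestamp>.commit / .deltacommit plus their .requested/.inflight states.
   val basePath = new Path("s3://bucket/warehouse/trips_cow")   // hypothetical table
   val fs = basePath.getFileSystem(spark.sparkContext.hadoopConfiguration)

   fs.listStatus(new Path(basePath, ".hoodie"))
     .filter(_.isFile)
     .map(_.getPath.getName)
     .sorted
     .foreach(println)
   ```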



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:

Review Comment:
   drop the stale analytics piece? if you're intending to link this to increased write latency, that's not connecting clearly IMO.



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.

Review Comment:
   ```suggestion
   - **Pipelines slow down**: You can slow down your Spark, Flink or Hive jobs due to excessive scheduling overhead or memory requirements; the more files you have, the more tasks you create.
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 
+
+:::note 
+the bulk_insert write operation does not have auto-sizing capabilities during ingestion
+:::
+
+If you need to customize the file sizing, i.e., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+ Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` |1024 (1024B) | The config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate compute dynamically based on commit metadata. This is critical in computing the insert parallelism, and bin-packing inserts into small files. | Write COW  | 0.4.0 |  
+
+## Merge-On-Read ​(MOR) 
+As a MOR table aims to reduce the write amplification, compared to a COW table, when writing to a MOR table, Hudi limits the number of Parquet base files to one for auto file sizing during insert and upsert operation. This limits the number of rewritten files. This can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to file sizing Parquet base files for a MOR table, you can also tune the log files file-sizing with `hoodie.logfile.max.size`. 

Review Comment:
   this log rollover limit only works on storage systems that support the append operation. So on cloud storage like S3/GCS, this may not be respected
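
   For reference, a minimal sketch (Spark Scala) of the MOR-specific knobs from the quoted docs; the table name, path, and key fields are hypothetical and the values are illustrative, not recommendations:
   ```scala
   // Assumes a running Spark session and an existing DataFrame `df`.
   // MOR write: limit how many small base file groups one write will expand, and
   // roll log files over at ~250MB. As noted above, the rollover size is
   // best-effort on object stores that don't support appends.
   val morSizingOpts = Map(
     "hoodie.table.name" -> "trips_mor",                              // hypothetical table
     "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
     "hoodie.datasource.write.recordkey.field" -> "uuid",
     "hoodie.datasource.write.precombine.field" -> "ts",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.merge.small.file.group.candidates.limit" -> "1",
     "hoodie.logfile.max.size" -> String.valueOf(250L * 1024 * 1024),
     "hoodie.logfile.to.parquet.compression.ratio" -> "0.35"
   )

   df.write.format("hudi").
     options(morSizingOpts).
     mode("append").
     save("s3://bucket/warehouse/trips_mor")
   ```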



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.

Review Comment:
   do we need to repeat these three points?



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 

Review Comment:
   ```suggestion
   You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some write latency, but it ensures that the queries are always efficient when a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is supported only for **append** use cases; **mutable** use cases are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
   ```
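
   To make the "fix it with clustering afterwards" path the paragraph alludes to concrete, a hedged Spark Scala sketch with inline clustering on an append-only writer; table name, path, and values are illustrative, not recommendations:
   ```scala
   // Assumes a running Spark session and an existing DataFrame `df`.
   // Append-only writer that lets small files accumulate, then stitches them
   // together with inline clustering every 4 commits.
   val clusteringOpts = Map(
     "hoodie.table.name" -> "events_append",                       // hypothetical table
     "hoodie.datasource.write.recordkey.field" -> "uuid",
     "hoodie.datasource.write.precombine.field" -> "ts",
     "hoodie.datasource.write.operation" -> "insert",
     "hoodie.clustering.inline" -> "true",
     "hoodie.clustering.inline.max.commits" -> "4",
     "hoodie.clustering.plan.strategy.small.file.limit" -> String.valueOf(300L * 1024 * 1024),      // files < ~300MB are candidates
     "hoodie.clustering.plan.strategy.target.file.max.bytes" -> String.valueOf(1024L * 1024 * 1024) // aim for ~1GB files
   )

   df.write.format("hudi").
     options(clusteringOpts).
     mode("append").
     save("s3://bucket/warehouse/events_append")
   ```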



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering

Review Comment:
   ```suggestion
   - Clustering after write
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. For a higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of encountering the rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your i.e., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 
+
+:::note 
+the bulk_insert write operation does not have auto-sizing capabilities during ingestion

Review Comment:
   can we move this up into the section above?
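
   On a related note, since bulk_insert skips the small-file handling entirely, file counts (and hence sizes) end up driven by the write parallelism and sort mode; a hedged Spark Scala sketch, with a hypothetical table name/path and illustrative values:
   ```scala
   // Assumes a running Spark session and an existing DataFrame `df`.
   // bulk_insert does no small-file sizing, so parallelism/sort mode determine
   // how many files get written.
   df.write.format("hudi").
     option("hoodie.table.name", "events_bootstrap").              // hypothetical table
     option("hoodie.datasource.write.recordkey.field", "uuid").
     option("hoodie.datasource.write.precombine.field", "ts").
     option("hoodie.datasource.write.operation", "bulk_insert").
     option("hoodie.bulkinsert.shuffle.parallelism", "200").
     option("hoodie.bulkinsert.sort.mode", "GLOBAL_SORT").
     mode("append").
     save("s3://bucket/warehouse/events_bootstrap")
   ```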



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files (at least one request per file, regardless of the file size) increases the chance of encountering rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing jobs, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability problems:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is only supported for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.

Review Comment:
   to double check: 120 or 100



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files (at least one request per file, regardless of the file size) increases the chance of encountering rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing jobs, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability problems:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is only supported for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note 
+The `bulk_insert` write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, e.g., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+ Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` | 1024 (1024B) | The config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files. | Write COW  | 0.4.0 |
+
+### Merge-On-Read (MOR)
+Since a MOR table aims to reduce write amplification compared to a COW table, when writing to a MOR table, Hudi limits the number of Parquet base files considered for auto file sizing during insert and upsert operations to one. This limits the number of rewritten files and can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to sizing the Parquet base files for a MOR table, you can also tune the log file sizing with `hoodie.logfile.max.size`.
+
+:::note
+For the BloomFilter index:  Small files in file groups included in the requested or inflight compaction or clustering under the active timeline, or small files with associated log files are not auto-sized with incoming inserts until the compaction or clustering is complete. For example: 
+:::
+
+- In case 1: If you had a log file and a compaction, C1, was scheduled to convert that log file to Parquet, no more inserts can go into the same file slice. 
+
+- In case 2: If the Hudi table has a file group with a Parquet base file and an associated log file from updates, or this file group is under a requested or inflight compaction, no more inserts can go into this file group to automatically size the Parquet file. Only after the compaction has been performed, and there are NO log files associated with the base Parquet file, can new inserts be sent to auto-size that parquet file.

Review Comment:
   we can drop "Parquet" wherever it's not needed per se. All of this applies to "base" files in general



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files (at least one request per file, regardless of the file size) increases the chance of encountering rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing jobs, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability problems:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is only supported for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note 
+The `bulk_insert` write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, e.g., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+ Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` | 1024 (1024B) | The config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files. | Write COW  | 0.4.0 |
+
+### Merge-On-Read (MOR)
+Since a MOR table aims to reduce write amplification compared to a COW table, when writing to a MOR table, Hudi limits the number of Parquet base files considered for auto file sizing during insert and upsert operations to one. This limits the number of rewritten files and can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to sizing the Parquet base files for a MOR table, you can also tune the log file sizing with `hoodie.logfile.max.size`.
+
+:::note
+For the BloomFilter index:  Small files in file groups included in the requested or inflight compaction or clustering under the active timeline, or small files with associated log files are not auto-sized with incoming inserts until the compaction or clustering is complete. For example: 

Review Comment:
   this is the same even for the simple index, for e.g. Can we remove the reference to the index type?



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing. Flink can process events at high speed with low latency. Along with Hudi, you can use streaming ingestion and consumption with sources like Kafka, and also perform batch workloads like bulk ingest, snapshot queries, and incremental queries.
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+There are three execution modes a user can configure for Flink, and within each execution mode, users can configure their job options when writing with Flink SQL. The following sections describe the necessary configs for different job conditions.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`. 
+- **BATCH**: Batch-style execution on the DataStream API
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+
+```
+
+Separately, you can programmatically create and configure the `StreamExecutionEnvironment`, the entry point of the Flink programming API. This execution environment is how all data pipelines are created and executed.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpoints (disaster recovery, i.e., protection against data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support auto-pilot, which would automatically scale resources up or down when data ingestion is high or low. So, you might waste or underutilize resources.
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data. | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere. For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used. | n/a | 0.9.0 |
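+
+For quick local experiments, the same keys can also be set programmatically when building the execution environment. The sketch below is only an illustration under the assumption of a local test setup; for real deployments these values normally live in `$FLINK_HOME/conf/flink-conf.yaml`:
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+// Sketch only: mirror the flink-conf.yaml keys above for a local test run.
+Configuration conf = new Configuration();
+conf.setString("taskmanager.numberOfTaskSlots", "4"); // > 4 recommended, tune to data volume
+conf.setString("parallelism.default", "4");           // fallback when no operator parallelism is set
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
+```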
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager handles all the instants coordination. It keeps an in-memory fs view for all the file handles on the filesystem within its embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory. 

Review Comment:
   ```suggestion
   The JobManager handles all the instants coordination. It keeps an in-memory fs view for all the file handles on storage within an embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory. 
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files (at least one request per file, regardless of the file size) increases the chance of encountering rate-limiting. This causes the reader to slow down.
+
+- **Processes slow down**: You can slow down your processing jobs, e.g., Spark or Hive jobs; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact with storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these challenges inevitably lead to stale analytics and scalability problems:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the clustering finishes. This is only supported for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details.
+
+:::note 
+The `bulk_insert` write operation does not have auto-sizing capabilities during ingestion.
+:::
+
+If you need to customize the file sizing, e.g., increase the target file size or change how small files are identified, follow the instructions below for Copy-On-Write and Merge-On-Read.
+
+### Copy-On-Write (COW)​
+To tune the file sizing for a COW table, you can set the small file limit and the maximum Parquet file size. Hudi will try to add enough records to a small file at write time to get it to the configured maximum limit.
+
+ - For example, if the `hoodie.parquet.small.file.limit=104857600` (100MB) and `hoodie.parquet.max.file.size=125829120` (120MB), Hudi will pick all files < 100MB and try to get them up to 120MB.
+
+For creating a Hudi table initially, setting an accurate record size estimate is vital to ensure Hudi can adequately estimate how many records need to be bin-packed in a Parquet file for the first ingestion batch. Then, Hudi automatically uses the average record size for subsequent writes based on previous commits.
+
+ Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 |
+| `hoodie.copyonwrite.record.size.estimate` | 1024 (1024B) | The config is the average record size. If it’s not explicitly specified, Hudi will compute the record size estimate dynamically based on commit metadata. This is critical in computing the insert parallelism and bin-packing inserts into small files. | Write COW  | 0.4.0 |
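+
+As an illustration, here is a minimal sketch of passing these keys through the Spark datasource writer. The table name, key fields, and path below are hypothetical; only the sizing keys come from the table above:
+
+```java
+// Sketch only: df is an assumed Dataset<Row> with `uuid` key and `ts` precombine fields.
+df.write().format("hudi")
+  .option("hoodie.table.name", "hudi_cow_table")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.precombine.field", "ts")
+  .option("hoodie.parquet.small.file.limit", "104857600")    // files <= 100MB are small-file candidates
+  .option("hoodie.parquet.max.file.size", "125829120")       // target base file size of 120MB
+  .option("hoodie.copyonwrite.record.size.estimate", "1024") // record size hint for the first commit
+  .mode("append")
+  .save("file:///tmp/hudi_cow_table");
+```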
+
+### Merge-On-Read (MOR)
+Since a MOR table aims to reduce write amplification compared to a COW table, when writing to a MOR table, Hudi limits the number of Parquet base files considered for auto file sizing during insert and upsert operations to one. This limits the number of rewritten files and can be configured through `hoodie.merge.small.file.group.candidates.limit`.
+
+In addition to sizing the Parquet base files for a MOR table, you can also tune the log file sizing with `hoodie.logfile.max.size`.
+
+:::note
+For the BloomFilter index:  Small files in file groups included in the requested or inflight compaction or clustering under the active timeline, or small files with associated log files are not auto-sized with incoming inserts until the compaction or clustering is complete. For example: 
+:::
+
+- In case 1: If you had a log file and a compaction, C1, was scheduled to convert that log file to Parquet, no more inserts can go into the same file slice. 
+
+- In case 2: If the Hudi table has a file group with a Parquet base file and an associated log file from updates, or this file group is under a requested or inflight compaction, no more inserts can go into this file group to automatically size the Parquet file. Only after the compaction has been performed, and there are NO log files associated with the base Parquet file, can new inserts be sent to auto-size that parquet file.
+
+Here are the essential configurations:
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.parquet.small.file.limit` | 104857600 (100MB) | During an insert and upsert operation, we opportunistically expand existing small files on storage instead of writing new files to keep the number of files optimum. This config sets the file size limit below which a storage file becomes a candidate to be selected as such a `small file`. By default, treat any file <= 100MB as a small file. Also note that if this is set to <= 0, Hudi will not try to get small files and directly write new files. | Write COW, MOR | 0.4.0 |
+| `hoodie.parquet.max.file.size` |125829120 (120MB) | This config is the target size in bytes for parquet files produced by the Hudi write phases. For DFS, this needs to be aligned with the underlying filesystem block size for optimal performance.  | Write COW, MOR  | 0.4.0 | 
+| `hoodie.logfile.max.size` | 1073741824 (1GB) | This is the log file max size in bytes. This is the maximum size allowed for a log file before it is rolled over to the next version. | Write MOR  | 0.4.0 | 
+| `hoodie.merge.small.file.group.candidates.limit` | 1 | This limits the number of file groups, whose base file satisfies the small-file limit to be considered for appending records during an upsert operation. This is only applicable for MOR tables. | Write MOR | 0.4.0 |
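+
+For illustration, a minimal sketch of the MOR-specific knobs on a Spark datasource write follows; the table name, key fields, path, and the 256MB log file value are assumptions, while the option keys come from the table above:
+
+```java
+// Sketch only: same assumed Dataset<Row> df as in the COW example above.
+df.write().format("hudi")
+  .option("hoodie.table.name", "hudi_mor_table")
+  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.precombine.field", "ts")
+  .option("hoodie.logfile.max.size", "268435456")                // roll log files over past 256MB
+  .option("hoodie.merge.small.file.group.candidates.limit", "1") // default: one small file group per upsert
+  .mode("append")
+  .save("file:///tmp/hudi_mor_table");
+```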
+
+
+## Auto-Sizing With Clustering​
+Clustering is a service that allows you to combine small files into larger ones while at the same time (optionally) changing the data layout by sorting or applying space-filling curves like Z-order or Hilbert curve. We won’t go into all the details about clustering here, but please refer to the [clustering section](https://hudi.apache.org/docs/clustering) for more details. 
+
+Clustering is one way to achieve file sizing so you can have faster queries. When you ingest data, you may still have a lot of small files (depending on your configurations and the size of the input batch). In this case, you will want to cluster all the small files to larger files to improve query performance. Clustering can be performed in different ways. Please check out the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 
+
+An example where clustering might be very useful is when a user has a Hudi table with many small files. Then, instead of waiting for multiple ingestion batches to gradually auto-size files, a user can use the clustering service to fix all the file sizes without ingesting any new data.
+
+:::note
+Clustering in Hudi is not a blocking operation, and ingestion can continue concurrently as long as no files being clustered need to be updated. Writes will fail if they update data that is being clustered while the clustering service runs.
+:::
+
+Here are the critical file sizing configurations:
+
+| Parameter Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `hoodie.clustering.plan.strategy.small.file.limit` | 314572800 (300MB) | Files smaller than the size in bytes specified here are candidates for clustering. | Clustering | 0.7.0 |
+| `hoodie.clustering.plan.strategy.target.file.max.bytes` | 1073741824 (1GB) | This configures the target file size in bytes for clustering. | Clustering  | 0.7.0 |
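+
+As a sketch, inline clustering can be enabled alongside these sizing knobs through write options. The inline-clustering keys, table name, and path here are assumptions for illustration; see the clustering docs for the full option set:
+
+```java
+// Sketch only: schedule and execute clustering inline as part of the Spark write.
+df.write().format("hudi")
+  .option("hoodie.table.name", "hudi_cow_table")
+  .option("hoodie.datasource.write.recordkey.field", "uuid")
+  .option("hoodie.datasource.write.precombine.field", "ts")
+  .option("hoodie.clustering.inline", "true")                                    // run clustering with the write
+  .option("hoodie.clustering.inline.max.commits", "4")                           // trigger every 4 commits
+  .option("hoodie.clustering.plan.strategy.small.file.limit", "314572800")       // files < 300MB are candidates
+  .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824") // aim for ~1GB files
+  .mode("append")
+  .save("file:///tmp/hudi_cow_table");
+```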
+
+:::note
+Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will always create a newer version of the smaller file, resulting in 2 versions of the same file. The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.

Review Comment:
   use "storage" over "disk" consistently?



##########
website/docs/timeline.md:
##########
@@ -3,40 +3,386 @@ title: Timeline
 toc: true
 ---
 
-## Timeline
-At its core, Hudi maintains a `timeline` of all actions performed on the table at different `instants` of time that helps provide instantaneous views of the table,
-while also efficiently supporting retrieval of data in the order of arrival. A Hudi instant consists of the following components
+A Hudi table maintains all operations happened to the table in a single timeline comprised of two parts, an active timeline and an archived timeline. The active timeline stores all the recent instants, while the archived timeline stores the older instants. An instant is a transaction where all respective partitions within a base path have been successfully updated by either a writer or a table service. Instants that get older in the active timeline are moved to archived timeline at various times.

Review Comment:
   ```suggestion
   A Hudi table records all operations performed on a table in a single timeline comprised of two parts, an active timeline and an archived timeline. The active timeline stores all the recent instants, while the archived timeline stores the older instants. An instant is a transaction where all respective partitions within a base path have been successfully updated by either a writer or a table service. Instants that get older in the active timeline are moved to archived timeline at various times.
   ```



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at high throughput with low latency. Together with Hudi, you can build streaming ingestion and consumption pipelines with sources like Kafka, and also run batch workloads like bulk ingest, snapshot queries, and incremental queries.
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+Flink supports three execution modes, and within each mode you can configure job options through Flink SQL. The following sections describe the necessary configs for different job setups.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`.
+- **BATCH**: Batch-style execution on the DataStream API.
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources.
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+```
+
+Separately, you can create and configure the `StreamExecutionEnvironment` programmatically. This execution environment is the entry point of the Flink DataStream API and is how all data pipelines are created and executed.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpointing (fault tolerance, i.e., recovering from failures without data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support autoscaling, which would automatically scale resources up or down as the ingestion rate changes, so a misconfigured job can waste or underutilize resources.
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data. | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere (default: 1). For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used. | n/a | 0.9.0 |
+
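+For a quick local experiment, the same defaults can also be supplied programmatically; the sketch below assumes Flink 1.12+ (for `getExecutionEnvironment(Configuration)`), and the values are illustrative only. In a real deployment you would set these keys in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class ParallelismSketch {
+  public static void main(String[] args) {
+    Configuration conf = new Configuration();
+    conf.setString("taskmanager.numberOfTaskSlots", "4");
+    conf.setString("parallelism.default", "4");
+
+    // Picks up the keys above when running locally; on a cluster, prefer flink-conf.yaml.
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
+    // ... define the Hudi pipeline here, then call env.execute("job-name");
+  }
+}
+```
+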
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates all the instants. It keeps an in-memory file system view of all the file handles on storage within its embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory.
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `jobmanager.memory.process.size` | -- | This is the total process memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes: Total Flink Memory, JVM Metaspace, and JVM Overhead. | n/a | 0.9.0 |
+
+
+#### TaskManager
+The TaskManager is a container for the writing and table service tasks. For regular Parquet file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output Parquet file. Below are the configs you can set for the TaskManager to allocate enough memory for these services.
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.memory.task.heap.size` | -- | This is the task heap memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache. | n/a | 0.9.0 |
+| `taskmanager.memory.managed.size` | -- | This is the managed memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory | n/a | 0.9.0 |
+
+#### Checkpoint​

Review Comment:
   ```suggestion
   #### Checkpointing
   ```



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:

Review Comment:
   small files affect both queries and writes, and are orthogonal to data freshness/stale data.



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query, which is a very inefficient way of accessing and utilizing the data. Also, cloud storage like S3 enforces rate limiting on how many requests can be processed per second per prefix in a bucket. Since each file requires at least one request regardless of its size, a higher number of files increases the chance of hitting the rate limit, which slows down the reader.
+
+- **Processing slows down**: Engines like Spark or Hive create at least one task per file, so the more files you have, the more tasks you create, and the slower your jobs run.
+
+- **Storage is used inefficiently**: Many small files tend to compress worse than fewer large ones, so the same data takes up more space on storage, and any indexes stored inside the Parquet files add further overhead. You might not notice the impact on a small dataset, but at petabyte and exabyte scale you’ll need to manage storage resources efficiently.
+
+All these issues inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion

Review Comment:
   lets use "writes" over "ingestion" consistently, since writes can be ingest or a downstream ETL pipeline.



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at high throughput with low latency. Together with Hudi, you can build streaming ingestion and consumption pipelines with sources like Kafka, and also run batch workloads like bulk ingest, snapshot queries, and incremental queries.
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+Flink supports three execution modes, and within each mode you can configure job options through Flink SQL. The following sections describe the necessary configs for different job setups.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`.
+- **BATCH**: Batch-style execution on the DataStream API.
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources.
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+```
+
+Separately, you can create and configure the `StreamExecutionEnvironment` programmatically. This execution environment is the entry point of the Flink DataStream API and is how all data pipelines are created and executed.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpointing (fault tolerance, i.e., recovering from failures without data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support autoscaling, which would automatically scale resources up or down as the ingestion rate changes, so a misconfigured job can waste or underutilize resources.
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data. | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere (default: 1). For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used. | n/a | 0.9.0 |
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates all the instants. It keeps an in-memory file system view of all the file handles on storage within its embedded timeline server. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory.
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `jobmanager.memory.process.size` | -- | This is the total process memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes: Total Flink Memory, JVM Metaspace, and JVM Overhead. | n/a | 0.9.0 |
+
+
+#### TaskManager
+The TaskManager is a container for the writing and table service tasks. For regular Parquet file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output Parquet file. Below are the configs you can set for the TaskManager to allocate enough memory for these services.
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.memory.task.heap.size` | -- | This is the task heap memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache. | n/a | 0.9.0 |
+| `taskmanager.memory.managed.size` | -- | This is the managed memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory | n/a | 0.9.0 |
+
+#### Checkpoint​
+Checkpointing is Flink's disaster recovery mechanism. When a job fails, it can be recovered from the latest checkpoint to preserve data correctness. To keep transaction integrity, Hudi flushes the in-memory write buffers into the table for persistence during the checkpointing lifecycle. It’s important to note that a Hudi transaction cannot be committed unless checkpointing is enabled.
+
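+As a sketch, assuming Flink 1.13+ and a hypothetical checkpoint directory, enabling checkpointing programmatically for a Hudi job could look like the following; the equivalent keys can also be set in `flink-conf.yaml`.
+
+```java
+import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class CheckpointSketch {
+  public static void main(String[] args) {
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+    env.enableCheckpointing(150_000L);                          // checkpoint every 2.5 minutes
+    env.setStateBackend(new EmbeddedRocksDBStateBackend(true)); // incremental RocksDB checkpoints
+    env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints"); // hypothetical path
+    // ... define the Hudi source/sink here, then call env.execute("job-name");
+  }
+}
+```
+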
+Please read the Flink docs for [checkpoints](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/) for more details. 

Review Comment:
   link to latest stable



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query, which is a very inefficient way of accessing and utilizing the data. Also, cloud storage like S3 enforces rate limiting on how many requests can be processed per second per prefix in a bucket. Since each file requires at least one request regardless of its size, a higher number of files increases the chance of hitting the rate limit, which slows down the reader.
+
+- **Processing slows down**: Engines like Spark or Hive create at least one task per file, so the more files you have, the more tasks you create, and the slower your jobs run.
+
+- **Storage is used inefficiently**: Many small files tend to compress worse than fewer large ones, so the same data takes up more space on storage, and any indexes stored inside the Parquet files add further overhead. You might not notice the impact on a small dataset, but at petabyte and exabyte scale you’ll need to manage storage resources efficiently.
+
+All these issues inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 

Review Comment:
   can we discuss limitations of the clustering approach in the clustering section? Instead, we should call out here that auto file sizing during write works only for UPSERT and INSERT operations and not BULK_INSERT. Need to also think about how we provide guidance for SQL DMLs from Spark/Flink.



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query, which is a very inefficient way of accessing and utilizing the data. Also, cloud storage like S3 enforces rate limiting on how many requests can be processed per second per prefix in a bucket. Since each file requires at least one request regardless of its size, a higher number of files increases the chance of hitting the rate limit, which slows down the reader.
+
+- **Processing slows down**: Engines like Spark or Hive create at least one task per file, so the more files you have, the more tasks you create, and the slower your jobs run.
+
+- **Storage is used inefficiently**: Many small files tend to compress worse than fewer large ones, so the same data takes up more space on storage, and any indexes stored inside the Parquet files add further overhead. You might not notice the impact on a small dataset, but at petabyte and exabyte scale you’ll need to manage storage resources efficiently.
+
+All these issues inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and, instead, try to run clustering to fix your file sizing periodically, your queries might be slow until the point when the clustering finishes. This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 

Review Comment:
   >This is only supported for **APPEND** use cases only; **UPSERTS** are not supported at the moment.
   
   if you mean clustering is not supported with upserts, Idk if that's correct. They will need to retry if they conflict, but it works. 



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at high throughput with low latency. Together with Hudi, you can build streaming ingestion and consumption pipelines with sources like Kafka, and also run batch workloads like bulk ingest, snapshot queries, and incremental queries.
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+Flink supports three execution modes, and within each mode you can configure job options through Flink SQL. The following sections describe the necessary configs for different job setups.
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`. 
+- **BATCH**: Batch-style execution on the DataStream API
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+
+```
+
+Separately, you can programmatically create and configure the `StreamExecutionEnvironment`, the entry point of the Flink DataStream API. This execution environment is how all data pipelines are created and maintained.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
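+
+Within any of these execution modes, Hudi table-level options are passed through the `WITH` clause of a Flink SQL DDL statement. Below is a minimal sketch using the Table API from Java; the table name, schema, path, and option values are placeholders rather than recommendations.
+
+```java
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.enableCheckpointing(150000); // checkpoints drive Hudi commits for streaming writes
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
+
+tableEnv.executeSql(
+    "CREATE TABLE hudi_table (" +
+        "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED," +
+        "  name VARCHAR(10)," +
+        "  ts TIMESTAMP(3)" +
+        ") WITH (" +
+        "  'connector' = 'hudi'," +
+        "  'path' = 'file:///tmp/hudi_table'," +   // placeholder path
+        "  'table.type' = 'MERGE_ON_READ'," +
+        "  'write.tasks' = '4'" +
+        ")");
+```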
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpointing (disaster recovery, i.e., protection against data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support an auto-pilot mode that automatically scales resources up or down as the ingestion rate rises or falls, so you might waste or underutilize resources. 
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere (default: 1). For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used | n/a | 0.9.0 |
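+
+If you are experimenting locally, the same two settings can also be supplied programmatically when building the environment. This is only a sketch; on a real cluster these values belong in `flink-conf.yaml` (or are passed with `-D` flags on submission), and the numbers below are illustrative.
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+Configuration conf = new Configuration();
+conf.setString("taskmanager.numberOfTaskSlots", "4");
+conf.setString("parallelism.default", "4");
+// The configuration is honored for local (MiniCluster) execution; a running cluster
+// reads these options from flink-conf.yaml at startup.
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
+```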
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates the write instants (actions such as commits and compactions) on the Hudi timeline. Within its embedded timeline server, it keeps an in-memory filesystem view of all the file handles on the filesystem. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `jobmanager.memory.process.size` | -- | This is the total process memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes: Total Flink Memory, JVM Metaspace, and JVM Overhead | n/a | 0.9.0 |
+
+
+#### TaskManager
+The TaskManager is a container for the writing and table service tasks. For regular base file flushing, we need to allocate enough memory to read and write files. At the same time, there must be enough resources for MOR table compaction because it’s memory intensive: we need to read and merge all the log files into an output base file. Below are the configs you can set for the TaskManager to allocate enough memory for these services. 

Review Comment:
   Let's move away from parquet to base.



##########
website/docs/file_sizing.md:
##########
@@ -2,52 +2,90 @@
 title: "File Sizing"
 toc: true
 ---
+One of the fundamental problems in data lakes during writing is having a lot of small files. This is also known as a small file problem. If you don’t size the files appropriately, you can slow down the query performance and work with stale analytics. Some of the issues you may encounter with small files include the following:
 
-This doc will show you how Apache Hudi overcomes the dreaded small files problem. A key design decision in Hudi was to 
-avoid creating small files in the first place and always write properly sized files. 
-There are 2 ways to manage small files in Hudi and below will describe the advantages and trade-offs of each.
-
-## Auto-Size During ingestion
-
-You can automatically manage size of files during ingestion. This solution adds a little latency during ingestion, but
-it ensures that read queries are always efficient as soon as a write is committed. If you don't 
-manage file sizing as you write and instead try to periodically run a file-sizing clean-up, your queries will be slow until that resize cleanup is periodically performed.
- 
-(Note: [bulk_insert](/docs/next/write_operations) write operation does not provide auto-sizing during ingestion)
-
-### For Copy-On-Write 
-This is as simple as configuring the [maximum size for a base/parquet file](/docs/configurations#hoodieparquetmaxfilesize) 
-and the [soft limit](/docs/configurations#hoodieparquetsmallfilelimit) below which a file should 
-be considered a small file. For the initial bootstrap of a Hudi table, tuning record size estimate is also important to 
-ensure sufficient records are bin-packed in a parquet file. For subsequent writes, Hudi automatically uses average 
-record size based on previous commit. Hudi will try to add enough records to a small file at write time to get it to the 
-configured maximum limit. For e.g , with `compactionSmallFileSize=100MB` and limitFileSize=120MB, Hudi will pick all 
-files < 100MB and try to get them upto 120MB.
-
-### For Merge-On-Read 
-MergeOnRead works differently for different INDEX choices so there are few more configs to set:  
-
-- Indexes with **canIndexLogFiles = true** : Inserts of new data go directly to log files. In this case, you can 
-configure the [maximum log size](/docs/configurations#hoodielogfilemaxsize) and a 
-[factor](/docs/configurations#hoodielogfiletoparquetcompressionratio) that denotes reduction in 
-size when data moves from avro to parquet files.
-- Indexes with **canIndexLogFiles = false** : Inserts of new data go only to parquet files. In this case, the 
-same configurations as above for the COPY_ON_WRITE case applies.
-
-NOTE : In either case, small files will be auto sized only if there is no PENDING compaction or associated log file for 
-that particular file slice. For example, for case 1: If you had a log file and a compaction C1 was scheduled to convert 
-that log file to parquet, no more inserts can go into that log file. For case 2: If you had a parquet file and an update 
-ended up creating an associated delta log file, no more inserts can go into that parquet file. Only after the compaction 
-has been performed and there are NO log files associated with the base parquet file, can new inserts be sent to auto size that parquet file.
-
-## Auto-Size With Clustering
-**[Clustering](/docs/next/clustering)** is a feature in Hudi to group 
-small files into larger ones either synchronously or asynchronously. Since first solution of auto-sizing small files has 
-a tradeoff on ingestion speed (since the small files are sized during ingestion), if your use-case is very sensitive to 
-ingestion latency where you don't want to compromise on ingestion speed which may end up creating a lot of small files, 
-clustering comes to the rescue. Clustering can be scheduled through the ingestion job and an asynchronus job can stitch 
-small files together in the background to generate larger files. NOTE that during this, ingestion can continue to run concurrently.
-
-*Please note that Hudi always creates immutable files on disk. To be able to do auto-sizing or clustering, Hudi will 
-always create a newer version of the smaller file, resulting in 2 versions of the same file. 
-The [cleaner service](/docs/next/hoodie_cleaner) will later kick in and delete the older version small file and keep the latest one.*
\ No newline at end of file
+- **Reads slow down**: You’ll have to scan through many small files to retrieve data for a query. It’s a very inefficient way of accessing and utilizing the data. Also, cloud storage, like S3, enforces rate-limiting on how many requests can be processed per second per prefix in a bucket. A higher number of files, i.e., at least one request per file regardless of the file size, increases the chance of hitting the rate limit, which slows the reader down.
+
+- **Processes slow down**: Your processing engines, e.g., Spark or Hive jobs, slow down; the more files you have, the more tasks you create.
+
+- **Storage use inefficiencies**: When working with a lot of data, you can be inefficient in using your storage. For example, many small files can have a lower compression ratio, leading to more data on storage. If you’re indexing the data, that also takes up more storage space inside the Parquet files. If you’re working with a small amount of data, you might not see a significant impact on storage. However, when dealing with petabyte and exabyte data, you’ll need to be efficient in managing storage resources.
+
+All these issues inevitably lead to stale analytics and scalability challenges:
+- Query performance slows down.
+- Jobs could run slower.
+- You utilize more resources. 
+
+A critical design decision in the Hudi architecture is to avoid small file creation. Hudi is uniquely designed to write appropriately sized files automatically. This document will show you how Apache Hudi overcomes the dreaded small files problem. There are two ways to manage small files in Hudi: 
+
+- Auto-size during ingestion
+- Clustering
+
+Below, we will describe the advantages and trade-offs of each.
+
+## Auto-sizing during ingestion​
+
+You can manage file sizes through Hudi’s auto-sizing capability during ingestion. The default targeted file size for Parquet base files is 120MB, which can be configured by `hoodie.parquet.max.file.size`. Auto-sizing may add some data latency, but it ensures that the read queries are always efficient as soon as a write transaction is committed. It’s important to note that if you don’t manage file sizing as you write and instead try to run clustering periodically to fix your file sizing, your queries might be slow until the clustering finishes. This is supported only for **APPEND** use cases; **UPSERTS** are not supported at the moment. Please refer to the [clustering documentation](https://hudi.apache.org/docs/clustering) for more details. 
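+
+For a quick sense of the knobs involved, here is a minimal sketch of a Spark write that tunes the target base file size together with the small file limit; the table name, field names, path, and values are placeholders.
+
+```java
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+
+// `df` is an existing Dataset<Row> of incoming records.
+df.write().format("hudi")
+    .option("hoodie.table.name", "trips")                                         // placeholder table name
+    .option("hoodie.datasource.write.recordkey.field", "uuid")                    // placeholder key field
+    .option("hoodie.datasource.write.precombine.field", "ts")                     // placeholder precombine field
+    .option("hoodie.datasource.write.operation", "upsert")
+    .option("hoodie.parquet.max.file.size", String.valueOf(120 * 1024 * 1024))    // target ~120MB base files
+    .option("hoodie.parquet.small.file.limit", String.valueOf(100 * 1024 * 1024)) // files < 100MB get topped up
+    .mode(SaveMode.Append)
+    .save("/tmp/hudi/trips");                                                     // placeholder base path
+```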
+
+:::note 
+The bulk_insert write operation does not have auto-sizing capabilities during ingestion.

Review Comment:
   While bulk_insert does not have auto-sizing, there are knobs in the various sort modes to control file sizing. I am sure there is either a blog on this or a PR open for the blog. Let's include that, maybe as a third section.
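
   For example, a hypothetical bulk_insert write that leans on the sort mode and shuffle parallelism to influence file counts and sizes (field names, path, and values are placeholders):
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   
   // `df` is an existing Dataset<Row> of incoming records.
   df.write().format("hudi")
       .option("hoodie.table.name", "trips")                          // placeholder table name
       .option("hoodie.datasource.write.recordkey.field", "uuid")     // placeholder key field
       .option("hoodie.datasource.write.partitionpath.field", "city") // placeholder partition field
       .option("hoodie.datasource.write.operation", "bulk_insert")
       // GLOBAL_SORT (the default) tends to produce the best-sized files; NONE is fastest
       // but can leave many small files; PARTITION_SORT sorts within each partition.
       .option("hoodie.bulkinsert.sort.mode", "GLOBAL_SORT")
       .option("hoodie.bulkinsert.shuffle.parallelism", "200")
       .mode(SaveMode.Append)
       .save("/tmp/hudi/trips");                                      // placeholder base path
   ```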



##########
website/docs/flink_configuration.md:
##########
@@ -3,115 +3,177 @@ title: Flink Setup
 toc: true
 ---
 
-## Global Configurations
-When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`
+[Apache Flink](https://flink.apache.org/what-is-flink/flink-architecture/) is a powerful engine that unifies stream and batch processing and can process events at scale with low latency. Along with Hudi, you can use streaming ingestion and consumption with sources like Kafka, and also perform batch workloads like bulk ingest, snapshot queries, and incremental queries. 
 
-### Parallelism
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `taskmanager.numberOfTaskSlots` | `1` | `Integer` | The number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data |
-| `parallelism.default` | `1` | `Integer` | The default parallelism used when no parallelism is specified anywhere (default: 1). For example, If the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used |
+There are three execution modes a user can configure for Flink, and within each execution mode, users can configure their job options through Flink SQL. The following section describes the necessary configs for different job conditions. 
 
-### Memory
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `jobmanager.memory.process.size` | `(none)` | `MemorySize` | Total Process Memory size for the JobManager. This includes all the memory that a JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead |
-| `taskmanager.memory.task.heap.size` | `(none)` | `MemorySize` | Task Heap Memory size for TaskExecutors. This is the size of JVM heap memory reserved for write cache |
-| `taskmanager.memory.managed.size`  |  `(none)`  | `MemorySize` | Managed Memory size for TaskExecutors. This is the size of off-heap memory managed by the memory manager, reserved for sorting and RocksDB state backend. If you choose RocksDB as the state backend, you need to set this memory |
-
-### Checkpoint
-
-|  Option Name  | Default | Type | Description |
-|  -----------  | -------  | ------- | ------- |
-| `execution.checkpointing.interval` | `(none)` | `Duration` | Setting this value as `execution.checkpointing.interval = 150000ms`, 150000ms = 2.5min. Configuring this parameter is equivalent to enabling the checkpoint |
-| `state.backend` | `(none)` | `String` | The state backend to be used to store state. We recommend setting store state as `rocksdb` : `state.backend: rocksdb`  |
-| `state.backend.rocksdb.localdir` | `(none)` | `String` | The local directory (on the TaskManager) where RocksDB puts its files |
-| `state.checkpoints.dir` | `(none)` | `String` | The default directory used for storing the data files and meta data of checkpoints in a Flink supported filesystem. The storage path must be accessible from all participating processes/nodes(i.e. all TaskManagers and JobManagers), like hdfs and oss path |
-| `state.backend.incremental`  |  `false`  | `Boolean` | Option whether the state backend should create incremental checkpoints, if possible. For an incremental checkpoint, only a diff from the previous checkpoint is stored, rather than the complete checkpoint state. If store state is setting as `rocksdb`, recommending to turn on |
-
-## Table Options
-
-Flink SQL jobs can be configured through options in the `WITH` clause.
-The actual datasource level configs are listed below.
-
-### Memory
-
-:::note
-When optimizing memory, we need to pay attention to the memory configuration
-and the number of taskManagers, parallelism of write tasks (write.tasks : 4) first. After confirm each write task to be
-allocated with enough memory, we can try to set these memory options.
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.task.max.size` | Maximum memory in MB for a write task, when the threshold hits, it flushes the max size data bucket to avoid OOM. Default `1024MB` | `1024D` | The memory reserved for write buffer is `write.task.max.size` - `compaction.max_memory`. When total buffer of write tasks reach the threshold, the largest buffer in the memory will be flushed |
-| `write.batch.size`  | In order to improve the efficiency of writing, Flink write task will cache data in buffer according to the write bucket until the memory reaches the threshold. When reached threshold, the data buffer would be flushed out. Default `64MB` | `64D` |  Recommend to use the default settings  |
-| `write.log_block.size` | The log writer of Hudi will not flush the data immediately after receiving data. The writer flush data to the disk in the unit of `LogBlock`. Before `LogBlock` reached threshold, records will be buffered in the writer in form of serialized bytes. Default `128MB`  | `128` |  Recommend to use the default settings  |
-| `write.merge.max_memory` | If write type is `COPY_ON_WRITE`, Hudi will merge the incremental data and base file data. The incremental data will be cached and spilled to disk. this threshold controls the max heap size that can be used. Default `100MB`  | `100` | Recommend to use the default settings |
-| `compaction.max_memory` | Same as `write.merge.max_memory`, but occurs during compaction. Default `100MB` | `100` | If it is online compaction, it can be turned up when resources are sufficient, such as setting as `1024MB` |
-
-### Parallelism
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.tasks` |  The parallelism of writer tasks. Each write task writes 1 to `N` buckets in sequence. Default `4` | `4` | Increases the parallelism has no effect on the number of small files |
-| `write.bucket_assign.tasks`  |  The parallelism of bucket assigner operators. No default value, using Flink `parallelism.default`  | [`parallelism.default`](#parallelism) |  Increases the parallelism also increases the number of buckets, thus the number of small files (small buckets)  |
-| `write.index_boostrap.tasks` |  The parallelism of index bootstrap. Increasing parallelism can speed up the efficiency of the bootstrap stage. The bootstrap stage will block checkpointing. Therefore, it is necessary to set more checkpoint failure tolerance times. Default using Flink `parallelism.default` | [`parallelism.default`](#parallelism) | It only take effect when `index.bootsrap.enabled` is `true` |
-| `read.tasks` | The parallelism of read operators (batch and stream). Default `4`  | `4` |  |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4` | `Online compaction` will occupy the resources of the write task. It is recommended to use [`offline compaction`](/docs/compaction/#flink-offline-compaction) |
-
-### Compaction
-
-:::note
-These are options only for `online compaction`.
-:::
-
-:::note
-Turn off online compaction by setting `compaction.async.enabled` = `false`, but we still recommend turning on `compaction.schedule.enable` for the writing job. You can then execute the compaction plan by [`offline compaction`](#offline-compaction).
-:::
-
-|  Option Name  | Description | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `compaction.schedule.enabled` | Whether to generate compaction plan periodically | `true` | Recommend to turn it on, even if `compaction.async.enabled` = `false` |
-| `compaction.async.enabled`  |  Async Compaction, enabled by default for MOR | `true` | Turn off `online compaction` by turning off this option |
-| `compaction.trigger.strategy`  | Strategy to trigger compaction | `num_commits` | Options are `num_commits`: trigger compaction when reach N delta commits; `time_elapsed`: trigger compaction when time elapsed > N seconds since last compaction; `num_and_time`: trigger compaction when both `NUM_COMMITS` and `TIME_ELAPSED` are satisfied; `num_or_time`: trigger compaction when `NUM_COMMITS` or `TIME_ELAPSED` is satisfied. |
-| `compaction.delta_commits` | Max delta commits needed to trigger compaction, default `5` commits | `5` | -- |
-| `compaction.delta_seconds`  |  Max delta seconds time needed to trigger compaction, default `1` hour | `3600` | -- |
-| `compaction.max_memory` | Max memory in MB for compaction spillable map, default `100MB` | `100` | If your have sufficient resources, recommend to adjust to `1024MB` |
-| `compaction.target_io`  |  Target IO per compaction (both read and write), default `500GB`| `512000` | -- |
-
-## Memory Optimization
-
-### MOR
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. If there is enough memory, `compaction.max_memory` can be set larger (`100MB` by default, and can be adjust to `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two streamWriteFunction, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as bucketAssignFunction) will also consume memory.
-4. Pay attention to the memory changes of compaction. `compaction.max_memory` controls the maximum memory that each task can be used when compaction tasks read
-   logs. `compaction.tasks` controls the parallelism of compaction tasks.
-
-### COW
-
-1. [Setting Flink state backend to `rocksdb`](#checkpoint) (the default `in memory` state backend is very memory intensive).
-2. Increase both `write.task.max.size` and `write.merge.max_memory` (`1024MB` and `100MB` by default, adjust to `2014MB` and `1024MB`).
-3. Pay attention to the memory allocated to each write task by taskManager to ensure that each write task can be allocated to the
-   desired memory size `write.task.max.size`. For example, taskManager has `4GB` of memory running two write tasks, so each write task
-   can be allocated with `2GB` memory. Please reserve some buffers because the network buffer and other types of tasks on taskManager (such as `BucketAssignFunction`) will also consume memory.
-
-
-## Write Rate Limit
+## Configure Flink Execution Modes
+You can configure the execution mode via the `execution.runtime-mode` setting. There are three possible modes:
 
-In the existing data synchronization, `snapshot data` and `incremental data` are send to kafka first, and then streaming write
-to Hudi by Flink. Because the direct consumption of `snapshot data` will lead to problems such as high throughput and serious
-disorder (writing partition randomly), which will lead to write performance degradation and throughput glitches. At this time,
-the `write.rate.limit` option can be turned on to ensure smooth writing.
-
-### Options
-
-|  Option Name  | Required | Default | Remarks |
-|  -----------  | -------  | ------- | ------- |
-| `write.rate.limit` | `false` | `0` | Turn off by default |
\ No newline at end of file
+- **STREAMING**: The classic DataStream execution mode. This is the default setting for the `StreamExecutionEnvironment`. 
+- **BATCH**: Batch-style execution on the DataStream API
+- **AUTOMATIC**: Let the system decide based on the boundedness of the sources
+
+You can configure the execution mode via the command line:
+
+```sh
+$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
+
+```
+
+Separately, you can programmatically create and configure the `StreamExecutionEnvironment`, the entry point of the Flink DataStream API. This execution environment is how all data pipelines are created and maintained.
+
+You can configure the execution mode programmatically. Below is an example of how to set the `BATCH` mode.
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+```
+See the [Flink docs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/) for more details.
+
+## Global Configurations​
+
+The global configurations are used to tune Flink for throughput, memory management, and checkpointing (disaster recovery, i.e., protection against data loss). Two of the most important global configurations for a Flink job are parallelism and memory. For a long-running job, the initial resource configuration is crucial because open-source Flink does not yet support an auto-pilot mode that automatically scales resources up or down as the ingestion rate rises or falls, so you might waste or underutilize resources. 
+
+All Hudi-specific parallelism and memory configurations depend on your Flink job resources.
+
+When using Flink, you can set some global configurations in `$FLINK_HOME/conf/flink-conf.yaml`.
+
+### Parallelism​
+
+If your system has a lot of data to ingest, increasing the parallelism can improve throughput significantly. Hudi supplies flexible config options for specific operators, but at a high level, a default global parallelism can reduce the complexity of manual configuration. Try the default configuration and adjust as necessary. 
+
+| Property Name | Default  | Description | Scope | Since Version                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| `taskmanager.numberOfTaskSlots` | 1 | This is the number of parallel operator or user function instances that a single TaskManager can run. We recommend setting this value > 4, and the actual value needs to be set according to the amount of data | n/a | 0.9.0 |
+| `parallelism.default` | 1 | This is the default parallelism used when no parallelism is specified anywhere (default: 1). For example, if the value of [`write.bucket_assign.tasks`](#parallelism-1) is not set, this value will be used | n/a | 0.9.0 |
+
+### Memory​
+The `JobManager` and `TaskManager` memory configuration is very important for a Flink job to work smoothly. Below, we'll describe these configurations. 
+
+#### JobManager
+The JobManager coordinates the write instants (actions such as commits and compactions) on the Hudi timeline. Within its embedded timeline server, it keeps an in-memory filesystem view of all the file handles on the filesystem. We need to ensure enough memory is allocated to avoid OOM errors. The configs below allow you to allocate the necessary memory. 

Review Comment:
   "instants coordination" is a bit unclear



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org