Posted to commits@hudi.apache.org by vi...@apache.org on 2021/08/27 15:32:56 UTC

[hudi] branch asf-site updated: [HUDI-2347] Blog on improving marker mechanism (#3527)

This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new ba398fd  [HUDI-2347] Blog on improving marker mechanism (#3527)
ba398fd is described below

commit ba398fd2163d1dd7c1ee6c0ba8fc6ab57f8f31f9
Author: Y Ethan Guo <et...@gmail.com>
AuthorDate: Fri Aug 27 08:32:45 2021 -0700

    [HUDI-2347] Blog on improving marker mechanism (#3527)
    
    * [HUDI-2347] Blog on improving marker mechanism
    
    * Edits from Code Review
     - Small word-smithing
     - Replace "file system" with "storage"
    
    Co-authored-by: Vinoth Chandar <vi...@apache.org>
---
 .../blog/2021-08-18-improving-marker-mechanism.md  |  72 +++++++++++++++++++++
 .../marker-mechanism/batched-marker-creation.png   | Bin 0 -> 244371 bytes
 .../direct-marker-file-mechanism.png               | Bin 0 -> 282614 bytes
 .../timeline-server-based-marker-mechanism.png     | Bin 0 -> 367246 bytes
 4 files changed, 72 insertions(+)

diff --git a/website/blog/2021-08-18-improving-marker-mechanism.md b/website/blog/2021-08-18-improving-marker-mechanism.md
new file mode 100644
index 0000000..e9b4021
--- /dev/null
+++ b/website/blog/2021-08-18-improving-marker-mechanism.md
@@ -0,0 +1,72 @@
+---
+title: "Improving Marker Mechanism in Apache Hudi"
+excerpt: "We introduce a new marker mechanism leveraging the timeline server to address performance bottlenecks due to rate-limiting on cloud storage like AWS S3."
+author: yihua
+category: blog
+---
+
+Hudi supports fully automatic cleanup of uncommitted data on storage during its write operations. Write operations in an Apache Hudi table use markers to efficiently track the data files written to storage. In this blog, we dive into the design of the existing direct marker file mechanism and explain its performance problems on cloud storage like AWS S3 for
+very large writes. We demonstrate how we improve write performance with the introduction of timeline-server-based markers.
+
+<!--truncate-->
+
+## Need for Markers during Write Operations
+ 
+A **marker** in Hudi, such as a marker file with a unique filename, is a label indicating that a corresponding data file exists in storage, which Hudi
+then uses to automatically clean up uncommitted data during failure and rollback scenarios. Each marker entry is composed of three parts: the data file name,
+the marker extension (`.marker`), and the I/O operation that created the file (`CREATE` - inserts, `MERGE` - updates/deletes, or `APPEND` - either). For example, the marker `91245ce3-bb82-4f9f-969e-343364159174-0_140-579-0_20210820173605.parquet.marker.CREATE` indicates
+that the corresponding data file is `91245ce3-bb82-4f9f-969e-343364159174-0_140-579-0_20210820173605.parquet` and the I/O type is `CREATE`. Before writing each data file, the Hudi write client first creates a marker in storage, which persists until it is explicitly deleted
+by the write client after the commit succeeds.
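The naming convention above can be sketched as a small parser that splits a marker entry into its data file name and I/O type. This is an illustrative sketch only, not Hudi's actual implementation; the `.marker` extension and the three I/O types come from the convention described above:

```python
# Illustrative sketch only: splits a Hudi-style marker name into its parts.
def parse_marker(marker_name: str):
    base, _, io_type = marker_name.rpartition(".")   # strip the I/O type suffix
    data_file, _, ext = base.rpartition(".")         # strip the ".marker" extension
    if ext != "marker" or io_type not in {"CREATE", "MERGE", "APPEND"}:
        raise ValueError(f"not a marker name: {marker_name}")
    return data_file, io_type

marker = ("91245ce3-bb82-4f9f-969e-343364159174-0_140-579-0"
          "_20210820173605.parquet.marker.CREATE")
data_file, io_type = parse_marker(marker)
# data_file is the ".parquet" name from the example above; io_type == "CREATE"
```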
+
+The markers are useful for efficiently carrying out different operations by the write client. Two important operations use markers to find uncommitted data files of interest efficiently, instead of scanning the whole Hudi table:
+  - **Removing duplicate/partial data files**: in Spark, the Hudi write client delegates data file writing to multiple executors.  An executor can fail a task, leaving partial data files behind, and in this case Spark retries the task until it succeeds. When speculative execution is enabled, there can also be multiple successful attempts at writing the same data into different files, only one of which is finally handed to the Spark driver process for committing. The markers h [...]
+  - **Rolling back failed commits**: a write operation can fail partway through, leaving some data files written in storage.  In this case, the marker entries stay in storage because the commit failed.  On the next write operation, the write client first rolls back the failed commits by identifying the data files written in those commits through the markers and deleting them.
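The rollback path in the second bullet can be sketched roughly as follows. This is a simplified illustration (not Hudi's rollback code): each marker maps back to the data file it labels, so only those files need deleting, with no full table scan:

```python
# Illustrative sketch of rollback via markers: strip the ".marker.<IO_TYPE>"
# suffix from each marker entry to recover the uncommitted data file name.
def files_to_roll_back(markers):
    data_files = []
    for m in markers:
        name, _, _ = m.partition(".marker.")  # drop the marker suffix
        data_files.append(name)
    return data_files

markers = ["a-0_1-1-0_20210820.parquet.marker.CREATE",
           "b-0_2-1-0_20210820.parquet.marker.MERGE"]
# -> ["a-0_1-1-0_20210820.parquet", "b-0_2-1-0_20210820.parquet"]
```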
+
+Next, we dive into the existing marker mechanism, explain its performance problem, and demonstrate the new timeline-server-based marker mechanism to address the problem.
+
+## Existing Direct Marker Mechanism and its limitations
+
+The **existing direct marker mechanism** simply creates a new marker file corresponding to each data file, with the marker filename as described above.  Each marker file is written to storage under the same directory hierarchy, i.e., commit instant and partition path, inside a temporary folder `.hoodie/.temp` under the base path of the Hudi table.  For example, the figure below shows the marker files created alongside the corresponding data files when writing data to the Hudi table.  When [...]
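Based on the layout described above (base path, `.hoodie/.temp`, commit instant, partition path), the location of a direct marker could be sketched as below. The helper and all of its arguments are hypothetical, for illustration only:

```python
import posixpath

# Sketch of the direct-marker layout: markers mirror the data file's
# partition hierarchy under <base>/.hoodie/.temp/<instant>/.
def direct_marker_path(base_path, instant, partition, data_file, io_type):
    return posixpath.join(base_path, ".hoodie", ".temp", instant,
                          partition, f"{data_file}.marker.{io_type}")

p = direct_marker_path("s3://bucket/table", "20210820173605",
                       "2021/08/20", "abc-0_1-1-0_20210820173605.parquet",
                       "CREATE")
# joins the pieces under <base>/.hoodie/.temp/<instant>/<partition>/
```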
+
+![An example of marker and data files in direct marker file mechanism](/assets/images/blog/marker-mechanism/direct-marker-file-mechanism.png)
+
+While this is much more efficient than scanning the entire table for uncommitted data files, as the number of data files to write increases, so does the number of marker files to create. This can create performance bottlenecks for cloud storage such as AWS S3.  In AWS S3, each file create and delete call triggers an HTTP request, and there is [rate-limiting](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html) on how many requests can be processed per second per pref [...]
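A back-of-envelope estimate shows why this matters at scale. The per-prefix limit below is an assumed figure (AWS documents roughly 3,500 PUT/DELETE requests per second per prefix for S3), and the 165k marker count is taken from the performance table later in this post:

```python
# Back-of-envelope sketch with assumed figures: one create plus one delete
# request per marker, all landing in a single S3 prefix.
REQUESTS_PER_SEC = 3500          # assumed S3 per-prefix write request limit
markers = 165_000
total_requests = markers * 2     # one create + one delete per marker
seconds = total_requests / REQUESTS_PER_SEC
print(f"~{seconds:.0f}s of request budget at the per-prefix limit")
```

Even this lower bound ignores per-request latency and throttling retries, which in practice stretch marker operations out to minutes.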
+
+## Timeline-server-based marker mechanism improving write performance
+
+To address the performance bottleneck due to rate-limiting of AWS S3 explained above, we introduce a **new marker mechanism leveraging the timeline server**, which optimizes the marker-related latency for storage with non-trivial file I/O latency.  The **timeline server** in Hudi serves as a centralized place for providing the file system and timeline views. As shown below, the new timeline-server-based marker mechanism delegates the marker creation and other marker-related operations fr [...]
+
+![Timeline-server-based marker mechanism](/assets/images/blog/marker-mechanism/timeline-server-based-marker-mechanism.png)
+
+To improve the efficiency of processing marker creation requests, we design batched handling of marker requests at the timeline server. Each marker creation request is handled asynchronously in the Javalin timeline server and queued before processing. For every batch interval, e.g., 20ms, a dispatching thread pulls the pending requests from the queue and sends them to a worker thread for processing. Each worker thread processes the marker creation requests, sets the responses, and [...]
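The queue-and-dispatch flow above can be sketched in a few lines. This is a minimal illustration of the batching pattern, not Hudi's actual timeline-server code; the 20ms interval comes from the example in the text:

```python
import queue, threading, time

# Sketch: marker requests queue up; a dispatching thread wakes every batch
# interval and flushes the whole pending batch in one worker pass.
pending = queue.Queue()
created = set()           # stands in for the timeline server's marker store
done = threading.Event()
BATCH_INTERVAL_MS = 20    # e.g., 20ms, as in the text above

def dispatcher():
    while not done.is_set():
        time.sleep(BATCH_INTERVAL_MS / 1000)
        batch = []
        while not pending.empty():
            batch.append(pending.get())
        if batch:
            created.update(batch)   # one pass handles the whole batch

threading.Thread(target=dispatcher, daemon=True).start()
for i in range(100):
    pending.put(f"file_{i}.parquet.marker.CREATE")

# bounded wait until the dispatcher has flushed everything
deadline = time.time() + 2
while len(created) < 100 and time.time() < deadline:
    time.sleep(0.02)
done.set()
```

Batching amortizes the per-request storage I/O: many marker creations share one write to the underlying marker files.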
+
+![Batched processing of marker creation requests](/assets/images/blog/marker-mechanism/batched-marker-creation.png)
+
+
+Note that the worker thread always checks whether the marker has already been created by comparing the marker name from the request with the in-memory copy of all markers maintained at the timeline server. The underlying files storing the markers are only read upon the first marker request (lazy loading).  The responses of requests are only sent back once the new markers are flushed to the files, so that if the timeline server fails, it can recover the already [...]
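The duplicate check with lazy loading can be sketched as follows. The class and method names are hypothetical, chosen only to illustrate the pattern described above:

```python
# Sketch of the duplicate check with lazy loading: the marker set is read
# from storage only on the first request, then kept in memory.
class MarkerDirectory:
    def __init__(self, load_from_storage):
        self._load = load_from_storage   # callable returning stored markers
        self._markers = None             # not loaded yet (lazy)

    def create_if_absent(self, marker):
        if self._markers is None:
            self._markers = set(self._load())  # first request triggers load
        if marker in self._markers:
            return False                       # already created, skip
        self._markers.add(marker)
        return True

d = MarkerDirectory(lambda: ["a.parquet.marker.CREATE"])
d.create_if_absent("a.parquet.marker.CREATE")   # -> False (already stored)
d.create_if_absent("b.parquet.marker.CREATE")   # -> True
```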
+
+## Marker-related write options
+
+We introduce the following new marker-related write options in the `0.9.0` release to configure the marker mechanism.
+
+| Property Name | Default | Meaning |
+| ------------- | ------- | ------- |
+| `hoodie.write.markers.type` | direct | Marker type to use.  Two modes are supported: (1) `direct`: an individual marker file corresponding to each data file is created directly by the writer; (2) `timeline_server_based`: marker operations are all handled at the timeline service, which serves as a proxy.  New marker entries are batch processed and stored in a limited number of underlying files for efficiency. |
+| `hoodie.markers.timeline_server_based.batch.num_threads` | 20 | Number of threads to use for batch processing marker creation requests at the timeline server. |
+| `hoodie.markers.timeline_server_based.batch.interval_ms` | 50 | The batch interval in milliseconds for marker creation batch processing. |
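Putting the options together, a write could be configured as sketched below. The option keys are from the table above; the table name and save path are hypothetical, and the actual `df.write` call is left commented since it needs a live Spark session:

```python
# Sketch of enabling timeline-server-based markers on a Hudi write.
# Option keys come from the table above; other values are placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",  # hypothetical table name
    "hoodie.write.markers.type": "timeline_server_based",
    "hoodie.markers.timeline_server_based.batch.num_threads": "20",
    "hoodie.markers.timeline_server_based.batch.interval_ms": "50",
}
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```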
+
+## Performance
+
+We evaluate the write performance of both the direct and timeline-server-based marker mechanisms by bulk-inserting a large dataset using Amazon EMR with Spark and S3. The input data is around 100GB.  We configure the write operation to generate a large number of data files concurrently by setting the max parquet file size to 1MB and the parallelism to 240. As we noted before, while the latency of the direct marker mechanism is acceptable for incremental writes with a smaller number of data fil [...]
+
+As shown below, the timeline-server-based marker mechanism generates far fewer files storing markers because of the batch processing, leading to much less time spent on marker-related I/O operations and thus a 31% lower write completion time compared to the direct marker file mechanism.
+
+| Marker Type | Total Files | Num data files written | Files created for markers | Marker deletion time | Bulk Insert Time (including marker deletion) |
+| ----------- | ----------- | :--------------------: | :-----------------------: | :------------------: | :------------------------------------------: |
+| Direct | 165k | 1k | 165k | 5.4 secs | - |
+| Direct | 165k | 165k | 165k | 15 min | 55 min |
+| Timeline-server-based | 165k | 165k | 20 | ~3 secs | 38 min |
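The quoted 31% improvement follows directly from the bulk insert times in the table (55 minutes direct vs. 38 minutes timeline-server-based):

```python
# Quick arithmetic check of the quoted improvement, using the table's figures.
direct_min, tsb_min = 55, 38
reduction = (direct_min - tsb_min) / direct_min
print(f"{reduction:.1%}")  # ~30.9%, i.e. the ~31% quoted above
```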
+
+## Conclusion
+
+We identify that the existing direct marker file mechanism incurs performance bottlenecks due to the rate-limiting of file create and delete calls on cloud storage like AWS S3.  To address this issue, we introduce a new marker mechanism leveraging the timeline server, which delegates the marker creation and other marker-related operations from individual executors to the timeline server and uses batch processing to improve performance.  Performance evaluations on Amazon EMR with Spark an [...]
\ No newline at end of file
diff --git a/website/static/assets/images/blog/marker-mechanism/batched-marker-creation.png b/website/static/assets/images/blog/marker-mechanism/batched-marker-creation.png
new file mode 100644
index 0000000..ab814ba
Binary files /dev/null and b/website/static/assets/images/blog/marker-mechanism/batched-marker-creation.png differ
diff --git a/website/static/assets/images/blog/marker-mechanism/direct-marker-file-mechanism.png b/website/static/assets/images/blog/marker-mechanism/direct-marker-file-mechanism.png
new file mode 100644
index 0000000..02f2421
Binary files /dev/null and b/website/static/assets/images/blog/marker-mechanism/direct-marker-file-mechanism.png differ
diff --git a/website/static/assets/images/blog/marker-mechanism/timeline-server-based-marker-mechanism.png b/website/static/assets/images/blog/marker-mechanism/timeline-server-based-marker-mechanism.png
new file mode 100644
index 0000000..809a7d1
Binary files /dev/null and b/website/static/assets/images/blog/marker-mechanism/timeline-server-based-marker-mechanism.png differ