Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/06 00:30:27 UTC

[GitHub] [hudi] yihua commented on a diff in pull request #6003: [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer

yihua commented on code in PR #6003:
URL: https://github.com/apache/hudi/pull/6003#discussion_r988454552


##########
rfc/rfc-56/rfc-56.md:
##########
@@ -0,0 +1,238 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# RFC-56: Early Conflict Detection For Multi-writer
+
+## Proposers
+
+- @zhangyue19921010
+
+## Approvers
+
+- @yihua
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-1575
+
+## Abstract
+
+At present, Hudi implements timeline-based OCC (Optimistic Concurrency Control) to ensure data consistency,
+integrity and correctness across multiple writers. OCC detects conflicts at Hudi's file group level, i.e., two
+concurrent writers updating the same file group are detected as a conflict. Currently, conflict detection is
+performed after data writing is completed and before the commit metadata is written. If a conflict is detected at
+that point, cluster resources are wasted because computing and writing have already finished.
+
+To solve this problem, this RFC proposes an early conflict detection mechanism to detect the conflict during the data
+writing phase and abort the writing early if conflict is detected, using Hudi's marker mechanism. Before writing each
+data file, the writer creates a corresponding marker to mark that the file is created, so that the writer can use the
+markers to automatically clean up uncommitted data in failure and rollback scenarios. We propose to use the markers to
+identify conflicts at the file group level while data is being written. The early conflict detection workflow differs
+slightly between the types of marker maintainers. For direct markers, Hudi lists the necessary marker
+files directly and checks for conflicts before the writer creates a marker and starts to write the corresponding
+data file. For timeline-server-based markers, Hudi only fetches the result of marker conflict checking before the
+writer creates a marker and starts to write the corresponding data file. The conflicts are checked asynchronously and
+periodically so that write conflicts can be detected as early as possible. Both writers may still write
+data files of the same file slice until the conflict is detected in the next round of checking.
+
+In addition, early conflict detection lets Hudi stop writing earlier and release resources back to the cluster,
+improving resource utilization.
+
+Note that the early conflict detection proposed by this RFC operates within OCC. Any conflict detection outside the
+scope of OCC is not handled. For example, current OCC for multiple writers cannot detect the conflict if two concurrent
+writers perform INSERT operations for the same set of record keys, because the writers write to different file groups.
+This RFC does not intend to address this problem.
+
+## Background
+
+Transactions and multi-writer support on data lakes are becoming key characteristics of building a Lakehouse
+these days. Quoting the blog <strong>Lakehouse Concurrency Control: Are we too optimistic?</strong> directly:
+https://hudi.apache.org/blog/2021/12/16/lakehouse-concurrency-control-are-we-too-optimistic/
+
+> "Hudi implements a file level, log based concurrency control protocol on the Hudi timeline, which in-turn relies
+> on bare minimum atomic puts to cloud storage. By building on an event log as the central piece for inter process
+> coordination, Hudi is able to offer a few flexible deployment models that offer greater concurrency over pure OCC
+> approaches that just track table snapshots."
+
+In the multi-writer scenario, Hudi's existing conflict detection occurs after the writer finishes writing the data and
+before it commits the metadata. In other words, the writer only detects a conflict when it starts to
+commit, after all computation and data writing have been completed, which wastes resources.
+
+For example:
+
+Now there are two writing jobs: job1 writes 10M data to the Hudi table, including updates to file group 1. Another job2
+writes 100G to the Hudi table, and also updates the same file group 1.
+
+Job1 finishes and commits to Hudi successfully. After a few hours, job2 finishes writing data files (100G) and starts to
+commit metadata. At this time, a conflict with job1 is found, and job2 has to be aborted and re-run after the failure.
+Obviously, a lot of computing resources and time are wasted for job2.
+
+Hudi currently has two important mechanisms, the marker mechanism and the heartbeat mechanism:
+
+1. The marker mechanism tracks all the files that are part of an active write.
+2. The heartbeat mechanism tracks all active writers to a Hudi table.
+
+Based on marker and heartbeat, this RFC proposes a new conflict detection: Early Conflict Detection. Before the writer
+creates the marker and before it starts to write the file, Hudi performs this new conflict detection, trying to detect
+the writing conflict directly (for direct markers) or get the async conflict check result (for timeline-server-based
+markers) as early as possible and abort the writer when a conflict occurs, so that we can release compute resources as
+soon as possible and improve resource utilization.
+
+## Implementation
+
+Figure 1 below shows the high-level workflow of early conflict detection. The early conflict detection is
+guarded by a new feature flag. As we can see, when both `supportsOptimisticConcurrencyControl`
+and `isEarlyConflictDetectionEnable` (the new feature flag) are true, the early conflict detection
+feature is used. Otherwise, we skip this check and create the marker directly.
+
+![](figure1.png)
+
+The three important steps marked in red in Figure 1 are introduced one by one as follows:
+
+### [1] Check Marker Conflict
+
+As we know, Hudi has two ways to create and maintain markers:
+
+1. DirectWriteMarkers: an individual marker file corresponding to each data file is directly created by the writer.
+2. TimelineServerBasedWriteMarkers: marker operations are all handled at the timeline service, which serves as a proxy.
+
+Therefore, for different types of Marker, we must implement the corresponding conflict detection logic based on the
+markers. Here we design a new interface `HoodieEarlyConflictDetectionStrategy` to ensure the extensibility of checking
+marker conflict.
+
+![](flow1.png)
+
+In this design, we provide `SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy` and
+`SimpleDirectMarkerBasedEarlyConflictDetectionStrategy` for DirectWriteMarkers to perform the corresponding conflict
+detection and conflict resolution. We also provide `AsyncTimelineMarkerEarlyConflictDetectionStrategy` for
+TimelineServerBasedWriteMarkers to perform the corresponding conflict detection and conflict resolution.
+
+#### DirectWriteMarkers related strategy

Review Comment:
   Given that timeline-server-based markers are now the default, should we rely more on the timeline-server-based early conflict detection for better performance, and treat the Bloom-filter-based marker conflict detection as a nice-to-have in the first cut?
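
For illustration, the gating and file-group-level check described in the RFC text above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Hudi's implementation: `EarlyConflictSketch`, `Marker`, `findConflict` and `createMarker` are hypothetical names, the in-memory list stands in for marker storage (direct or timeline-server-based), and the `activeInstants` set stands in for what the heartbeat mechanism would report as active writers.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; none of these names are Hudi APIs. */
final class EarlyConflictSketch {

    /** A marker reduced to the fields a file-group-level check needs. */
    static final class Marker {
        final String instantTime;   // the writer's instant
        final String partitionPath;
        final String fileId;        // identifies the file group within the partition

        Marker(String instantTime, String partitionPath, String fileId) {
            this.instantTime = instantTime;
            this.partitionPath = partitionPath;
            this.fileId = fileId;
        }

        String fileGroupKey() {
            return partitionPath + "/" + fileId;
        }
    }

    /** Returns a marker from another active writer on the same file group, if any. */
    static Optional<Marker> findConflict(Marker candidate, List<Marker> existing,
                                         Set<String> activeInstants) {
        for (Marker m : existing) {
            boolean otherWriter = !m.instantTime.equals(candidate.instantTime);
            boolean stillActive = activeInstants.contains(m.instantTime);
            if (otherWriter && stillActive
                    && m.fileGroupKey().equals(candidate.fileGroupKey())) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    /** Runs the check only when both flags are on, then records the marker. */
    static void createMarker(boolean occEnabled, boolean earlyDetectionEnabled,
                             Marker candidate, List<Marker> existing,
                             Set<String> activeInstants) {
        if (occEnabled && earlyDetectionEnabled) {
            Optional<Marker> conflict = findConflict(candidate, existing, activeInstants);
            if (conflict.isPresent()) {
                // Abort before any data file is written for this file group.
                throw new IllegalStateException("Early conflict on file group "
                        + candidate.fileGroupKey() + " with instant "
                        + conflict.get().instantTime);
            }
        }
        existing.add(candidate);
    }
}
```

In this sketch the check runs synchronously on every marker creation, which is closest to the direct-marker flow. For the timeline-server-based flow the RFC describes, the scan would instead run periodically in the background and `createMarker` would only read its latest result.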


