Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/11/20 14:51:00 UTC

[jira] [Created] (HUDI-2807) Failing to acquire lock with async clustering if clustering gets delayed due to lack of resources

sivabalan narayanan created HUDI-2807:
-----------------------------------------

             Summary: Failing to acquire lock with async clustering if clustering gets delayed due to lack of resources
                 Key: HUDI-2807
                 URL: https://issues.apache.org/jira/browse/HUDI-2807
             Project: Apache Hudi
          Issue Type: Bug
    Affects Versions: 0.10.0
            Reporter: sivabalan narayanan


With deltastreamer in continuous mode and multi-writer enabled, if async clustering acquires the lock when it is about to commit and then takes a long time to complete the transaction (because of large writes or a lack of resources), regular delta commits on the data timeline fail to acquire the lock. I understand this behavior may not be surprising, given that this is how locking semantics work.

However, I tried increasing the number of retries for lock acquisition with the ZooKeeper-based lock provider, and even after roughly 8 minutes clustering still did not get a share of compute and the regular writes failed. Once deltastreamer was shut down, the clustering ran to completion.
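
For illustration only, a multi-writer setup with async clustering and the ZooKeeper-based lock provider is typically configured along the lines below. The property keys are the ones described in the Hudi concurrency control documentation; every value (host, port, lock key, base path, retry count, wait time) is a placeholder assumption and not the setting used in this report.

```properties
# Multi-writer (optimistic concurrency control) with lazy cleanup of failed writes
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY

# Async clustering running alongside regular delta commits
hoodie.clustering.async.enabled=true

# ZooKeeper-based lock provider (all values below are placeholders)
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host.example.com
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=example_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks

# Retry settings for lock acquisition; raising these is the kind of change described above
hoodie.write.lock.num_retries=30
hoodie.write.lock.wait_time_ms_between_retry=10000
```

Raising the retry count or the wait between retries only lengthens the time a regular writer waits for the lock. It does not help if the lock holder never gets the resources to finish and release it.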

 

From the logs, I see that the metadata table writes for the replace commit happen quickly, but the release of the lock and the data table commit do not happen until deltastreamer is shut down as a result of the regular writes failing to acquire the lock.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)