Posted to commits@hudi.apache.org by "Yue Zhang (Jira)" <ji...@apache.org> on 2021/11/04 02:20:00 UTC

[jira] [Created] (HUDI-2683) Parallelize deleting archived hoodie commits

Yue Zhang created HUDI-2683:
-------------------------------

             Summary: Parallelize deleting archived hoodie commits 
                 Key: HUDI-2683
                 URL: https://issues.apache.org/jira/browse/HUDI-2683
             Project: Apache Hudi
          Issue Type: Task
            Reporter: Yue Zhang


Currently, hoodie takes 5s to delete 30 archived commits, and the delay gets worse with a bigger archive threshold, for example when archive.max_commits is set to 100 or larger.

This is because hoodie deletes archived commits serially in the driver.

Such a delay can be unacceptable for Spark Streaming jobs whose batch interval is on the order of seconds.

We need to delete archived commits in parallel.
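
As a rough illustration of the idea only, the sketch below deletes a list of files through a fixed-size thread pool instead of a serial loop. It is not hoodie's implementation: the class name ParallelArchivedCommitDeleter, the method deleteInParallel, and the parallelism parameter are hypothetical, and it works on local java.nio paths rather than hoodie's file system or engine abstractions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of hoodie: deletes files concurrently.
public class ParallelArchivedCommitDeleter {

    // Deletes the given files using at most `parallelism` threads and
    // returns how many files were actually removed.
    public static int deleteInParallel(List<Path> files, int parallelism)
            throws InterruptedException, ExecutionException {
        if (files.isEmpty()) {
            return 0;
        }
        // Never start more threads than there are files to delete.
        int threads = Math.max(1, Math.min(parallelism, files.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per file; each task reports whether it removed the file.
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> Files.deleteIfExists(file));
            }
            int deleted = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                // get() rethrows any IOException wrapped in ExecutionException.
                if (result.get()) {
                    deleted++;
                }
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass the list of archived commit files and a thread count, for example deleteInParallel(files, 8). Any gain depends on the per-delete latency of the underlying storage, so the sketch helps most where each delete is a slow remote call.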


--
This message was sent by Atlassian Jira
(v8.3.4#803005)