Posted to commits@hudi.apache.org by "Yue Zhang (Jira)" <ji...@apache.org> on 2021/11/04 02:20:00 UTC
[jira] [Created] (HUDI-2683) Parallelize deleting archived hoodie commits
Yue Zhang created HUDI-2683:
-------------------------------
Summary: Parallelize deleting archived hoodie commits
Key: HUDI-2683
URL: https://issues.apache.org/jira/browse/HUDI-2683
Project: Apache Hudi
Issue Type: Task
Reporter: Yue Zhang
Currently, Hudi takes 5s to delete 30 archived commits, and it gets even worse with a bigger archive threshold, such as setting archive.max_commits to 100 or larger.
This is because Hudi deletes archived commits serially in the driver.
For Spark Streaming jobs with second-level batch intervals, this delay is sometimes unacceptable.
We need to delete archived commits in parallel.
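
As a rough illustration of the idea only, the serial loop could be replaced by submitting one delete task per file to a bounded thread pool. This is a minimal sketch, not Hudi's implementation: the class ParallelArchiveCleaner, the method deleteInParallel, and the parallelism parameter are hypothetical names, and it uses local java.nio paths as a stand-in for whatever storage API and execution engine the real fix would use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hudi: deletes files concurrently
// instead of one by one on the calling thread.
final class ParallelArchiveCleaner {

    // Deletes every path in `files` using up to `parallelism` threads.
    // Returns how many files actually existed and were deleted.
    static int deleteInParallel(List<Path> files, int parallelism)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            // Submit one delete task per file so the deletes can overlap.
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Boolean> task = () -> Files.deleteIfExists(file);
                results.add(pool.submit(task));
            }
            // Wait for all tasks and count the successful deletes.
            int deleted = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    throw new IllegalStateException(
                            "Failed to delete an archived file", e.getCause());
                }
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the total wall-clock time is bounded by the slowest batch of deletes rather than the sum of all of them. The right parallelism value depends on the storage backend and would need to be measured.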
--
This message was sent by Atlassian Jira
(v8.3.4#803005)