Posted to commits@hudi.apache.org by "Zhaojing Yu (Jira)" <ji...@apache.org> on 2022/10/01 12:20:00 UTC

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

     [ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaojing Yu updated HUDI-3775:
------------------------------
    Fix Version/s: 0.13.0

> Allow for offline compaction of MOR tables via spark streaming
> --------------------------------------------------------------
>
>                 Key: HUDI-3775
>                 URL: https://issues.apache.org/jira/browse/HUDI-3775
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction, spark
>            Reporter: Rajesh
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: easyfix
>             Fix For: 0.13.0
>
>
> Currently, there is no way to keep compaction from consuming a lot of resources when it runs inline or async for MOR tables via Spark Streaming. Delta Streamer has ways to divide resources between ingestion and async compaction, but Spark Streaming does not have that option.
> Introducing a flag to turn off automatic compaction and allowing users to run compaction in a separate process will decouple the two concerns.
> This will also allow users to size the cluster just for ingestion and deal with compaction separately, without blocking. We will need to look into documenting best practices for running offline compaction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)