Posted to commits@hudi.apache.org by "Surya Prasanna Yalla (Jira)" <ji...@apache.org> on 2022/03/14 23:30:00 UTC

[jira] [Updated] (HUDI-3580) [RFC-TBD] Support LogCompaction action for MOR tables

     [ https://issues.apache.org/jira/browse/HUDI-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surya Prasanna Yalla updated HUDI-3580:
---------------------------------------
    Summary: [RFC-TBD] Support LogCompaction action for MOR tables  (was: [RFC-TBD] Support minor compaction for MOR tables)

> [RFC-TBD] Support LogCompaction action for MOR tables
> -----------------------------------------------------
>
>                 Key: HUDI-3580
>                 URL: https://issues.apache.org/jira/browse/HUDI-3580
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: compaction, metadata
>            Reporter: Surya Prasanna Yalla
>            Priority: Major
>
> The record-level index uses the metadata table, which is a MOR table.
> Each delta commit in the metadata table creates multiple HFile log blocks, so reading them requires opening multiple file handles, which can hurt read performance. To improve read performance, compaction can be run frequently; it merges all the log blocks into the base file and creates a new base file. Done frequently, however, this causes write amplification.
> Instead of merging all the log blocks into the base file in a full compaction, a minor compaction can be done, which stitches the log blocks together and creates one log block.
> This can be achieved by adding a new action to Hudi called logcompaction, which operates at the log file level. Compaction creates base files and issues a .commit upon completion; similarly, minor compaction, which creates a new log block, can issue a .deltacommit on the timeline after completion.
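
The difference between the two approaches can be illustrated with a minimal sketch. It is not Hudi code: the dict-based block model and the names `log_compact` and `read_file_slice` are hypothetical, chosen only to show that stitching log blocks into one block leaves the base file untouched while giving readers the same merged view.

```python
from typing import Dict, List, Optional

# Hypothetical model (not Hudi's classes): a log block maps a record key
# to its latest payload; None marks a delete.
LogBlock = Dict[str, Optional[str]]


def log_compact(blocks: List[LogBlock]) -> LogBlock:
    """Stitch log blocks (ordered oldest to newest) into one log block.

    Delete markers are kept, because the base file is not rewritten and
    readers still need them to hide the deleted base records.
    """
    merged: LogBlock = {}
    for block in blocks:
        merged.update(block)  # a newer block overrides an older one
    return merged


def read_file_slice(base: Dict[str, str], blocks: List[LogBlock]) -> Dict[str, str]:
    """Merge-on-read view: apply the log blocks on top of the base file."""
    view = dict(base)
    for block in blocks:
        for key, payload in block.items():
            if payload is None:
                view.pop(key, None)  # apply a delete
            else:
                view[key] = payload  # apply an insert or update
    return view
```

In this model a reader opens one log block after log compaction instead of several, and only log data is rewritten. A full compaction would instead rewrite the whole base file.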



--
This message was sent by Atlassian Jira
(v8.20.1#820001)