Posted to commits@hudi.apache.org by "Yuwei Xiao (Jira)" <ji...@apache.org> on 2022/08/31 05:13:00 UTC

[jira] [Created] (HUDI-4753) More accurate evaluation of log record during log writing or compaction

Yuwei Xiao created HUDI-4753:
--------------------------------

             Summary: More accurate evaluation of log record during log writing or compaction
                 Key: HUDI-4753
                 URL: https://issues.apache.org/jira/browse/HUDI-4753
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Yuwei Xiao


In the current log writing path, avgRecordSize is estimated from the first incoming log record alone, which may be inaccurate, especially for the metadata table.

When writing the metadata table, the first log record is always `__all_partition__`, which lists every partition and may therefore be much larger than a normal partition record.
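
The following is a minimal sketch of why a first-record estimate goes wrong here. The record sizes are made up for illustration (a hypothetical 50,000-byte `__all_partition__` record followed by 99 records of 1,000 bytes) and are not measured from Hudi. The running average shown is just one possible alternative estimator, not necessarily the fix adopted for this issue.

```python
from typing import Iterable


def first_record_estimate(sizes: Iterable[int]) -> float:
    """Estimate the average record size from the first record only."""
    for size in sizes:
        return float(size)
    raise ValueError("no records")


def running_average_estimate(sizes: Iterable[int]) -> float:
    """Estimate the average record size as the mean over all records seen."""
    total = 0
    count = 0
    for size in sizes:
        total += size
        count += 1
    if count == 0:
        raise ValueError("no records")
    return total / count


# Hypothetical sizes in bytes: one large list-of-partitions record first,
# then 99 ordinary partition records.
sizes = [50_000] + [1_000] * 99

print(first_record_estimate(sizes))     # 50000.0
print(running_average_estimate(sizes))  # 1490.0
```

In this example the first-record estimate is more than 30 times the true mean.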

This causes performance problems in both log writing and compaction. An overestimated record size makes the writer flush too many log blocks, and it makes compaction spill records to disk unnecessarily.
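
The next sketch illustrates the log-block effect under a simplifying assumption. It assumes a hypothetical writer that flushes a block once its buffered record count reaches max_block_size / estimated_record_size. This flush rule, the 256 KB block limit, and the helper names are illustrative only and are not Hudi's actual API.

```python
import math


def records_per_block(max_block_size: int, est_record_size: int) -> int:
    """Records buffered before a flush, given the size estimate.

    Assumption: the writer flushes once the buffered count reaches
    max_block_size // est_record_size (at least one record per block).
    """
    return max(1, max_block_size // est_record_size)


def blocks_needed(num_records: int, max_block_size: int, est_record_size: int) -> int:
    """Number of log blocks written for num_records under that flush rule."""
    return math.ceil(num_records / records_per_block(max_block_size, est_record_size))


max_block_size = 256 * 1024  # hypothetical 256 KB block limit
num_records = 10_000

# Estimate taken from a large first record vs. a closer average estimate.
print(blocks_needed(num_records, max_block_size, 50_000))  # 2000
print(blocks_needed(num_records, max_block_size, 1_490))   # 58
```

Under these assumptions, the inflated estimate produces roughly 34 times as many blocks. The same over-counting of per-record size would make a memory-bounded map used during compaction treat its budget as full too early and spill to disk.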

--
This message was sent by Atlassian Jira
(v8.20.10#820010)