Posted to commits@hudi.apache.org by "Yuwei Xiao (Jira)" <ji...@apache.org> on 2022/08/31 05:13:00 UTC
[jira] [Created] (HUDI-4753) More accurate evaluation of log record during log writing or compaction
Yuwei Xiao created HUDI-4753:
--------------------------------
Summary: More accurate evaluation of log record during log writing or compaction
Key: HUDI-4753
URL: https://issues.apache.org/jira/browse/HUDI-4753
Project: Apache Hudi
Issue Type: Bug
Reporter: Yuwei Xiao
In the current log writing path, avgRecordSize is taken from the first incoming log record, which may not be representative, especially for the metadata table.
When writing to the metadata table, the first log record is always `__all_partition__`, which may be much larger than a normal partition record.
The resulting overestimate causes performance problems in log writing and compaction: we write too many log blocks and spill records to disk unnecessarily.
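To illustrate the effect, here is a minimal sketch (not Hudi's actual code) comparing an estimate frozen at the first record with a running average over all records. The class names, the `records_per_block` helper, the record sizes, and the 1 MB block limit are all hypothetical and chosen only for illustration. A running average is just one possible way to get a more accurate estimate; the issue itself does not prescribe a fix.

```python
from dataclasses import dataclass


@dataclass
class FirstRecordEstimator:
    """Hypothetical: freezes the estimate at the size of the first record seen."""
    avg_record_size: float = 0.0
    seen: int = 0

    def observe(self, record_size: int) -> None:
        if self.seen == 0:
            self.avg_record_size = float(record_size)
        self.seen += 1


@dataclass
class RunningAverageEstimator:
    """Hypothetical: updates the mean incrementally with every record."""
    avg_record_size: float = 0.0
    seen: int = 0

    def observe(self, record_size: int) -> None:
        self.seen += 1
        # Incremental mean: avoids keeping a running total of all sizes.
        self.avg_record_size += (record_size - self.avg_record_size) / self.seen


def records_per_block(max_block_bytes: int, avg_record_size: float) -> int:
    """How many records fit in one block under the given size estimate."""
    return max(1, int(max_block_bytes // avg_record_size))


# Assumed workload: one large first record (standing in for
# `__all_partition__`) followed by many small partition records.
sizes = [100_000] + [200] * 999

first = FirstRecordEstimator()
running = RunningAverageEstimator()
for size in sizes:
    first.observe(size)
    running.observe(size)

MAX_BLOCK_BYTES = 1_000_000  # assumed block size limit, for illustration
blocks_first = records_per_block(MAX_BLOCK_BYTES, first.avg_record_size)
blocks_running = records_per_block(MAX_BLOCK_BYTES, running.avg_record_size)
print(f"first-record estimate: {first.avg_record_size:.1f} bytes, "
      f"{blocks_first} records per block")
print(f"running-average estimate: {running.avg_record_size:.1f} bytes, "
      f"{blocks_running} records per block")
```

With the first-record estimate, far fewer records are packed into each block than would actually fit, so the same data ends up in many more blocks. The same overestimate makes an in-memory buffer appear full sooner than it is, which triggers the unnecessary spilling.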
--
This message was sent by Atlassian Jira
(v8.20.10#820010)