Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/04/05 05:58:00 UTC

[jira] [Created] (HUDI-3796) Implement layout to filter out uncommitted log files without reading the log blocks

Ethan Guo created HUDI-3796:
-------------------------------

             Summary: Implement layout to filter out uncommitted log files without reading the log blocks
                 Key: HUDI-3796
                 URL: https://issues.apache.org/jira/browse/HUDI-3796
             Project: Apache Hudi
          Issue Type: Improvement
          Components: writer-core
            Reporter: Ethan Guo
             Fix For: 0.12.0


Related: HUDI-3637

At a high level, getLatestFileSlices() fetches the latest file slices for committed base files and filters out any file slice whose base instant time is uncommitted.  The latest file slices may still include uncommitted log files; these are only skipped later, during log reading and merging, i.e., by the logic in "AbstractHoodieLogRecordReader".

We can use the log instant time instead of the base instant time in the log file name, so that uncommitted log files can be filtered out by file name alone, without reading the log blocks beforehand.
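
A minimal sketch of the proposed name-based filtering follows. It is illustrative only and is not Hudi's actual API: the class LogFileFilterSketch, its methods, and the assumed file name layout ".<fileId>_<instantTime>.log.<version>_<writeToken>" are hypothetical, and the set of committed instants stands in for whatever the timeline of completed instants would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: filters log files purely by name.
class LogFileFilterSketch {

    // ASSUMPTION: log file names look like
    //   .<fileId>_<instantTime>.log.<version>_<writeToken>
    // where <instantTime> is the instant of the log write itself
    // (the proposal), not the base instant time.
    private static final Pattern LOG_FILE_NAME =
        Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)(_.+)?$");

    // Returns the instant time embedded in the name, or empty if the
    // name does not match the assumed layout.
    static Optional<String> extractInstantTime(String fileName) {
        Matcher m = LOG_FILE_NAME.matcher(fileName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    // Keeps only log files whose embedded instant time is committed.
    // No log block is opened or read; only the names are inspected.
    static List<String> filterCommitted(List<String> fileNames,
                                        Set<String> committedInstants) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            Optional<String> instant = extractInstantTime(name);
            if (instant.isPresent() && committedInstants.contains(instant.get())) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            ".f1_20220405010000.log.1_0-1-0",   // written by a committed instant
            ".f1_20220405020000.log.1_0-2-0");  // written by an uncommitted instant
        Set<String> committed = Set.of("20220405010000");
        System.out.println(filterCommitted(names, committed));
    }
}
```

With the base instant time in the name, both files in the example would carry the same instant and could not be told apart without reading their blocks; embedding the log instant time is what makes the name-only check possible.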



--
This message was sent by Atlassian Jira
(v8.20.1#820001)