Posted to commits@hudi.apache.org by "Bo Cui (Jira)" <ji...@apache.org> on 2022/06/16 11:05:00 UTC

[jira] [Created] (HUDI-4270) Bootstrap operation data loading missing

Bo Cui created HUDI-4270:
----------------------------

             Summary: Bootstrap operation data loading missing
                 Key: HUDI-4270
                 URL: https://issues.apache.org/jira/browse/HUDI-4270
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Bo Cui


[https://github.com/apache/hudi/issues/4558]

Procedure:
1. The file system used by Hudi supports append (for example HDFS; the local file system does not support append).
2. Use `hoodie.logfile.max.size` to limit the log file size so that multiple log files are generated (for example log#1 and log#2).
3. The data for the last instant (for example 20220616180000) is written to log#2, but the JobManager (JM) fails to commit instant 20220616180000.
4. Restart the Flink job. The job rolls back the data of instant 20220616180000 with rollback instant 20220616180010, and the index operator loads the entire index.
5. In this case, the maximum instant in log#2 is 20220616180010, while the maximum instant in the index operator is 20220616180000 (https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L195).
6. If log#2 is read first, both log#2 and log#1 are skipped because 20220616180010 is greater than 20220616180000 (https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L236).
7. As a result, the data of log#1 is not loaded.

Fix: sort the log files in ascending order (log#1 before log#2) before reading them.
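
To illustrate why the read order matters, here is a minimal sketch in Java. It is not Hudi's code: `LogScanSketch`, `LogFile`, `scan` and `sortAscending` are hypothetical names. It assumes a simplified model in which the scan stops at the first block whose instant is greater than the maximum instant, and in which the first block encountered in log#2 is the rollback block. The instant 20220616170000 for log#1 is an invented earlier commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LogScanSketch {

    // Hypothetical stand-in for a log file: a version number plus the
    // instant times of the blocks it contains, in write order.
    static final class LogFile {
        final int version;
        final List<String> blockInstants;

        LogFile(int version, String... blockInstants) {
            this.version = version;
            this.blockInstants = Arrays.asList(blockInstants);
        }
    }

    // Simplified model of the scan (an assumption, not Hudi's reader):
    // stop at the first block whose instant is greater than maxInstant
    // and return the instants that were loaded before that point.
    static List<String> scan(List<LogFile> files, String maxInstant) {
        List<String> loaded = new ArrayList<>();
        for (LogFile file : files) {
            for (String instant : file.blockInstants) {
                // Instants are fixed-width digit strings, so lexicographic
                // comparison matches chronological order.
                if (instant.compareTo(maxInstant) > 0) {
                    return loaded; // everything after this block is skipped
                }
                loaded.add(instant);
            }
        }
        return loaded;
    }

    // The proposed fix: order log files by ascending version before scanning.
    static List<LogFile> sortAscending(List<LogFile> files) {
        List<LogFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt(f -> f.version));
        return sorted;
    }

    public static void main(String[] args) {
        LogFile log1 = new LogFile(1, "20220616170000"); // invented earlier commit
        LogFile log2 = new LogFile(2, "20220616180010"); // rollback block
        String maxInstant = "20220616180000";

        List<LogFile> unordered = Arrays.asList(log2, log1);
        System.out.println("log#2 first: " + scan(unordered, maxInstant));
        System.out.println("ascending:   " + scan(sortAscending(unordered), maxInstant));
    }
}
```

In this model, reading log#2 first loads nothing, while reading in ascending order loads the block from log#1 before the scan stops at the rollback block.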


