Posted to commits@hudi.apache.org by "Ying Lin (Jira)" <ji...@apache.org> on 2022/06/24 07:28:00 UTC

[jira] [Created] (HUDI-4314) Improve the performance of reading from the specified instant when the Flink streaming read application starts

Ying Lin created HUDI-4314:
------------------------------

             Summary: Improve the performance of reading from the specified instant when the Flink streaming read application starts
                 Key: HUDI-4314
                 URL: https://issues.apache.org/jira/browse/HUDI-4314
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: Ying Lin


When a Flink streaming read application starts, it begins reading from the specified instant (or resumes from the instant at which it was stopped).

We need to filter out the file paths that no longer exist, because some files may have been removed by the cleaner.

The current implementation performs an _exists_ operation on every file. An optimization is to perform the _exists_ operation only on the latest-version files.
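
A minimal sketch of one way to read this optimization. It is an illustration, not the Hudi implementation. The FileVersion class, the existingLatestPaths method, and the exists predicate are all hypothetical names introduced here; in a real job the predicate would wrap a file system existence check. The sketch also assumes that instants are fixed-width timestamp strings, so that string comparison orders them by time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class LatestVersionExistsFilter {

    /** Hypothetical descriptor of one file version; not a Hudi class. */
    static final class FileVersion {
        final String fileGroupId; // identifies the file group the version belongs to
        final String instant;     // commit instant that wrote this version
        final String path;        // location of the file

        FileVersion(String fileGroupId, String instant, String path) {
            this.fileGroupId = fileGroupId;
            this.instant = instant;
            this.path = path;
        }
    }

    /**
     * Keeps only the latest version of each file group, then probes the
     * file system once per file group instead of once per file.
     */
    static List<String> existingLatestPaths(List<FileVersion> candidates,
                                            Predicate<String> exists) {
        // Assumption: instants are fixed-width strings, so compareTo orders them by time.
        Map<String, FileVersion> latest = new LinkedHashMap<>();
        for (FileVersion f : candidates) {
            latest.merge(f.fileGroupId, f,
                (a, b) -> a.instant.compareTo(b.instant) >= 0 ? a : b);
        }

        List<String> result = new ArrayList<>();
        for (FileVersion f : latest.values()) {
            // The (potentially expensive) exists call runs only on latest versions.
            if (exists.test(f.path)) {
                result.add(f.path);
            }
        }
        return result;
    }
}
```

With this shape, the number of _exists_ calls is bounded by the number of file groups rather than by the total number of file versions touched since the start instant.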

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)