Posted to commits@hudi.apache.org by "Ying Lin (Jira)" <ji...@apache.org> on 2022/06/24 07:28:00 UTC
[jira] [Created] (HUDI-4314) Improve the performance of reading from the specified instant when the Flink streaming read application starts
Ying Lin created HUDI-4314:
------------------------------
Summary: Improve the performance of reading from the specified instant when the Flink streaming read application starts
Key: HUDI-4314
URL: https://issues.apache.org/jira/browse/HUDI-4314
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Ying Lin
When a Flink streaming read application starts, it begins reading from the specified instant (or resumes from the instant at which it was stopped).
Some files may have been removed by the cleaner, so we need to filter out the file paths that no longer exist.
The current implementation performs an _exists_ check on every file. An optimization is to perform the _exists_ check only on the latest-version files.
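A minimal sketch of the proposed optimization, not the actual Hudi implementation. It assumes each candidate file carries a file group ID and an instant time that sorts lexicographically, and that only the latest version in each file group has to be read. The names FileVersion, LatestVersionExistsFilter and latestExisting are hypothetical, and the exists predicate stands in for whatever file system call the reader really uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical value type: one version of a file within a file group.
final class FileVersion {
    final String fileId;       // file group identifier (assumed)
    final String instantTime;  // commit instant, assumed to sort lexicographically
    final String path;         // full path of this file version

    FileVersion(String fileId, String instantTime, String path) {
        this.fileId = fileId;
        this.instantTime = instantTime;
        this.path = path;
    }
}

// Hypothetical helper illustrating the idea in this issue.
final class LatestVersionExistsFilter {

    // Keeps only the latest version per file group, then probes the
    // file system once per group instead of once per file.
    static List<FileVersion> latestExisting(List<FileVersion> candidates,
                                            Predicate<String> exists) {
        // Step 1: pick the latest version of each file group (no I/O here).
        Map<String, FileVersion> latestByFileId = new LinkedHashMap<>();
        for (FileVersion v : candidates) {
            latestByFileId.merge(v.fileId, v,
                (a, b) -> a.instantTime.compareTo(b.instantTime) >= 0 ? a : b);
        }

        // Step 2: run the (potentially expensive) exists check only on those.
        List<FileVersion> result = new ArrayList<>();
        for (FileVersion v : latestByFileId.values()) {
            if (exists.test(v.path)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

With N file groups and M versions in total, this issues N exists calls rather than M. The saving matters most when the start instant is far behind the latest commit, because many file groups will then have several versions.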