Posted to issues-all@impala.apache.org by "Li Penglin (Jira)" <ji...@apache.org> on 2023/01/06 11:03:00 UTC

[jira] [Assigned] (IMPALA-11662) Improve "refresh iceberg_tbl_on_oss;" performance

     [ https://issues.apache.org/jira/browse/IMPALA-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Penglin reassigned IMPALA-11662:
-----------------------------------

    Assignee: Li Penglin

> Improve "refresh iceberg_tbl_on_oss;" performance
> -------------------------------------------------
>
>                 Key: IMPALA-11662
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11662
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Li Penglin
>            Assignee: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Directory listing on an OSS service (e.g. S3A) costs more than on HDFS. Since Iceberg provides rich metadata, we could create the file descriptors from Iceberg metadata instead of using org.apache.hadoop.fs.FileSystem#listFiles; see https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189.
> The only thing missing there is the last_modification_time of the files. But since Iceberg files are immutable, maybe we could just come up with a special timestamp for these files.
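
A minimal Java sketch of the proposal above, not Impala's implementation. It assumes the Iceberg metadata yields each data file's location and size. The names DataFileEntry, FileDesc, fromMetadata and PLACEHOLDER_MTIME are hypothetical, and the placeholder value 0 is only one possible choice for the "special timestamp".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: these are not Impala's or Iceberg's real classes. */
final class IcebergFileDescriptorSketch {
  /** Assumed placeholder for the modification time that the metadata does not carry. */
  static final long PLACEHOLDER_MTIME = 0L;

  /** Stand-in for what the table metadata records per data file: location and size. */
  static final class DataFileEntry {
    final String path;
    final long sizeBytes;

    DataFileEntry(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Stand-in for a catalog file descriptor. */
  static final class FileDesc {
    final String relativePath;
    final long length;
    final long modificationTime;

    FileDesc(String relativePath, long length, long modificationTime) {
      this.relativePath = relativePath;
      this.length = length;
      this.modificationTime = modificationTime;
    }
  }

  /** Builds descriptors from metadata entries only; no directory listing is issued. */
  static List<FileDesc> fromMetadata(String tableLocation, List<DataFileEntry> entries) {
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    List<FileDesc> result = new ArrayList<>(entries.size());
    for (DataFileEntry e : entries) {
      // Store the path relative to the table location when the file lives under it;
      // otherwise keep the absolute path unchanged.
      String rel = e.path.startsWith(prefix) ? e.path.substring(prefix.length()) : e.path;
      // Files are immutable, so every descriptor gets the same placeholder timestamp.
      result.add(new FileDesc(rel, e.sizeBytes, PLACEHOLDER_MTIME));
    }
    return result;
  }
}
```

With this shape, a refresh would cost one pass over the metadata entries rather than one listing call per directory. Any code that compares modification times to detect changed files would need to treat the placeholder specially.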



