Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/01/31 23:32:00 UTC

[jira] [Commented] (IMPALA-11662) Improve "refresh iceberg_tbl_on_oss;" performance

    [ https://issues.apache.org/jira/browse/IMPALA-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682788#comment-17682788 ] 

ASF subversion and git services commented on IMPALA-11662:
----------------------------------------------------------

Commit 4d6ff6fddd77800999e1d6d8ef0de8af5b1684ab in impala's branch refs/heads/master from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4d6ff6fdd ]

IMPALA-11662: Improve 'refresh iceberg_tbl_on_oss' performance

As the cost of directory listing on Cloud Storage Systems such as OSS or
S3 is higher than the cost on HDFS, we could create the file descriptors
from the rich metadata provided by Iceberg instead of using
org.apache.hadoop.fs.FileSystem#listFiles. The only thing missing there
is the last_modification_time of the files. But since Iceberg files are
immutable, we could just come up with a special timestamp for these
files.

At the same time, we can also construct file descriptors ourselves
during time travel to reduce the cost of requests to OSS services.

Test:
 * existing tests
 * test on COS with my local test environment

Change-Id: If2ee8b6b7559e6590698b46ef1d574e55ed52f9a
Reviewed-on: http://gerrit.cloudera.org:8080/19379
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
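
The idea in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the commit: the class names (IcebergFdSketch, DataFileEntry, FileDesc), the method fromMetadata, and the sentinel value UNKNOWN_MTIME are all hypothetical. It assumes only that the table metadata already records each data file's path and size, so descriptors can be built without a directory listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every name here is hypothetical, not Impala's or Iceberg's API. */
final class IcebergFdSketch {
  // Assumed sentinel standing in for the modification time that the
  // metadata does not carry; the value the commit actually uses is not shown here.
  static final long UNKNOWN_MTIME = -1L;

  /** The per-file facts assumed to be available from table metadata. */
  static final class DataFileEntry {
    final String path;
    final long sizeBytes;

    DataFileEntry(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Simplified stand-in for a catalog file descriptor. */
  static final class FileDesc {
    final String relPath;
    final long length;
    final long mtime;

    FileDesc(String relPath, long length, long mtime) {
      this.relPath = relPath;
      this.length = length;
      this.mtime = mtime;
    }
  }

  /** Builds descriptors from metadata entries; issues no storage requests. */
  static List<FileDesc> fromMetadata(String tableLocation, List<DataFileEntry> entries) {
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    List<FileDesc> out = new ArrayList<>(entries.size());
    for (DataFileEntry e : entries) {
      // Store paths relative to the table location when possible.
      String rel = e.path.startsWith(prefix) ? e.path.substring(prefix.length()) : e.path;
      // Files are immutable, so a fixed placeholder timestamp is acceptable.
      out.add(new FileDesc(rel, e.sizeBytes, UNKNOWN_MTIME));
    }
    return out;
  }

  public static void main(String[] args) {
    List<FileDesc> fds = fromMetadata("oss://bucket/tbl", Arrays.asList(
        new DataFileEntry("oss://bucket/tbl/data/a.parquet", 1024L),
        new DataFileEntry("oss://bucket/tbl/data/b.parquet", 2048L)));
    for (FileDesc fd : fds) {
      System.out.println(fd.relPath + " " + fd.length + " " + fd.mtime);
    }
  }
}
```

Because the descriptors are derived purely from metadata, the number of requests to the storage service does not grow with the number of directories in the table, which is where the saving over FileSystem#listFiles would come from.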


> Improve "refresh iceberg_tbl_on_oss;" performance
> -------------------------------------------------
>
>                 Key: IMPALA-11662
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11662
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Li Penglin
>            Assignee: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Iceberg provides rich metadata, and the cost of directory listing on OSS services (e.g. S3A) is higher than the cost on HDFS, so we could create the file descriptors from Iceberg metadata instead of using org.apache.hadoop.fs.FileSystem#listFiles (see https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189).
> The only thing missing there is the last_modification_time of the files. But since Iceberg files are immutable, maybe we could just come up with a special timestamp for these files.


