Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2023/01/05 15:04:00 UTC

[jira] [Updated] (IMPALA-11591) Avoid calling planFiles() on Iceberg tables when there are no predicates

     [ https://issues.apache.org/jira/browse/IMPALA-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-11591:
---------------------------------------
    Description: 
Currently we always invoke Iceberg's planFiles() API for creating Iceberg scans.

When there are no predicates (and no time travel) on the table we could avoid that because we already cache everything we need (schema, partition information, file descriptors).

We could also consider pushing down predicates only if at least one of them refers to a partition column. Otherwise the overhead of reading, decoding, and evaluating all the manifest files may be too large.
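
The decision described above can be sketched roughly as follows. This is only an illustration, not Impala's actual planner code: the class IcebergScanPlanningSketch, the enum FileSource, and the method chooseFileSource are hypothetical names, and each predicate is modelled simply as the set of column names it references.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from Impala or Iceberg.
final class IcebergScanPlanningSketch {
    /** Where the scan would get its list of data files from. */
    enum FileSource { CACHED_FILE_DESCRIPTORS, ICEBERG_PLAN_FILES }

    /**
     * Decides whether a scan can be built from cached file descriptors.
     * Each predicate is represented only by the columns it references.
     */
    static FileSource chooseFileSource(List<Set<String>> predicateColumns,
                                       Set<String> partitionColumns,
                                       boolean isTimeTravel) {
        // Time travel targets an older snapshot, so it is excluded from the shortcut.
        if (isTimeTravel) return FileSource.ICEBERG_PLAN_FILES;
        // Without predicates no file can be pruned, so the cached descriptors suffice.
        if (predicateColumns.isEmpty()) return FileSource.CACHED_FILE_DESCRIPTORS;
        // Push predicates down only if at least one refers to a partition column.
        for (Set<String> columns : predicateColumns) {
            if (!Collections.disjoint(columns, partitionColumns)) {
                return FileSource.ICEBERG_PLAN_FILES;
            }
        }
        // Predicates on non-partition columns only: skip the manifest evaluation.
        return FileSource.CACHED_FILE_DESCRIPTORS;
    }

    public static void main(String[] args) {
        Set<String> partitionColumns = new HashSet<>(Arrays.asList("event_date"));
        Set<String> userIdPredicate = new HashSet<>(Arrays.asList("user_id"));
        List<Set<String>> onlyDataColumn = Collections.singletonList(userIdPredicate);
        System.out.println(chooseFileSource(Collections.emptyList(), partitionColumns, false));
        System.out.println(chooseFileSource(onlyDataColumn, partitionColumns, false));
    }
}
```

In this sketch, predicates that touch no partition column would still have to be evaluated at scan time; only the file-level pruning through the manifests is skipped.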

I think the change should be fairly simple; we just need to take care of the following:
 * -store delete files separately, so we can still do the V2 scans from cache- (will be implemented by IMPALA-11826)
 * During time travel we also cache old file descriptors, so we need to keep them separate from the current snapshot's file descriptors.
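
One possible way to keep the two groups of file descriptors apart is sketched below. It is an assumption-laden illustration rather than the planned implementation: the class IcebergFileCacheSketch and all of its methods are hypothetical, and file descriptors are reduced to plain strings keyed by file path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; descriptors are plain strings keyed by file path.
final class IcebergFileCacheSketch {
    // Files that belong to the current snapshot of the table.
    private final Map<String, String> currentSnapshotFiles = new LinkedHashMap<>();
    // Files reachable only from older snapshots, cached by time-travel queries.
    private final Map<String, String> oldSnapshotFiles = new LinkedHashMap<>();

    void addCurrentSnapshotFile(String path, String descriptor) {
        currentSnapshotFiles.put(path, descriptor);
        // A file is tracked in exactly one of the two maps.
        oldSnapshotFiles.remove(path);
    }

    void addTimeTravelFile(String path, String descriptor) {
        // Files that are also part of the current snapshot stay in the current map.
        if (!currentSnapshotFiles.containsKey(path)) {
            oldSnapshotFiles.put(path, descriptor);
        }
    }

    /** Descriptors for a scan of the current snapshot, in insertion order. */
    List<String> filesForCurrentSnapshotScan() {
        return new ArrayList<>(currentSnapshotFiles.values());
    }

    /** Looks a path up in both groups; returns null if it is not cached. */
    String lookup(String path) {
        String descriptor = currentSnapshotFiles.get(path);
        return descriptor != null ? descriptor : oldSnapshotFiles.get(path);
    }
}
```

With this split, a scan without predicates and without time travel could list filesForCurrentSnapshotScan() directly, while time-travel queries could still find older files through lookup().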

  was:
Currently we always invoke Iceberg's planFiles() API for creating Iceberg scans.

When there are no predicates (and no time travel) on the table we could avoid that because we already cache everything we need (schema, partition information, file descriptors).

We could also consider pushing down predicates only if at least one of them refers to a partition column. Otherwise the overhead of reading, decoding, and evaluating all the manifest files may be too large.

I think the change should be fairly simple; we just need to take care of the following:
 * store delete files separately, so we can still do the V2 scans from cache
 * During time travel we also cache old file descriptors, so we need to keep them separate from the current snapshot's file descriptors.


> Avoid calling planFiles() on Iceberg tables when there are no predicates
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-11591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11591
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.2.0


