Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2021/09/09 14:15:00 UTC

[jira] [Created] (IMPALA-10910) Iceberg scans don't apply runtime filters at Parquet row group level

Zoltán Borók-Nagy created IMPALA-10910:
------------------------------------------

             Summary: Iceberg scans don't apply runtime filters at Parquet row group level
                 Key: IMPALA-10910
                 URL: https://issues.apache.org/jira/browse/IMPALA-10910
             Project: IMPALA
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy


In a TPC-DS 3000 performance test run by [~rizaon], we noticed that runtime filters on Iceberg tables are applied only at row level.

It is known that runtime filters are not applied at file/partition level on Iceberg tables (IMPALA-10453). But they could be applied at Parquet row group level. I think achieving this is much easier than fixing IMPALA-10453.

E.g. here is a snippet of the runtime profile of q49 of TPC-DS:
{noformat}
        Filter 0 (8.00 KB) [108 instances]:
           - Files processed: 0 (0)
           - Files rejected: 0 (0)
           - Files total: 0 (0)
           - InactiveTotalTime: 0.000ns
           - RowGroups processed: 0 (0)
           - RowGroups rejected: 0 (0)
           - RowGroups total: 0 (0)
           - Rows processed: 19.34M (19335783)
           - Rows rejected: 19.32M (19323695)
           - Rows total: 20.00M (19999711)
           - Splits processed: 0 (0)
           - Splits rejected: 0 (0)
           - Splits total: 0 (0)
           - TotalTime: 0.000ns
{noformat}
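
To make the counters above easier to read, here is a small sketch that parses a few of the profile lines and computes the share of processed rows the filter rejected. The helper name parse_counters and the regular expression are illustrative assumptions about the line layout shown in the snippet, not part of any Impala tooling.

```python
import re

# A subset of the counters from the profile snippet above, copied verbatim.
PROFILE = """\
Filter 0 (8.00 KB) [108 instances]:
   - RowGroups processed: 0 (0)
   - RowGroups rejected: 0 (0)
   - RowGroups total: 0 (0)
   - Rows processed: 19.34M (19335783)
   - Rows rejected: 19.32M (19323695)
   - Rows total: 20.00M (19999711)
"""


def parse_counters(text):
    """Map each counter name to the exact value given in parentheses.

    Hypothetical helper: assumes lines shaped like '- Name: 19.34M (19335783)'.
    """
    pattern = re.compile(r"-\s+(.+?):\s+\S+\s+\((\d+)\)")
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(text)}


counters = parse_counters(PROFILE)

# Fraction of the rows seen by the filter that it threw away.
rejected_ratio = counters["Rows rejected"] / counters["Rows processed"]

print(f"rows rejected by the filter: {rejected_ratio:.2%}")
print(f"row groups checked by the filter: {counters['RowGroups processed']}")
```

The filter discards almost every row it sees, yet the row group counters stay at zero, so each of those rows was still read and decoded before being rejected.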

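Since this filter rejects nearly all of the rows it processes, we could save a lot of IO by applying the filters at row group level and skipping row groups that cannot contain matching rows.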
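
As a rough illustration of the idea, the sketch below checks a runtime filter against per-row-group statistics before any rows are read. It assumes a min/max-style filter and made-up row group ranges; the class and function names are hypothetical and do not reflect Impala's actual scanner code. A Bloom filter would need a different check than the range comparison shown here.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max of the filtered column in one row group (illustrative values)."""
    min_value: int
    max_value: int
    num_rows: int


@dataclass
class MinMaxFilter:
    """Hypothetical runtime filter: range of join keys seen on the build side."""
    low: int
    high: int

    def may_overlap(self, stats: RowGroupStats) -> bool:
        # A row group can only contain matches if its value range
        # intersects the filter range.
        return stats.max_value >= self.low and stats.min_value <= self.high


def select_row_groups(row_groups, runtime_filter):
    """Return the indexes of the row groups that still have to be read."""
    return [i for i, rg in enumerate(row_groups) if runtime_filter.may_overlap(rg)]


# Made-up example: three row groups with disjoint key ranges.
row_groups = [
    RowGroupStats(min_value=0, max_value=999, num_rows=1000),
    RowGroupStats(min_value=1000, max_value=1999, num_rows=1000),
    RowGroupStats(min_value=2000, max_value=2999, num_rows=1000),
]
runtime_filter = MinMaxFilter(low=1200, high=1300)

kept = select_row_groups(row_groups, runtime_filter)
skipped_rows = sum(rg.num_rows for i, rg in enumerate(row_groups) if i not in kept)

print(f"row groups to read: {kept}, rows skipped without IO: {skipped_rows}")
```

How much this helps in practice depends on how well the filtered column is clustered within the files: if every row group spans the full key range, no row group can be skipped.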


