Posted to issues-all@impala.apache.org by "LiPenglin (Jira)" <ji...@apache.org> on 2022/05/08 09:40:00 UTC

[jira] [Comment Edited] (IMPALA-11243) Improve predicate pushdown to Iceberg

    [ https://issues.apache.org/jira/browse/IMPALA-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533192#comment-17533192 ] 

LiPenglin edited comment on IMPALA-11243 at 5/8/22 9:39 AM:
------------------------------------------------------------

Hi [~boroknagyz] 

I have a problem: I have confirmed that the NOT_NULL predicate is pushed down to Iceberg, but _org.apache.iceberg.BaseTableScan#filter(org.apache.iceberg.expressions.Expression)_ does not really take effect. The same happens with NOT_IN predicate pushdown.

---
Luckily, I found the cause of the problem.
1. I found that `value_counts` is null in the Iceberg metadata:
{code:java}
{"status":1,"snapshot_id":{"long":5383986197020147226},"data_file":{"file_path":"hdfs://localhost:20500/test-warehouse/tdb.db/ice_is_null_pred_pd/data/col_i=8/c94417272af4dd76-6820892500000000_918326216_data.0.parq","file_format":"PARQUET","partition":{"col_i":{"int":8}},"record_count":1,"file_size_in_bytes":2581,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":47},{"key":2,"value":51},{"key":3,"value":51},{"key":4,"value":66},{"key":5,"value":51},{"key":6,"value":47},{"key":7,"value":47},{"key":8,"value":51},{"key":9,"value":39}]},"value_counts":null,"null_value_counts":{"array":[{"key":1,"value":0},{"key":2,"value":0},{"key":3,"value":0},{"key":4,"value":0},{"key":5,"value":0},{"key":6,"value":0},{"key":7,"value":0},{"key":8,"value":0},{"key":9,"value":1}]},"nan_value_counts":null,"lower_bounds":{"array":[{"key":1,"value":"\b\u0000\u0000\u0000"},{"key":2,"value":"<\u001CÜß\u0002\u0000\u0000\u0000"},{"key":3,"value":"<90>÷ª<95>\t¿\u0005@"},{"key":4,"value":"1700-01-01 00:00:00"},{"key":5,"value":"<80>ZûÁ¨<84>Öÿ"},{"key":6,"value":"\u001Cðýÿ"},{"key":7,"value":"\u0001â:"},{"key":8,"value":"\u0001\u001Fqû\u0004È"}]},"upper_bounds":{"array":[{"key":1,"value":"\b\u0000\u0000\u0000"},{"key":2,"value":"<\u001CÜß\u0002\u0000\u0000\u0000"},{"key":3,"value":"<90>÷ª<95>\t¿\u0005@"},{"key":4,"value":"1700-01-01 00:00:00"},{"key":5,"value":"<80>ZûÁ¨<84>Öÿ"},{"key":6,"value":"\u001Cðýÿ"},{"key":7,"value":"\u0001â:"},{"key":8,"value":"\u0001\u001Fqû\u0004È"}]},"key_metadata":null,"split_offsets":null,"sort_order_id":{"int":0}}}
{code}
2. According to the logic at `https://github.com/apache/iceberg/blob/b521f40d9f897ffc0d3d30511cfdff35797b5894/api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java#L457`, `ROWS_CANNOT_MATCH` cannot be returned when `value_counts` is null.
3. Ultimately, the pushed-down predicate has no effect.
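
To make the cause above concrete, here is a minimal sketch of an all-nulls check for an IS NOT NULL predicate. It assumes, based on my reading of the linked evaluator, that a file can only be ruled out when both `value_counts` and `null_value_counts` are present for the column. The class `NullOnlyCheckSketch`, the `Verdict` enum, and the method signatures are hypothetical and are not Iceberg's actual API; only the names `ROWS_CANNOT_MATCH` and `ROWS_MIGHT_MATCH` are borrowed from it.

```java
import java.util.Map;

// Hypothetical, simplified model of a metrics-based IS NOT NULL check.
// This is not Iceberg's implementation, only an illustration of the idea.
class NullOnlyCheckSketch {
    enum Verdict { ROWS_MIGHT_MATCH, ROWS_CANNOT_MATCH }

    // True only when the metrics prove that every value of the column is null.
    // If either count map is missing, nothing can be proven.
    static boolean containsNullsOnly(Map<Integer, Long> valueCounts,
                                     Map<Integer, Long> nullCounts,
                                     int fieldId) {
        return valueCounts != null && valueCounts.containsKey(fieldId)
            && nullCounts != null && nullCounts.containsKey(fieldId)
            && valueCounts.get(fieldId) - nullCounts.get(fieldId) == 0;
    }

    // IS NOT NULL can skip a file only if the column is known to be all nulls.
    static Verdict notNull(Map<Integer, Long> valueCounts,
                           Map<Integer, Long> nullCounts,
                           int fieldId) {
        return containsNullsOnly(valueCounts, nullCounts, fieldId)
            ? Verdict.ROWS_CANNOT_MATCH
            : Verdict.ROWS_MIGHT_MATCH;
    }

    public static void main(String[] args) {
        // Column 9 of the manifest entry above: one record, one null value.
        Map<Integer, Long> nullCounts = Map.of(9, 1L);

        // value_counts is null, as in the metadata above: the file is kept.
        System.out.println(notNull(null, nullCounts, 9));

        // With value_counts populated, the same file could be skipped.
        System.out.println(notNull(Map.of(9, 1L), nullCounts, 9));
    }
}
```

Under these assumptions the first call yields `ROWS_MIGHT_MATCH` and the second `ROWS_CANNOT_MATCH`, which matches the behaviour I see: the predicate reaches Iceberg, but the file is never pruned because `value_counts` is missing.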



> Improve predicate pushdown to Iceberg
> -------------------------------------
>
>                 Key: IMPALA-11243
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11243
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: LiPenglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Iceberg provides a rich API to push down predicates, e.g. we could push down complex predicates with OR, NOT, etc.
> Also, currently we only push down predicates in the form:
> {noformat}
> COL <bin-operator> LITERAL_EXPR
> E.g.:
> col_ts <= '2021-01-01 12:01:00'
> {noformat}
> Instead of only allowing literal expressions, we could evaluate any constant expression and push down the result to Iceberg.


