Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/09/28 07:03:00 UTC
[jira] [Created] (IMPALA-10932) Make sure all kinds of simple predicates on bool columns are pushed down
Quanlong Huang created IMPALA-10932:
---------------------------------------
Summary: Make sure all kinds of simple predicates on bool columns are pushed down
Key: IMPALA-10932
URL: https://issues.apache.org/jira/browse/IMPALA-10932
Project: IMPALA
Issue Type: Improvement
Reporter: Quanlong Huang
Assignee: Quanlong Huang
When scanning Parquet/ORC tables, we push down binary predicates like "{{x < 1}}" to leverage the file-level statistics. However, predicates on bool columns may not be in this form. They could be "{{x}}", "{{NOT x}}", "{{x IS [NOT] TRUE}}" or "{{x IS [NOT] FALSE}}".
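To illustrate why each of these forms could still use the statistics, here is a minimal sketch that decides whether a row group can be skipped using only min/max/null_count statistics. The function name {{can_skip}} and the way statistics are passed in are hypothetical and are not Impala's code. The sketch assumes that min/max are absent (None) when a row group contains only NULLs.

```python
from typing import Optional

# Hypothetical helper (not Impala code): can a row group be skipped for a
# bool-column predicate, given only its min/max/null_count statistics?
# Assumption: min_val/max_val are None when the row group has no non-NULL values.
def can_skip(form: str, min_val: Optional[bool], max_val: Optional[bool],
             null_count: int) -> bool:
    has_true = max_val is True    # since FALSE < TRUE, max TRUE means some TRUE value
    has_false = min_val is False  # min FALSE means some FALSE value
    has_null = null_count > 0
    # Which values each form selects in a WHERE clause:
    # (selects TRUE, selects FALSE, selects NULL)
    selects = {
        "x":              (True, False, False),
        "x IS TRUE":      (True, False, False),
        "NOT x":          (False, True, False),
        "x IS FALSE":     (False, True, False),
        "x IS NOT TRUE":  (False, True, True),
        "x IS NOT FALSE": (True, False, True),
    }
    sel_true, sel_false, sel_null = selects[form]
    # Skip only if the row group holds no value that the form selects.
    return not ((sel_true and has_true) or (sel_false and has_false)
                or (sel_null and has_null))
```

For example, a row group whose min and max are both FALSE can be skipped for "{{x}}". For "{{x IS NOT TRUE}}", a row group whose values are all TRUE can be skipped only when {{null_count}} is 0, because the {{IS NOT}} forms also select NULL rows.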
Note that some of these forms are already pushed down as dictionary predicates, but not all of them. For instance, here "{{bool_col}}" is pushed down as a dictionary predicate:
{code:sql}
set explain_level=2;
explain select count(*) from functional_parquet.alltypessmall where bool_col;
| 00:SCAN HDFS [functional_parquet.alltypessmall, RANDOM] |
| HDFS partitions=4/4 files=4 size=14.76KB |
| predicates: bool_col |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| partitions: 0/4 rows=939 |
| columns: unavailable |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| parquet dictionary predicates: bool_col |
{code}
Similarly, "{{bool_col IS TRUE}}" is pushed down as a dictionary predicate:
{code:sql}
explain select count(*) from functional_parquet.alltypessmall where bool_col is true;
| 00:SCAN HDFS [functional_parquet.alltypessmall, RANDOM] |
| HDFS partitions=4/4 files=4 size=14.76KB |
| predicates: istrue(bool_col) |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| partitions: 0/4 rows=939 |
| columns: unavailable |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| parquet dictionary predicates: istrue(bool_col) |
{code}
But for "{{bool_col IS NOT TRUE}}", no predicate is pushed down to either the statistics or the dictionary:
{code:sql}
explain select count(*) from functional_parquet.alltypessmall where bool_col is not true;
| 00:SCAN HDFS [functional_parquet.alltypessmall, RANDOM] |
| HDFS partitions=4/4 files=4 size=14.76KB |
| predicates: isnottrue(bool_col) |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| partitions: 0/4 rows=939 |
| columns: unavailable |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 |
| tuple-ids=0 row-size=1B cardinality=94 |
| in pipelines: 00(GETNEXT) |
{code}
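One difference between "{{IS TRUE}}" and "{{IS NOT TRUE}}" is that the latter also selects NULL rows. In Parquet, dictionary pages store only non-NULL values, and NULLs are recorded through definition levels. A predicate that accepts NULL therefore cannot be decided from the dictionary entries alone. Whether this is the actual reason Impala skips this case is an assumption and has not been checked against Impala's code. The sketch below, with a hypothetical function name, shows the idea.

```python
from typing import Callable, Optional, Set

# Hypothetical sketch of dictionary filtering for one bool column chunk.
# Assumption (Parquet convention): the dictionary holds only non-NULL values.
def dict_filter_can_skip(pred: Callable[[Optional[bool]], bool],
                         dictionary: Set[bool]) -> bool:
    """Return True if the chunk can be skipped based on its dictionary."""
    if pred(None):
        # The predicate selects NULL rows, and the dictionary says nothing
        # about NULLs, so it is not safe to skip the chunk.
        return False
    # Skip when no dictionary entry satisfies the predicate.
    return not any(pred(v) for v in dictionary)

# Predicates modeled as "is the row selected?" (None stands for SQL NULL).
is_true = lambda v: v is True
is_not_true = lambda v: v is not True
```

With these definitions, a chunk whose dictionary is {{{False}}} can be skipped for {{is_true}}, but a chunk whose dictionary is {{{True}}} cannot be skipped for {{is_not_true}}, because it may still contain NULLs.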
With the unusual form "{{x < TRUE}}", the predicate is pushed down to both the statistics and the dictionary:
{code:sql}
explain select count(*) from functional_parquet.alltypessmall where bool_col < true;
| 00:SCAN HDFS [functional_parquet.alltypessmall] |
| HDFS partitions=4/4 files=4 size=14.76KB |
| predicates: bool_col < TRUE |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| partitions: 0/4 rows=939 |
| columns: unavailable |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| parquet statistics predicates: bool_col < TRUE |
| parquet dictionary predicates: bool_col < TRUE |
| mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 |
{code}
However, this form is rarely used for bool columns, so the forms mentioned above should be pushed down as well.
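The "{{bool_col < TRUE}}" example suggests that comparisons against a constant are already recognized by the pushdown. One possible approach is therefore to rewrite the bool forms into such comparisons. The sketch below is hypothetical: it only maps predicate text and is not Impala's rewrite framework. It shows which forms have a single-comparison equivalent when used as a WHERE-clause filter. In that context, NULL and FALSE both drop the row, so "{{x IS TRUE}}" and "{{x = TRUE}}" behave the same even though they differ on NULL.

```python
from typing import Optional

# Hypothetical rewrite table, with "x" as a placeholder for the bool column.
# Equivalences hold only for filtering (NULL and FALSE both reject a row).
REWRITES = {
    "x": "x = TRUE",
    "x IS TRUE": "x = TRUE",
    "NOT x": "x = FALSE",
    "x IS FALSE": "x = FALSE",
    # These forms also select NULL rows, so no single comparison is
    # equivalent; they would need e.g. "x = FALSE OR x IS NULL".
    "x IS NOT TRUE": None,
    "x IS NOT FALSE": None,
}

def to_binary_predicate(form: str, col: str) -> Optional[str]:
    """Rewrite a bool-predicate form for column `col`, or return None."""
    template = REWRITES[form]
    return None if template is None else template.replace("x", col)
```

For example, {{to_binary_predicate("NOT x", "bool_col")}} returns {{"bool_col = FALSE"}}. The {{IS NOT}} forms return None, which suggests they would need separate handling that also takes NULL counts into account.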
--
This message was sent by Atlassian Jira
(v8.3.4#803005)