Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/08/29 12:03:31 UTC

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17815 )


Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS and IN-list predicates, which can leverage the bloom
filters in the ORC files.

TODO: Push down IS NULL predicate

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
4 files changed, 90 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#4).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, which brings the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Compared to the Parquet scan, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   use IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.
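
The "x in (0, 101)" example above can be illustrated with a small sketch. It is not the ORC library's or Impala's implementation; the struct IntRange and both function names are hypothetical and only model the comparison of an IN-list against a RowGroup's min/max statistics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics of one int column within one RowGroup.
struct IntRange {
  int64_t min;
  int64_t max;
};

// Evaluates the IN-list directly: the RowGroup can be skipped only if
// none of the listed values falls inside [min, max].
bool CanSkipWithInList(const IntRange& stats,
                       const std::vector<int64_t>& in_list) {
  assert(stats.min <= stats.max);
  for (int64_t v : in_list) {
    if (v >= stats.min && v <= stats.max) return false;
  }
  return true;
}

// Evaluates the widened form "x >= min(in_list) AND x <= max(in_list)":
// the RowGroup can be skipped only if the two ranges do not overlap.
bool CanSkipWithBounds(const IntRange& stats,
                       const std::vector<int64_t>& in_list) {
  assert(stats.min <= stats.max);
  assert(!in_list.empty());
  const int64_t lo = *std::min_element(in_list.begin(), in_list.end());
  const int64_t hi = *std::max_element(in_list.begin(), in_list.end());
  return hi < stats.min || lo > stats.max;
}
```

For a RowGroup with range [1, 100] and the list (0, 101), the direct check allows a skip because neither value lies in the range, while the widened bounds [0, 101] overlap the range and therefore cannot rule the RowGroup out.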

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.
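
As a rough illustration of the split described above, the sketch below filters one shared list of stats conjuncts differently per file format. The enum PredKind and the two Select functions are hypothetical names introduced for this example only; they are not identifiers from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate kinds, for illustration only; these are not
// the identifiers used in the Impala code base.
enum class PredKind { LT, GT, LE, GE, EQ, IN_LIST, IS_NULL, IS_NOT_NULL };

// True for the binary min-max comparisons that the Parquet path keeps.
bool IsBinaryMinMax(PredKind kind) {
  return kind == PredKind::LT || kind == PredKind::GT ||
         kind == PredKind::LE || kind == PredKind::GE;
}

// Parquet path: keep only LT/GT/LE/GE from the shared stats conjuncts.
void SelectForParquet(const std::vector<PredKind>& stats_conjuncts,
                      std::vector<PredKind>* out) {
  assert(out != nullptr);
  out->clear();
  for (PredKind kind : stats_conjuncts) {
    if (IsBinaryMinMax(kind)) out->push_back(kind);
  }
}

// ORC path: every collected stats conjunct is a candidate for the
// search argument.
void SelectForOrc(const std::vector<PredKind>& stats_conjuncts,
                  std::vector<PredKind>* out) {
  assert(out != nullptr);
  *out = stats_conjuncts;
}
```

Keeping the filter on the Parquet side means that widening the set of collected conjuncts does not change what the Parquet scanner evaluates.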

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
9 files changed, 568 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604:         buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp());
Parquet has a somewhat hacky way of finding EQ predicates in the backend and using them in bloom filters: https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1884

It would be great to use common logic here. I would prefer doing the logic in FE, but we did it in BE because we (Daniel Becker + me) were more familiar with BE.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Sep 2021 10:00:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 9:

(4 comments)

Thanks Qifan! Addressed the comments. I also refactored HdfsOrcScanner::PrepareSearchArguments to make it shorter.

http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1106
PS8, Line 1106: urn true;
              : }
              : 
> This logic is common to both IS_NULL/IS_NOT_NULL and binary predicate. It c
Done


http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1124
PS8, Line 1124: onst TupleDescriptor* stats_tuple_desc = scan_node_->stats_tuple_desc();
              :   if (!stats_tuple_desc) return Status::OK();
              : 
              :   // Clone the min/max statistics conjuncts.
              :   RETURN_IF_ERROR(ScalarExprEvaluator::Clone(&obj_pool_, state_,
              :      
> A DCHECK() here should be sufficient as the types have been checked in FE. 
Thanks for pointing this out! This reveals a bug in FE: we should check PrimitiveType instead of checking Type directly. Updated the FE checks.


http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1141
PS8, Line 1141: SlotDescriptor* slot_desc = stats_tuple_desc->slots()[i];
              :     // Resolve column path to determine col idx in file schema.
> Check TIMESTAMP too?
Done


http://gerrit.cloudera.org:8080/#/c/17815/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@587
PS8, Line 587:  Preconditions.checkState(slotDesc.isScanSlot());
             :     // Skip the slot ref if it refers to an array's "pos" field.
             :     if (slotDesc.isArrayPosRef()) return;
             : 
             :     Expr constExpr = binaryPred.getChild(1);
             :     // Only constant exprs can be evaluated against parquet::Statistics. This includes
             :     /
> The check logic on non-supported type for ORC push-down expression is dupli
Done. I hope we can remove these TODOs soon :)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 08:31:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, which brings the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Compared to the Parquet scan, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   use IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.

Tests
 * Add a new test table 'alltypessmall_bool_sorted' which has files
   containing sorted bool values.
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Reviewed-on: http://gerrit.cloudera.org:8080/17815
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Qifan Chen <qc...@cloudera.com>
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M bin/impala-config.sh
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
12 files changed, 695 insertions(+), 210 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17815/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/3/be/src/exec/hdfs-orc-scanner.cc@1096
PS3, Line 1096:       expr_perm_pool_.get(), context_->expr_results_pool(), scan_node_->stats_conjunct_evals(), &stats_conjunct_evals_));
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@601
PS3, Line 601:         buildBinaryStatsPredicate(analyzer, slotRef, binaryPred, BinaryPredicate.Operator.LE);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@602
PS3, Line 602:         buildBinaryStatsPredicate(analyzer, slotRef, binaryPred, BinaryPredicate.Operator.GE);
line too long (94 > 90)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 11:38:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :     // ORC reader only supports pushing down predicates that constant parts are literal.
              :     // We could get non-literal expr if expr rewrites are disabled.
              :     if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :     in_list.emplace_back(GetLiteralSearchArguments(
              :         eval, i, slot_desc->type(), &predicate_type));
              :   }
> Since we have checked in FE on literals already, looks this loop can be rem
This loop is for generating 'in_list', the vector<orc::Literal>. The check inside it is also needed since we could get non-literal expr if expr rewrites are disabled (thus constant-folding is disabled).
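
The shape of that loop can be sketched in isolation as follows. This is a simplified model, not the code from the patch: ChildExpr and BuildInList are hypothetical names, int64_t values stand in for orc::Literal, and the first child is assumed to be the slot ref as in the quoted loop, which starts at index 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an expression child: either an int literal
// or an expression that was not constant-folded.
struct ChildExpr {
  bool is_literal;
  int64_t value;  // Meaningful only when is_literal is true.
};

// children[0] stands for the slot ref, children[1..] for the IN-list
// items. Returns false and leaves in_list empty as soon as a non-literal
// item is found, so the caller can decline to push the predicate down.
bool BuildInList(const std::vector<ChildExpr>& children,
                 std::vector<int64_t>* in_list) {
  assert(in_list != nullptr);
  in_list->clear();
  for (std::size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) {
      in_list->clear();
      return false;
    }
    in_list->push_back(children[i].value);
  }
  return true;
}
```

The loop does two jobs at once, which is why it cannot simply be dropped: it collects the literal values and it rejects the predicate when any item is not a literal.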


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ 
> nit. you mean binary?
Sorry, I mean EQUALS predicate. We have a check at line 595.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:15:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 7: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 14:02:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9503/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 02:44:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#6).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, which brings the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Compared to the Parquet scan, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   use IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.

Tests
 * Add a new test table 'alltypessmall_bool_sorted' which has files
   containing sorted bool values.
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
11 files changed, 635 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/6
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 9: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 15:23:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 7:

There is a bug in the ORC lib about using the bloom filters: https://issues.apache.org/jira/browse/ORC-1024
Let's wait until it gets resolved.

The test failure is unrelated (IMPALA-10747). Thanks Csaba for pointing it out!


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 10:15:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9396/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 29 Aug 2021 12:54:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 9: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 15:45:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#8).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, which brings the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Compared to the Parquet scan, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   use IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.

Tests
 * Add a new test table 'alltypessmall_bool_sorted' which has files
   containing sorted bool values.
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
11 files changed, 635 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/8
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#9).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, which brings the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Compared to the Parquet scan, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   use IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.
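
A rough sketch of the split described above, assuming each stats
conjunct is reduced to its operator only. PredOp, SelectForParquet and
SelectForOrc are made-up names for illustration and do not appear in
the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operator tags for the conjuncts collected by FE.
enum class PredOp { LT, GT, LE, GE, EQ, IN_LIST, IS_NULL, IS_NOT_NULL };

bool IsBinaryMinMaxOp(PredOp op) {
  return op == PredOp::LT || op == PredOp::GT || op == PredOp::LE ||
         op == PredOp::GE;
}

// The Parquet side keeps only the binary min-max conjuncts.
std::vector<PredOp> SelectForParquet(const std::vector<PredOp>& stats_conjuncts) {
  std::vector<PredOp> selected;
  for (PredOp op : stats_conjuncts) {
    if (IsBinaryMinMaxOp(op)) selected.push_back(op);
  }
  assert(selected.size() <= stats_conjuncts.size());
  return selected;
}

// The ORC side uses every stats conjunct when building its search argument.
std::vector<PredOp> SelectForOrc(const std::vector<PredOp>& stats_conjuncts) {
  return stats_conjuncts;
}
```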

Tests
 * Add a new test table 'alltypessmall_bool_sorted' which has files
   containing sorted bool values.
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M bin/impala-config.sh
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
12 files changed, 695 insertions(+), 210 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :     // ORC reader only supports pushing down predicates that constant parts are literal.
              :     // We could get non-literal expr if expr rewrites are disabled.
              :     if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :     in_list.emplace_back(GetLiteralSearchArguments(
              :         eval, i, slot_desc->type(), &predicate_type));
              :   }
> This loop is for generating 'in_list', the vector<orc::Literal>. The check 
Does the constant-folding happen in FE?


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ 
> Sorry, I mean EQUALS predicate. We have a check at line 595.
I see. Yeah, pushing it down directly is nice.

Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 Aug 2021 13:06:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#2).

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS and IN-list predicates which can leverage the bloom
filters in the ORC files.

TODO: Push down IS NULL predicate

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
4 files changed, 91 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(4 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :     // ORC reader only supports pushing down predicates that constant parts are literal.
              :     // We could get non-literal expr if expr rewrites are disabled.
              :     if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :     in_list.emplace_back(GetLiteralSearchArguments(
              :         eval, i, slot_desc->type(), &predicate_type));
              :   }
Since we have already checked for literals in FE, it looks like this loop can be removed?


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@551
PS2, Line 551: 1
nit. May need to ensure that child0 is a reference to a column.


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ 
nit. you mean binary?


http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@364
PS2, Line 364: 0, 7299
IN-lists on other data types should be added.



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Aug 2021 20:23:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9514/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 07:33:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9395/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 29 Aug 2021 12:25:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7552/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 09:11:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 6: Code-Review+1

(1 comment)

Great work!

http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@214
PS5, Line 214: ---- RESULTS
> Done. Added a table with rows sorted by the bool column. So we can test boo
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 13:05:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 5:

(10 comments)

Thanks, Qifan, for catching the stale comments! Added some tests on bool columns.

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h@324
PS5, Line 324: min/max
> nit. stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h@339
PS5, Line 339: min/max
> nit. stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1064
PS5, Line 1064: are
> nit. is.
Yeah, good point!


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1119
PS5, Line 1119: min-max
> nit. stats.
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1136
PS5, Line 1136: // TODO(IMPALA-10882): push down min-max predicates on CHAR/VARCHAR.
              :     if (node->getKind() == orc::CHAR || node->getKind() == orc::VARCHAR) continue;
> can be removed since the test has been done in line 1121 and 1122.
We still need this check: it tests the file schema, whereas lines 1121 and 1122 test the table schema. With schema evolution, an ORC table can have a STRING column that maps to a CHAR/VARCHAR column in the ORC files. So we do the check here, after the column is resolved, i.e. after the ResolveColumn() call.
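
To illustrate why both checks are kept, here is a minimal sketch. It is not the scanner code: TableType, FileKind and CanPushDown are hypothetical stand-ins for the table column type, the resolved ORC file type, and the combined decision.

```cpp
#include <cassert>

// Hypothetical type tags: one for the table schema, one for the ORC file schema.
enum class TableType { INT, STRING, CHAR, VARCHAR };
enum class FileKind { INT, STRING, CHAR, VARCHAR };

// Returns true if a predicate on this column may be pushed down.
bool CanPushDown(TableType table_type, FileKind file_kind) {
  // First check: the type declared in the table schema.
  if (table_type == TableType::CHAR || table_type == TableType::VARCHAR) {
    return false;
  }
  // Second check: the type found in the file, known only after the column
  // has been resolved. A STRING table column may map to CHAR/VARCHAR here.
  if (file_kind == FileKind::CHAR || file_kind == FileKind::VARCHAR) {
    return false;
  }
  return true;
}
```

The case CanPushDown(TableType::STRING, FileKind::VARCHAR) passes the first check and is rejected only by the second one.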


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@216
PS5, Line 216: min max
> nit stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@444
PS5, Line 444:  min max
> nit. stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@524
PS5, Line 524: min max
> nit. stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@633
PS5, Line 633: min max
> nit. stats
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@214
PS5, Line 214: # Test on predicate x < a for float that can't filter out any RowGroups.
> I wonder if boolean type is supported by ORC for push-down. If so, we may n
Done. Added a table with rows sorted by the bool column so that we can test bool predicates here. Due to a limitation in FE, bool predicates like "x is true" are not added to the stats conjuncts. I'll address this in IMPALA-10932. For now I have only added tests like "x = true" and "x in (true)".



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 07:12:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#3).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, to gain the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Unlike the Parquet scanner, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   leverage IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.
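
For the IS-NULL case, a minimal sketch of how null statistics could
rule out a RowGroup. It assumes the stats expose a has-null flag and a
count of non-null values; NullStats and both functions are hypothetical
names, not the ORC library's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-RowGroup null statistics of one column.
struct NullStats {
  bool has_null;          // true if at least one value is NULL
  uint64_t num_non_null;  // number of non-NULL values
};

// "x IS NULL" cannot match when the RowGroup contains no NULLs.
bool CanSkipForIsNull(const NullStats& stats) { return !stats.has_null; }

// "x IS NOT NULL" cannot match when every value in the RowGroup is NULL.
bool CanSkipForIsNotNull(const NullStats& stats) {
  return stats.num_non_null == 0;
}
```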

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.
Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on these conjuncts.

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
9 files changed, 555 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :     // ORC reader only supports pushing down predicates that constant parts are literal.
              :     // We could get non-literal expr if expr rewrites are disabled.
              :     if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :     in_list.emplace_back(GetLiteralSearchArguments(
              :         eval, i, slot_desc->type(), &predicate_type));
              :   }
> Does the constant-folding happen in FE?
Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java#L41

Here is the entry point for expr rewrite: https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java#L521
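
For context, the literal check under discussion can be sketched like this. It is a simplified stand-in, not the scanner code: Expr and BuildInList are hypothetical, and child 0 is assumed to be the slot reference, as in the loop quoted above that starts from index 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, flattened view of an IN-list child expression.
struct Expr {
  bool is_literal;
  int64_t value;  // only meaningful when is_literal is true
};

// Collects the IN-list values from children[1..]. Returns false, leaving
// in_list empty, if any child is not a literal, e.g. because expr rewrites
// (including constant folding) were disabled.
bool BuildInList(const std::vector<Expr>& children, std::vector<int64_t>* in_list) {
  assert(in_list != nullptr);
  in_list->clear();
  for (size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) {
      in_list->clear();
      return false;
    }
    in_list->push_back(children[i].value);
  }
  return true;
}
```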



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:14:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 17:53:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#5).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, to gain the
following improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   with bloom filters in the ORC files.
 - Unlike the Parquet scanner, which converts an IN-list predicate
   into two binary predicates (i.e. LE and GE), the ORC reader can
   leverage IN-list predicates directly to skip ORC RowGroups. E.g. a
   RowGroup with int column 'x' in range [1, 100] will be skipped if we
   push down predicate "x in (0, 101)".
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.
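
As background for the bloom filter point above, here is a toy sketch of
how a bloom filter lets an EQUALS or IN-list predicate rule out a
RowGroup. TinyBloomFilter and CanSkipWithBloom are invented for
illustration; the filter size and hash functions are arbitrary and are
not those used by ORC.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// A toy bloom filter with two hash functions over a fixed bit array.
class TinyBloomFilter {
 public:
  void Insert(int64_t v) {
    bits_.set(Hash1(v));
    bits_.set(Hash2(v));
  }
  // May return true for values never inserted (false positive), but never
  // returns false for a value that was inserted.
  bool MightContain(int64_t v) const {
    return bits_.test(Hash1(v)) && bits_.test(Hash2(v));
  }

 private:
  static constexpr size_t kBits = 1024;
  static size_t Hash1(int64_t v) { return std::hash<int64_t>{}(v) % kBits; }
  static size_t Hash2(int64_t v) {
    return ((static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL) >> 32) % kBits;
  }
  std::bitset<kBits> bits_;
};

// A RowGroup can be skipped when none of the IN-list values might be present.
// An EQUALS predicate is the special case of a one-element list.
bool CanSkipWithBloom(const TinyBloomFilter& filter, const std::vector<int64_t>& in_list) {
  for (int64_t v : in_list) {
    if (filter.MightContain(v)) return false;
  }
  return true;
}
```

Unlike min/max stats, this can help even when the searched value lies
inside the RowGroup's [min, max] range but is not actually present.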

Implementation:
FE will collect these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect the meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts'. Same for other related variable names.

Parquet scanner will only pick binary min-max conjuncts (i.e. LT, GT,
LE, and GE) to keep the existing behavior. ORC scanner will build
SearchArgument based on all these conjuncts.

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
9 files changed, 568 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9500/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 12:00:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 3:

(4 comments)

Thanks for the feedback! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046:   default:
              :       DCHECK(false) << "Invalid type";
              :       return orc::Literal(orc::PredicateDataType::BOOLEAN);
              :   }
              : }
              : 
              : boo
> Okay. That fits my understanding of constant folding. Thanks for the URLs. 
Ah yeah, replaced line 1049 with a DCHECK.


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@551
PS2, Line 551: L
> nit. May need to assure that child0 is a reference to a column.
Sure. Actually 'inputSlot' is extracted from 'inputPred' at line 611. Added a check.


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604:       if (fileFormats_.contains(HdfsFileFormat.ORC)) {
> I wonder if the complexity can be removed later on (for Parquet). 
Thanks for the links! I also prefer doing this in FE instead of in each executor. If you are OK with it, I can update the Parquet logic in the next update.

Note that one benefit of the current BE code in HdfsParquetScanner::CreateColIdx2EqConjunctMap() is that it can combine LE and GE into EQUALS when users explicitly give both the LE and GE predicates. But we can also optimize this corner case in FE.
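
The LE/GE corner case mentioned above amounts to something like the following sketch. It is not the body of CreateColIdx2EqConjunctMap(): RangeConjuncts and TryCombineToEquals are hypothetical names, and the bounds are assumed to be int literals on the same column.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the range conjuncts found on one column.
struct RangeConjuncts {
  bool has_ge = false;
  int64_t ge_value = 0;  // bound of "x >= ge_value"
  bool has_le = false;
  int64_t le_value = 0;  // bound of "x <= le_value"
};

// "x >= c AND x <= c" is equivalent to "x = c". On success, stores c in
// *eq_value and returns true; otherwise leaves *eq_value untouched.
bool TryCombineToEquals(const RangeConjuncts& r, int64_t* eq_value) {
  assert(eq_value != nullptr);
  if (!r.has_ge || !r.has_le) return false;
  if (r.ge_value != r.le_value) return false;
  *eq_value = r.ge_value;
  return true;
}
```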


http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@364
PS2, Line 364:  1.2345
> In lists on other data types should be added.
Done. Added more tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 11:38:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 8:

ORC-1024 is resolved. PS8 bumps the ORC version to 1.7.0-p3 to include the fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 14:14:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 29 Aug 2021 19:38:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(1 comment)

Just a follow-up to Csaba's comment.

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604:         buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp());
> Parquet has a somewhat hacky way of finding EQ predicates in the backend an
I wonder whether that complexity can be removed later on (for Parquet).

For ORC, I like the idea of using the EQUALS form of the predicate directly, which should translate to better performance.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 19:26:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 6: Code-Review+2

I think Csaba's concern has been addressed in the patch.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 14:00:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9501/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 00:55:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9633/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 08:51:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 8:

(4 comments)

Looks great.

I just have some minor comments on code duplication.

http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1106
PS8, Line 1106:  RETURN_IF_ERROR(schema_resolver_->ResolveColumn(slot_desc->col_path(),
              :           &node, &pos_field, &missing_field));
              :       if (pos_field || missing_field) continue;
This logic is common to both the IS_NULL/IS_NOT_NULL and the binary predicate paths, so it can be moved above line 1104.
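
A minimal sketch of that hoisting, assuming hypothetical stand-in types (PredKind, StatsConjunct and CollectPushable are illustrative names, not Impala's real classes or functions). The flags pos_field and missing_field mirror the outputs of ResolveColumn() quoted above; the check runs once per conjunct, before branching on the predicate kind:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not Impala's real classes.
enum class PredKind { kIsNull, kIsNotNull, kBinary };

struct StatsConjunct {
  PredKind kind;
  bool pos_field;      // column path resolves to a position field
  bool missing_field;  // column is absent from the file schema
};

// Returns the kinds of the conjuncts that survive column resolution.
// The resolution check runs once per conjunct, above the per-kind branch.
std::vector<PredKind> CollectPushable(
    const std::vector<StatsConjunct>& conjuncts) {
  std::vector<PredKind> pushable;
  for (const StatsConjunct& c : conjuncts) {
    // Shared by every predicate kind, so it is hoisted out of the branches.
    if (c.pos_field || c.missing_field) continue;
    switch (c.kind) {
      case PredKind::kIsNull:
      case PredKind::kIsNotNull:
        pushable.push_back(c.kind);  // IS [NOT] NULL path
        break;
      case PredKind::kBinary:
        pushable.push_back(c.kind);  // binary predicate path
        break;
    }
  }
  return pushable;
}
```

In the real scanner the two branches would build different search arguments; here they only record the kind, to keep the sketch self-contained.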


http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1124
PS8, Line 1124:  // TODO(IMPALA-10882): push down stats predicates on CHAR/VARCHAR.
              :     if (const_expr->type().type == TYPE_CHAR || const_expr->type().type == TYPE_VARCHAR
              :         || slot_desc->type().type == TYPE_CHAR
              :         || slot_desc->type().type == TYPE_VARCHAR) {
              :       continue;
              :     }
A DCHECK() here should be sufficient, as the types have already been checked in the FE.

We should also do a test on TIMESTAMP.
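
A rough sketch of the DCHECK() variant, under stated assumptions: PrimType, IsCharOrVarchar and DcheckPushDownTypes are hypothetical names, and assert() stands in for DCHECK() so the snippet stays self-contained. It assumes the FE really does filter out CHAR/VARCHAR before the plan reaches the BE:

```cpp
#include <cassert>

// Hypothetical stand-in for the backend's primitive type enum.
enum class PrimType { kInt, kString, kChar, kVarchar, kTimestamp };

// Single place that answers "is this CHAR or VARCHAR?".
inline bool IsCharOrVarchar(PrimType t) {
  return t == PrimType::kChar || t == PrimType::kVarchar;
}

// Documents the frontend's guarantee instead of silently skipping.
// assert() stands in for DCHECK() and compiles away when NDEBUG is defined.
inline void DcheckPushDownTypes(PrimType const_type, PrimType slot_type) {
  (void)const_type;  // avoid unused-parameter warnings in NDEBUG builds
  (void)slot_type;
  assert(!IsCharOrVarchar(const_type));
  assert(!IsCharOrVarchar(slot_type));
}
```

The trade-off is that a release build no longer skips such predicates, so this is only safe if the FE check cannot be bypassed.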


http://gerrit.cloudera.org:8080/#/c/17815/8/be/src/exec/hdfs-orc-scanner.cc@1141
PS8, Line 1141: // TODO(IMPALA-10882): push down min-max predicates on CHAR/VARCHAR.
              :     if (node->getKind() == orc::CHAR || node->getKind() == orc::VARCHAR) continue;
Check TIMESTAMP too?


http://gerrit.cloudera.org:8080/#/c/17815/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@587
PS8, Line 587:  // TODO(IMPALA-10882): Push down Min-Max predicates of CHAR/VARCHAR to ORC reader
             :     // TODO(IMPALA-10915): Push down Min-Max predicates of TIMESTAMP to ORC reader
             :     if (fileFormats_.contains(HdfsFileFormat.ORC) &&
             :         (slotDesc.getType() == Type.CHAR || slotDesc.getType() == Type.VARCHAR ||
             :             slotDesc.getType() == Type.TIMESTAMP)) {
             :       return;
             :     }
The check for types that ORC push-down does not support is duplicated: here and inside tryComputeInListStatsPredicate() at line 623.

Maybe some refactoring?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 17:39:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7435/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 29 Aug 2021 13:20:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :     // ORC reader only supports pushing down predicates that constant parts are literal.
              :     // We could get non-literal expr if expr rewrites are disabled.
              :     if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :     in_list.emplace_back(GetLiteralSearchArguments(
              :         eval, i, slot_desc->type(), &predicate_type));
              :   }
> Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb
Okay. That fits my understanding of constant folding. Thanks for the URLs. 

So if we have already checked for literals in buildOrcInListStatsPredicate(), can we assume these literals are saved in the plan and available in the BE for building the IN-list predicates (i.e., can line 1049 be removed)?
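
If that assumption holds, the loop might reduce to something like the following sketch. ExprChild and BuildInList are hypothetical stand-ins rather than Impala's ScalarExpr API, and assert() stands in for DCHECK():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an expression child; not Impala's ScalarExpr.
struct ExprChild {
  bool is_literal;
  int64_t value;
};

// Collects the IN-list values from children [1, n); child 0 is the slot ref.
std::vector<int64_t> BuildInList(const std::vector<ExprChild>& children) {
  std::vector<int64_t> in_list;
  for (std::size_t i = 1; i < children.size(); ++i) {
    // Stand-in for DCHECK(): the planner is assumed to have verified this,
    // so a non-literal here is a bug rather than a case to skip.
    assert(children[i].is_literal);
    in_list.push_back(children[i].value);
  }
  return in_list;
}
```

The early "return false" disappears, so the function no longer needs a failure path for non-literal children.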



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:39:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 5:

(13 comments)

Looks very good. I added some more comments (most of them are minor).

For plan tests, I wonder if there is one in this area that can demonstrate predicates being pushed down. This would be very useful in diagnosing plan issues.

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h@324
PS5, Line 324: min/max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.h@339
PS5, Line 339: min/max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: 
              : 
              : bool HdfsOrcScanner::PrepareInListPredicates(const orc::Type* orc_col,
              :     SlotDescriptor* slot_desc, ScalarExprEvaluator* eval,
              :     orc::SearchArgumentBuilder* sarg) {
              :   DCHECK(orc_col != nullptr);
              :   s
> Ah yeah, replaced line 1049 with a DCHECK.
Done


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1064
PS5, Line 1064: are
nit. is.

In the future, it would be better to do the conversion work in the FE, to avoid repeating the same work once per thread in the BE.


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1119
PS5, Line 1119: min-max
nit. stats.


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-orc-scanner.cc@1136
PS5, Line 1136: // TODO(IMPALA-10882): push down min-max predicates on CHAR/VARCHAR.
              :     if (node->getKind() == orc::CHAR || node->getKind() == orc::VARCHAR) continue;
This can be removed, since the check has already been done at lines 1121 and 1122.


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@216
PS5, Line 216: min max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@444
PS5, Line 444:  min max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@524
PS5, Line 524: min max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/5/be/src/exec/hdfs-scan-node-base.cc@633
PS5, Line 633: min max
nit. stats


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@551
PS2, Line 551: L
> Sure. Actually 'inputSlot' is extracted from 'inputPred' at line 611. Added
Done


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604:             BinaryPredicate.Operator.GE);
> Thanks for the links! I also prefer doing this in FE instead of in each exe
Yes, updating it in next patch sounds good to me.


http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/5/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@214
PS5, Line 214: # Test on predicate x < a for float that can't filter out any RowGroups.
I wonder whether the boolean type is supported by ORC for push-down. If so, we may need to add a couple of tests here for that type.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 14:16:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17815/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/3/be/src/exec/hdfs-orc-scanner.cc@1096
PS3, Line 1096:     SlotDescriptor* slot_desc = stats_tuple_desc->slots()[i];
> line too long (121 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@601
PS3, Line 601:         buildBinaryStatsPredicate(analyzer, slotRef, binaryPred,
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17815/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@602
PS3, Line 602:             BinaryPredicate.Operator.LE);
> line too long (94 > 90)
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 02:22:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7488/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 11:39:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7506/


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 20:22:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7506/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 14:02:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9617/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 14:33:42 +0000
Gerrit-HasComments: No