Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2020/05/19 21:52:59 UTC

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15959 )


Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................

IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

The min/max stat predicate is allowed when the left side is not a slot
but an implicit cast of a slot. This could lead to incorrectly dropping
a row group or page when the min/max values were not castable to the
target type, e.g. when the column is a string holding a pre-1400 date
and it is cast to a timestamp.
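
A minimal Python sketch of the failure mode described above, with a
NULL-accepting decision shown for contrast. It is an illustration only,
not Impala's backend code: cast_to_timestamp, min_stat_predicate and the
two keep_row_group_* helpers are hypothetical names, and the only Impala
behaviour assumed is that TIMESTAMP starts at 1400-01-01 and that an
out-of-range string casts to NULL.

```python
from datetime import datetime
from typing import Optional

# Assumption: lower bound of Impala's TIMESTAMP range; earlier values cast to NULL.
MIN_TIMESTAMP = datetime(1400, 1, 1)


def cast_to_timestamp(value: str) -> Optional[datetime]:
    """Hypothetical stand-in for the implicit string -> timestamp cast.

    Returns None (SQL NULL) when the string is unparsable or out of range.
    """
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            ts = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return ts if ts >= MIN_TIMESTAMP else None
    return None


def min_stat_predicate(min_value: str, upper_bound: datetime) -> Optional[bool]:
    """Evaluate CAST(col AS TIMESTAMP) < upper_bound on the min statistic."""
    ts = cast_to_timestamp(min_value)
    if ts is None:
        return None  # a comparison with NULL yields NULL, not FALSE
    return ts < upper_bound


def keep_row_group_null_as_false(result: Optional[bool]) -> bool:
    """Problematic decision: anything other than TRUE drops the row group."""
    return result is True


def keep_row_group_accept_null(result: Optional[bool]) -> bool:
    """NULL-accepting decision: only a definite FALSE drops the row group."""
    return result is not False


# The min statistic is a pre-1400 date string, so the cast yields NULL.
result = min_stat_predicate("1399-12-31", datetime(1450, 1, 1))
print(result, keep_row_group_null_as_false(result), keep_row_group_accept_null(result))
# prints: None False True
```

With the NULL-as-false decision the row group is skipped even though it
may contain later values (say "1420-01-01") that satisfy the predicate.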

The change should only affect timestamps, as dates return an error
on failed cast from a string, and numeric types won't be cast
implicitly from string.

The fix is simply to accept a NULL result for the min/max predicate in
the backend. Note that the alternative solution of casting the right
(const) side of the predicate instead of the left side would be tricky,
as more than one string can mean the same timestamp, e.g.
"1970-01-01" and "1970-01-01 00:00:00".
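
To illustrate why casting the constant side to a string would be tricky,
here is a small Python sketch. parse_timestamp is a hypothetical helper
that only accepts the two spellings from the example; it is not Impala's
parser.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Hypothetical parser accepting the two spellings used in the example."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp string: {value!r}")


short, full = "1970-01-01", "1970-01-01 00:00:00"

same_timestamp = parse_timestamp(short) == parse_timestamp(full)
same_string = short == full
short_sorts_first = short < full

print(same_timestamp, same_string, short_sorts_first)
# prints: True False True
```

Both strings denote the same timestamp, yet they differ as strings and
the shorter one sorts first, so a string comparison against string
min/max statistics could disagree with the timestamp comparison.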

Testing:
- added an EE regression test and ran it

Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/scalar-expr-evaluator.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 22 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/15959/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5879/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 May 2020 13:38:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6102/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 May 2020 22:45:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................


Patch Set 2: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 May 2020 13:38:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................


Patch Set 2: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 May 2020 18:40:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................


Patch Set 1: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 May 2020 03:21:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15959 )

Change subject: IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL
......................................................................

IMPALA-9707: fix Parquet stat filtering when min/max values are cast to NULL

The min/max stat predicate is allowed when the left side is not a slot
but an implicit cast of a slot. This could lead to incorrectly dropping
a row group or page when the min/max values were not castable to the
target type, e.g. when the column is a string holding a pre-1400 date
and it is cast to a timestamp.

The change should only affect timestamps, as dates return an error
on failed cast from a string, and numeric types won't be cast
implicitly from string.

The fix is simply to accept a NULL result for the min/max predicate in
the backend. Note that the alternative solution of casting the right
(const) side of the predicate instead of the left side would be tricky,
as more than one string can mean the same timestamp, e.g.
"1970-01-01" and "1970-01-01 00:00:00".

Testing:
- added an EE regression test and ran it

Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Reviewed-on: http://gerrit.cloudera.org:8080/15959
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/scalar-expr-evaluator.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 22 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I35f66e1dfc4523624c249073004f9d5eddd07bb6
Gerrit-Change-Number: 15959
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>