You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org> on 2020/01/22 18:23:41 UTC

[Impala-ASF-CR] IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types

Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15087


Change subject: IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................

IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types

This patch add validation for the paired stats value for int8 and int16 data type
when reading min/max column stats value from Parquet file.

Testing:
1) Manual test - create table with column as int32 type, intert some values, then
   alter table to change the column data type as tinyint (int8) or smallint (int16),
   make sure the query return correct number of rows when PARQUET_READ_STATISTICS is
   set as 1.
2) Passed pre-review-test on Jenkins.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
3 files changed, 40 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc@51
PS5, Line 51: has_paired_stats
This  variable seems redundant as it is set to true in all cases of the switch. Using paired_stats_value's nullness seems enough to express whether the paired stat exists.


http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc@116
PS5, Line 116: has_paired_stats
I think that this could be simplified a lot - if has_paired_stats is false, and the type is a TINYINT/SMALLINT, then we can simply return false without parsing the stats. The same applies to the case when the stat exists, but we couldn't parse it (around line 106) - we should immediately return and in the rest of the switch case we should assume that the paired stats exists and was read successfully.


http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@691
PS5, Line 691: PARQUET_READ_STATISTICS
PARQUET_READ_STATISTICS=1 is the default, so it is possibe to omit these lines.


http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@692
PS5, Line 692: <
Please add a test with = and > to be sure that those work too.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 11:34:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly
handle narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Did Manual tests: create table with column as int type, intert
   some values, then alter table to change the column data type as
   tinyint (int8), insert more values, verify that the query return
   correct number of rows when PARQUET_READ_STATISTICS is set as 1.
   Did similar tests to change column data type from int to smallint,
   and from smallint to tinyint.
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 124 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly
handle narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Did Manual tests: create table with column as int type, intert
   some values, then alter table to change the column data type as
   tinyint (int8), insert more values, verify that the query return
   correct number of rows when PARQUET_READ_STATISTICS is set as 1.
   Did similar tests to change column data type from int to smallint,
   and from smallint to tinyint.
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 121 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/8
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................

IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 121 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 7: Code-Review+1

(4 comments)

Thanks for the fixes so far! A few more nits, mainly about readability and compactness.

http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc@100
PS7, Line 100:         return false;
here + for SMALLINT: Fits to a one line if without braces.


http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc@102
PS7, Line 102:         ret = ColumnStats<int32_t>::DecodePlainValue(*paired_stats_value,
             :             &paired_stats_val, parquet::Type::INT32);
here + for SMALLINT: doesn't need to be in an else block, as the "if" block returns.


http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@690
PS7, Line 690: i
An idea to improve readability: i / j / k could be renamed to contain their type, e.g. i32_to_8, i16_to_8, i32_to_16.


http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@717
PS7, Line 717: 200
Here and in other = predicates: it seems more interesting to test for the overflown value instead of the out of range positive value (where the result of the count should be 1), as we are mainly hunting for false negatives in the tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 14:48:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5160/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 19:10:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly
handle narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Did Manual tests: create table with column as int type, intert
   some values, then alter table to change the column data type as
   tinyint (int8), insert more values, verify that the query return
   correct number of rows when PARQUET_READ_STATISTICS is set as 1.
   Did similar tests to change column data type from int to smallint,
   and from smallint to tinyint.
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 126 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@10
PS3, Line 10: add
> nit: adds
will fix it


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@14
PS3, Line 14: ting:
> Please create an automatic test for this. Checking the example in the jira 
will add automatic tests in parquet-stats.test.


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@16
PS3, Line 16:    some values, then alter table to change the column data type as
> nit: please wrap the commit message at 72
will fix it.


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@54
PS3, Line 54: tatsField::MIN:
> This check is needed with max_value before using stats.max_value for paired
will fix it.


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@106
PS3, Line 106: (ColumnStats<int32
> I think that we should return false even if the paired stats do not exist -
will fix it as suggested.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 00:37:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5545/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 01:30:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5498/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Jan 2020 20:40:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5308/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 09 Feb 2020 18:27:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 09 Feb 2020 23:18:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 8:

(2 comments)

Thanks for the changes! The code looks good to me, I have comments only about the commit message. Sorry for not pointing this out earlier.

http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG@7
PS8, Line 7: IMPALA-8110: Fix the Parquet stats filtering issue to correctly
           : handle narrowed integer types
nit: can we use a bit shorter title? e.g. Fix Parquet min/max filters for narrowed integers types


http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG@15
PS8, Line 15:  - Did Manual tests: create table with column as int type, intert
            :    some values, then alter table to change the column data type as
            :    tinyint (int8), insert more values, verify that the query return
            :    correct number of rows when PARQUET_READ_STATISTICS is set as 1.
            :    Did similar tests to change column data type from int to smallint,
            :    and from smallint to tinyint.
I think that this comment about manual testing is no longer, informative, as the .test file covers all these cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Feb 2020 09:50:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5150/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 05:13:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5544/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jan 2020 01:21:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................

IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types

This patch add validation for the paired stats value for int8 and int16 data type
when reading min/max column stats value from Parquet file.

Testing:
1) Manual test - create table with column as int32 type, intert some values, then
   alter table to change the column data type as tinyint (int8) or smallint (int16),
   make sure the query return correct number of rows when PARQUET_READ_STATISTICS is
   set as 1.
2) Passed pre-review-test on Jenkins.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
3 files changed, 40 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@10
PS3, Line 10: add
nit: adds


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@14
PS3, Line 14: Manual test
Please create an automatic test for this. Checking the example in the jira with <, =, > predicates + the sames with smallint instead of int seems enough to me.

A possible place for the test: https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@16
PS3, Line 16:    smallint (int16), make sure the query return correct number of rows when
nit: please wrap the commit message at 72


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@54
PS3, Line 54: stats.__isset.min_value
This check is needed with max_value before using stats.max_value for paired stats


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@106
PS3, Line 106: check_paired_stats
I think that we should return false even if the paired stats do not exist - e.g. assume the case in the jira with [1,201] where 201 was not written to the stats for some reason (I do not know of such a writer) and the predicate is "i < 1". Even if 1 was the smallest value as int32, large values that overflow during the conversion can become negative, so dropping the whole row group is not correct.

So I think that min/max values can be only valid if both the stat and its paired stat exist and within the valid range.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jan 2020 16:58:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc@100
PS7, Line 100:         return false;
> here + for SMALLINT: Fits to a one line if without braces.
will remove braces and make the code to one line.


http://gerrit.cloudera.org:8080/#/c/15087/7/be/src/exec/parquet/parquet-column-stats.cc@102
PS7, Line 102:         ret = ColumnStats<int32_t>::DecodePlainValue(*paired_stats_value,
             :             &paired_stats_val, parquet::Type::INT32);
> here + for SMALLINT: doesn't need to be in an else block, as the "if" block
will remove else.


http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@690
PS7, Line 690: i
> An idea to improve readability: i / j / k could be renamed to contain their
Will change column names as suggested.


http://gerrit.cloudera.org:8080/#/c/15087/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@717
PS7, Line 717: 200
> Here and in other = predicates: it seems more interesting to test for the o
will change code to check overlown value.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Feb 2020 18:44:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle
narrowed int types

This patch add validation for the paired stats value for int8 and int16
data type when reading min/max column stats value from Parquet file.

Testing:
1) Manual test - create table with column as int32 type, intert some values,
   then alter table to change the column data type as tinyint (int8) or
   smallint (int16), make sure the query return correct number of rows when
   PARQUET_READ_STATISTICS is set as 1.
2) Passed pre-review-test on Jenkins.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
3 files changed, 40 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Jan 2020 20:25:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 09 Feb 2020 18:25:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 09 Feb 2020 18:27:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc@51
PS5, Line 51: has_paired_stats
> This  variable seems redundant as it is set to true in all cases of the swi
I was thinking we may add other stats field in the future which is defined without paired stats.
Will remote it as suggested.


http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc@116
PS5, Line 116: has_paired_stats
> I think that this could be simplified a lot - if has_paired_stats is false,
Will change code to return true only when both the stats value and its paired stats value are in valid range.


http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@691
PS5, Line 691: PARQUET_READ_STATISTICS
> PARQUET_READ_STATISTICS=1 is the default, so it is possibe to omit these li
PARQUET_READ_STATISTICS is set as 0 in previous test cases (line 538). To be safe, we should set PARQUET_READ_STATISTICS as 1 here.


http://gerrit.cloudera.org:8080/#/c/15087/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@692
PS5, Line 692: <
> Please add a test with = and > to be sure that those work too.
will add test cases as suggested



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 19:45:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110 Fix the Parquet stats filtering issue to correctly handle narrowed int types
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5495/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Jan 2020 19:08:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5576/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 20:52:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................

IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Reviewed-on: http://gerrit.cloudera.org:8080/15087
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 121 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/5/be/src/exec/parquet/parquet-column-stats.cc@116
PS5, Line 116: ic_cast<int8_t*>
> Will change code to return true only when both the stats value and its pair
will remove boolean variable paired_stats_val_is_set, and return false earlier if ret is false, or paired_stats_value is null.



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Feb 2020 19:59:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG@7
PS8, Line 7: IMPALA-8110: Fix the Parquet stats filtering issue to correctly
           : handle narrowed integer types
> nit: can we use a bit shorter title? e.g. Fix Parquet min/max filters for n
will fix it


http://gerrit.cloudera.org:8080/#/c/15087/8//COMMIT_MSG@15
PS8, Line 15:  - Did Manual tests: create table with column as int type, intert
            :    some values, then alter table to change the column data type as
            :    tinyint (int8), insert more values, verify that the query return
            :    correct number of rows when PARQUET_READ_STATISTICS is set as 1.
            :    Did similar tests to change column data type from int to smallint,
            :    and from smallint to tinyint.
> I think that this comment about manual testing is no longer, informative, a
will remove it



-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Feb 2020 17:37:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly
handle narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Did Manual tests: create table with column as int type, intert
   some values, then alter table to change the column data type as
   tinyint (int8), insert more values, verify the query return
   correct number of rows when PARQUET_READ_STATISTICS is set as 1.
   Did similar tests to change column data type from int to smallint,
   and from smallint to tinyint.
 - Added test cases in parquet-stats.test for column data type been
   changed from int to tinyint, from smallint to tinyint and from
   int to smallint.
 - Passed all EE tests.
 - Passed pre-review-test on Jenkins, including FE tests, BE tests,
   EE tests, JDBS test and cluster tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 94 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed integer types
......................................................................

IMPALA-8110: Fix the Parquet stats filtering issue to correctly
handle narrowed integer types

This patch adds validation for the paired stats values of tinyint
and smallint column data type when reading min/max column stats
value from Parquet file.

Testing:
 - Did Manual tests: create table with column as int type, intert
   some values, then alter table to change the column data type as
   tinyint (int8), insert more values, verify that the query return
   correct number of rows when PARQUET_READ_STATISTICS is set as 1.
   Did similar tests to change column data type from int to smallint,
   and from smallint to tinyint.
 - Added automatic test cases in parquet-stats.test for column data
   type been changed from int to tinyint, from smallint to tinyint
   and from int to smallint.
 - Passed EE tests.
 - Passed pre-review-test on Jenkins, including FE tests, BE tests,
   EE tests, JDBS test and cluster tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 94 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15087/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix Parquet min/max filters for narrowed integer types
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5179/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Feb 2020 23:19:26 +0000
Gerrit-HasComments: No