You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/08/03 09:06:55 UTC

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17748


Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................

IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.

This patch fixes this and adding test coverage for illegal decimal
schemas.

Tests:
 - Add a bad parquet file with illegal decimal schemas.
 - Add e2e tests on the file.

Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/illegal_decimals.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
8 files changed, 144 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17748/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7371/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 01:48:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................

IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.

This patch fixes this and adding test coverage for illegal decimal
schemas.

Tests:
 - Add a bad parquet file with illegal decimal schemas.
 - Add e2e tests on the file.
 - Ran test_fuzz_decimal_tbl 100 times. Saw the errors are caught as
   expected.

Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Reviewed-on: http://gerrit.cloudera.org:8080/17748
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/illegal_decimals.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
8 files changed, 144 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9225/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Aug 2021 09:26:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17748

to look at the new patch set (#3).

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................

IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.

This patch fixes this and adding test coverage for illegal decimal
schemas.

Tests:
 - Add a bad parquet file with illegal decimal schemas.
 - Add e2e tests on the file.
 - Ran test_fuzz_decimal_tbl 100 times. Saw the errors are caught as
   expected.

Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/illegal_decimals.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
8 files changed, 144 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17748/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17748

to look at the new patch set (#2).

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................

IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.

This patch fixes this and adding test coverage for illegal decimal
schemas.

Tests:
 - Add a bad parquet file with illegal decimal schemas.
 - Add e2e tests on the file.
 - Ran test_fuzz_decimal_tbl 100 times. Saw the errors are caught as
   expected.

Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/illegal_decimals.parq
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
8 files changed, 144 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17748/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 2: Code-Review+2

(2 comments)

Thanks you for fixing this and adding tests!

http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README@14
PS2, Line 14: Both of them
nit: 'All of them'?


http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README@15
PS2, Line 15: modifyiing
modifying



-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 12:26:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9241/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 02:08:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 01:48:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

Carry on Zoltan's +2.

http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README@14
PS2, Line 14: All of them 
> nit: 'All of them'?
Done


http://gerrit.cloudera.org:8080/#/c/17748/2/testdata/bad_parquet_data/README@15
PS2, Line 15: modifying 
> modifying
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 01:47:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17748 )

Change subject: IMPALA-10808: (addendum) Abort on illegal decimal parquet schemas
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Gerrit-Change-Number: 17748
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 07:55:25 +0000
Gerrit-HasComments: No