You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/10/28 15:24:16 UTC

[Impala-ASF-CR] IMPALA-3487: validate decimal type in Avro file schema

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4876

Change subject: IMPALA-3487: validate decimal type in Avro file schema
......................................................................

IMPALA-3487: validate decimal type in Avro file schema

This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.

After this patch files with bad schemas are skipped.

Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that scans
a file with a bad schema.

Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/types.h
M be/src/util/avro-util.cc
M be/src/util/avro-util.h
M common/thrift/generate_error_codes.py
M testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_decimal_schema.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
11 files changed, 102 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................

IMPALA-4387: validate decimal type in Avro file schema

This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.

After this patch files with bad schemas are skipped.

Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that scans
a file with a bad schema.

Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/types.h
M be/src/util/avro-util.cc
M be/src/util/avro-util.h
M common/thrift/generate_error_codes.py
M testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_decimal_schema.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
11 files changed, 102 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 4: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/379/

-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4876/2//COMMIT_MSG
Commit Message:

PS2, Line 16: scans
nit: wrap at 80 chars.


http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.cc
File be/src/util/avro-util.cc:

Line 109:     case AVRO_BYTES:
maybe add a // fallthrough comment at the end of the line?


Line 139:       return Status::OK();
Wouldn't it be better to return a non-OK status in a release build? If not, can you add a comment explaining why / what the consequences are?


http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.h
File be/src/util/avro-util.h:

PS2, Line 84: name
column_name?


-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................

IMPALA-4387: validate decimal type in Avro file schema

This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.

After this patch files with bad schemas are skipped.

Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that
scans a file with a bad schema.

Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/types.h
M be/src/util/avro-util.cc
M be/src/util/avro-util.h
M common/thrift/generate_error_codes.py
M testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_decimal_schema.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
11 files changed, 104 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 4: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


IMPALA-4387: validate decimal type in Avro file schema

This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.

After this patch files with bad schemas are skipped.

Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that
scans a file with a bad schema.

Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Reviewed-on: http://gerrit.cloudera.org:8080/4876
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/types.h
M be/src/util/avro-util.cc
M be/src/util/avro-util.h
M common/thrift/generate_error_codes.py
M testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_decimal_schema.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
11 files changed, 104 insertions(+), 27 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>

[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4387: validate decimal type in Avro file schema
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4876/2//COMMIT_MSG
Commit Message:

PS2, Line 16: scans
> nit: wrap at 80 chars.
Done


http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.cc
File be/src/util/avro-util.cc:

Line 109:     case AVRO_BYTES:
> maybe add a // fallthrough comment at the end of the line?
It doesn't seem necessary when the case labels are adjacent like this, we don't do it in other parts of the codebase.


Line 139:       return Status::OK();
> Wouldn't it be better to return a non-OK status in a release build? If not,
Good point, this value comes from the Avro library so we don't have any guarantee it is a supported type. Fixed it.


http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.h
File be/src/util/avro-util.h:

PS2, Line 84: name
> column_name?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4876
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: Yes