You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/06/14 22:01:02 UTC

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3383

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................

IMPALA-3732: handle string length overflow in avro files

Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer s truncated to a
32-bit integer, the result is a negative length.

This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING, where we can't truncate the string, so must
return an error.

Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.

Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M common/thrift/generate_error_codes.py
5 files changed, 105 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/3383/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3383

to look at the new patch set (#3).

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................

IMPALA-3732: handle string length overflow in avro files

Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.

This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING, where we can't truncate the string, so must
return an error.

Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.

Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M common/thrift/generate_error_codes.py
5 files changed, 108 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/3383/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3383/1//COMMIT_MSG
Commit Message:

PS1, Line 13: is
> nit: can be
Done


http://gerrit.cloudera.org:8080/#/c/3383/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

PS1, Line 188: ///
> nit: these should be double slash
Done


PS1, Line 206: ///
> same
Done


http://gerrit.cloudera.org:8080/#/c/3383/1/be/src/exec/hdfs-avro-scanner-test.cc
File be/src/exec/hdfs-avro-scanner-test.cc:

Line 330:   /// Test handling of strings that would overflow the StringValue::len 32-bit integer.
> maybe test both cases where bit 31 is 0 and 1 (i.e. truncated result is pos
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 5: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/mikeb-gvm/18/

-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 5: Code-Review+2

Noticed an off-by-one bug in the test code for inlined char fields.

-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


IMPALA-3732: handle string length overflow in avro files

Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.

This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING, where we can't truncate the string, so must
return an error.

Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.

Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Reviewed-on: http://gerrit.cloudera.org:8080/3383
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M common/thrift/generate_error_codes.py
5 files changed, 108 insertions(+), 16 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3383

to look at the new patch set (#2).

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................

IMPALA-3732: handle string length overflow in avro files

Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.

This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING, where we can't truncate the string, so must
return an error.

Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.

Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M common/thrift/generate_error_codes.py
5 files changed, 108 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/3383/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 4: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................


Patch Set 1: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3383/1//COMMIT_MSG
Commit Message:

PS1, Line 13: is
nit: can be


http://gerrit.cloudera.org:8080/#/c/3383/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

PS1, Line 188: ///
nit: these should be double slash


PS1, Line 206: ///
same


http://gerrit.cloudera.org:8080/#/c/3383/1/be/src/exec/hdfs-avro-scanner-test.cc
File be/src/exec/hdfs-avro-scanner-test.cc:

Line 330:   /// Test handling of strings that would overflow the StringValue::len 32-bit integer.
maybe test both cases where bit 31 is 0 and 1 (i.e. truncated result is positive and negative)?


-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3732: handle string length overflow in avro files

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3383

to look at the new patch set (#5).

Change subject: IMPALA-3732: handle string length overflow in avro files
......................................................................

IMPALA-3732: handle string length overflow in avro files

Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.

This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING, where we can't truncate the string, so must
return an error.

Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.

Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M common/thrift/generate_error_codes.py
5 files changed, 108 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/3383/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>