You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Dan Hecht (Code Review)" <ge...@cloudera.org> on 2016/02/26 03:10:43 UTC
[Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-1886/IMPALA-2154: Add support for multi-stream bz2/gzip compressed files.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-1886/IMPALA-2154: Add support for multi-stream bz2/gzip compressed files.
......................................................................
Patch Set 14:
(8 comments)
http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:
Line 484: }
L469-484 should be restructured as:
if (stream_->eosr()) {
if (stream_end) {
*eosr = true;
} else {
return Status(truncated);
}
} else if (*decompressed_len == 0) {
return Status(NO_PRORESS);
}
http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/util/codec.h
File be/src/util/codec.h:
Line 130: /// stream_end: if decompressor consumed all input and reached end of compressed stream.
I think we should remove the "if decompressor consumed all input" condition, and this should just signify that the end of the 'output' buffer corresponds to the end of a compression stream (and then see my comments in the code). The caller should (and already does check) that there is no more input via the stream_->eosr() check.
http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/util/decompress.cc
File be/src/util/decompress.cc:
Line 89: *stream_end = false;
move this to L91 inside the loop. See comment at line 113 for why.
Line 113: if (stream_.avail_in == 0) *stream_end = true;
As mentioned elsewhere, let's simplify this and just uncondtionally set *stream_end = true here. The avail_in check isn't interesting, since only the caller knows whether there really is more input or not.
Line 342: *stream_end = false;
same comments as above
Line 352: if (stream_.avail_in == 0) *stream_end = true;
and here
http://gerrit.cloudera.org:8080/#/c/2219/14/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:
Line 226: decompressed
compressed
http://gerrit.cloudera.org:8080/#/c/2219/14/testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
File testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test:
Line 194: decompressed
this should say "compressed" file. From the users perspective, it's the compressed file that was malformed.
--
To view, visit http://gerrit.cloudera.org:8080/2219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu <jy...@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <sk...@cloudera.com>
Gerrit-HasComments: Yes