Posted to dev@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2016/09/01 19:44:52 UTC

[Impala-CR] IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found

Hello Internal Jenkins, Tim Armstrong,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4224

to review the following change.

Change subject: IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found
......................................................................

IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found

This adds a test that performs some simple fuzz testing of HDFS
scanners. It creates a copy of a given HDFS table, with each
file in the table corrupted in a random way: either a single
byte is set to a random value, or the file is truncated to a
random length. It then runs a query that scans the whole table
with several different batch_size settings. I made some effort
to make the failures reproducible by explicitly seeding the
random number generator, and providing a mechanism to override
the seed.
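
For illustration only, the corruption step described above could be
sketched roughly as follows. This is not the code in
tests/query_test/test_scanners_fuzz.py; the function names, the
FUZZ_SEED environment variable, and the 50/50 split between the two
corruption modes are assumptions made for this sketch.

```python
import os
import random


def corrupt_bytes(data, rng):
    """Return a corrupted copy of data: one byte overwritten, or truncated.

    Hypothetical helper; the even split between the two modes is an assumption.
    """
    if not data:
        return data
    if rng.random() < 0.5:
        # Overwrite a single byte at a random offset with a random value.
        buf = bytearray(data)
        buf[rng.randrange(len(buf))] = rng.randrange(256)
        return bytes(buf)
    # Truncate the file contents to a random length in [0, len(data)).
    return data[:rng.randrange(len(data))]


def make_rng(env_var="FUZZ_SEED"):
    """Build a seeded RNG; env_var (a hypothetical name) overrides the seed."""
    default_seed = random.SystemRandom().randrange(2 ** 32)
    seed = int(os.environ.get(env_var, default_seed))
    # Report the seed so that a failing run can be repeated with the same one.
    print("fuzz seed: %d" % seed)
    return random.Random(seed), seed
```

Rerunning with the same seed reproduces the same sequence of
corruptions, provided the files are visited in the same order.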

The fuzzer has found crashes resulting from corrupted or truncated
input files for RCFile, SequenceFile, Parquet, and Text LZO so far.
Avro only had a small buffer read overrun detected by ASAN.

Includes fixes for Parquet crashes found by the fuzzer, a small
buffer overrun in Avro, and a DCHECK in MemPool.

Initially it is only enabled for Avro, Parquet, and uncompressed
text. As follow-up work we should fix the bugs in the other scanners
and enable the test for them.

We also don't implement abort_on_error=0 correctly in Parquet, and
for some file formats corrupt headers result in the query being
aborted, so the test is xfailed if the query throws an exception.

Testing:
Ran the test with exploration_strategy=exhaustive in a loop locally
with both DEBUG and ASAN builds for a couple of days over a weekend.
Also ran an exhaustive private build.

Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Reviewed-on: http://gerrit.cloudera.org:8080/3833
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 5afd9f7df765006c067ef5f57d7f7431fe9e1247)
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/runtime/disk-io-mgr.cc
A be/src/runtime/scoped-buffer.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/compress.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners.py
A tests/query_test/test_scanners_fuzz.py
19 files changed, 427 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/24/4224/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4224
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR] IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found
......................................................................


Abandoned

Gerrit-MessageType: abandon
Gerrit-Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>