Posted to dev@impala.apache.org by "Skye Wanderman-Milne (Code Review)" <ge...@cloudera.org> on 2016/05/14 00:43:19 UTC
[Impala-CR](cdh5-trunk) PREVIEW IMPALA-3441: check for malformed Avro data (prototype)
Skye Wanderman-Milne has uploaded a new change for review.
http://gerrit.cloudera.org:8080/3072
Change subject: PREVIEW IMPALA-3441: check for malformed Avro data (prototype)
......................................................................
PREVIEW IMPALA-3441: check for malformed Avro data (prototype)
This patch adds the plumbing to do more error checking in the Avro
scanner (both the codegen'd and interpreted paths), and adds the
out-of-bounds checks for encoded ints and at the beginning of each
tuple.
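For reference, here is a minimal C++ sketch of what a bounds-checked read of an Avro zigzag-encoded int can look like. ReadZigZagIntChecked and its signature are hypothetical and are not the ReadWriteUtil API changed in this patch. The sketch relies only on the Avro binary encoding: 7 data bits per byte, the high bit as a continuation flag, and at most 5 bytes for an int. It returns false instead of reading past the end of the buffer or accepting an over-long encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Impala's actual ReadWriteUtil API.
// Decodes one Avro zigzag-encoded int from [*data, data_end). On success,
// stores the value in *out, advances *data past it and returns true. On
// truncated or over-long input it returns false and leaves *data unchanged,
// so the caller can report malformed data instead of reading out of bounds.
bool ReadZigZagIntChecked(const uint8_t** data, const uint8_t* data_end,
                          int32_t* out) {
  assert(data != nullptr && *data != nullptr && out != nullptr);
  const int kMaxBytes = 5;  // ceil(32 / 7): longest valid varint for an int
  const uint8_t* p = *data;
  uint32_t raw = 0;
  for (int i = 0; i < kMaxBytes; ++i) {
    if (p >= data_end) return false;  // buffer ends in the middle of the value
    uint8_t byte = *p++;
    raw |= static_cast<uint32_t>(byte & 0x7f) << (7 * i);
    if ((byte & 0x80) == 0) {  // high bit clear: this was the last byte
      // Undo the zigzag mapping: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
      *out = static_cast<int32_t>((raw >> 1) ^ (0u - (raw & 1u)));
      *data = p;
      return true;
    }
  }
  return false;  // continuation bit still set after 5 bytes: malformed
}
```

A caller would check the return value and raise a malformed-data error instead of materializing the tuple. The same `p >= data_end` comparison is also the kind of check needed at the beginning of each tuple.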
I ran a local benchmark using the following query:
set num_scanner_threads=1;
select max(i) from default.avro_ints_big;
where avro_ints_big is an Avro table with a single int column
containing ~90MM values. With this patch, the total query time goes
from 1.6s to 1.8s (12% increase), with the MaterializeTupleTime going
from 975ms to 1194ms (22% increase).
The one check missing from this prototype that will affect the
above benchmark is checking for a valid union value when determining
whether a value is null. I'm working on adding this, and then the
prototype will implement all of the checks this benchmark query exercises.
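A similar sketch for the union check follows. Per the Avro spec, a union value is written as a zigzag-encoded long giving the zero-based branch index, followed by the value itself. For a nullable column such as ["null", "int"], only indexes 0 and 1 are valid. ReadZigZagLongChecked and ReadUnionIsNullChecked are hypothetical helpers and are not code from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bounds-checked decode of an Avro zigzag-encoded long
// (at most 10 bytes). Leaves *data unchanged and returns false on bad input.
bool ReadZigZagLongChecked(const uint8_t** data, const uint8_t* data_end,
                           int64_t* out) {
  assert(data != nullptr && *data != nullptr && out != nullptr);
  const uint8_t* p = *data;
  uint64_t raw = 0;
  for (int i = 0; i < 10; ++i) {
    if (p >= data_end) return false;  // truncated value
    uint8_t byte = *p++;
    raw |= static_cast<uint64_t>(byte & 0x7f) << (7 * i);
    if ((byte & 0x80) == 0) {
      // Undo the zigzag mapping, as for ints.
      *out = static_cast<int64_t>((raw >> 1) ^ (uint64_t{0} - (raw & 1)));
      *data = p;
      return true;
    }
  }
  return false;  // over-long encoding
}

// Hypothetical helper: reads the branch index of a two-branch nullable union
// such as ["null", "int"] and sets *is_null. null_branch is the position of
// "null" in the schema. Any index other than 0 or 1 is treated as malformed,
// and *data is only advanced when the index is valid.
bool ReadUnionIsNullChecked(const uint8_t** data, const uint8_t* data_end,
                            int64_t null_branch, bool* is_null) {
  assert(is_null != nullptr);
  const uint8_t* p = *data;
  int64_t branch = -1;
  if (!ReadZigZagLongChecked(&p, data_end, &branch)) return false;
  if (branch != 0 && branch != 1) return false;  // not a branch of this union
  *is_null = (branch == null_branch);
  *data = p;
  return true;
}
```

Checking the decoded index against the union's arity, rather than testing only for the null branch, is what makes a corrupt index byte an error instead of being silently read as "not null".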
If we're happy with this overall approach, I can add error checking
for the other types as well.
Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/read-write-util.cc
M be/src/exec/read-write-util.h
5 files changed, 133 insertions(+), 76 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/72/3072/1
--
Gerrit-MessageType: newchange
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <sk...@cloudera.com>