Posted to dev@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2016/05/12 16:50:56 UTC
[Impala-CR](cdh5-trunk) IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
Bharath Vissapragada has uploaded a new change for review.
http://gerrit.cloudera.org:8080/3045
Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
......................................................................
IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
Bug: Impalads crash when we query a partitioned table with multiple file
formats if at least one partition is avro and the base table is non-avro.
Cause: We don't set avroSchema_ in HdfsTable during metadata load if the
base table is not backed by the AVRO file format. The schema is therefore
never propagated to the avro scanner, which doesn't have appropriate
checks to make sure the schema is non-null.
Fix: The fix has two parts.
1. Avro scanner should gracefully handle the case where the avro schema
is not set. Appropriate null checks have been added.
2. avroSchema_ should be set in HdfsTable if any subset of partitions
is backed by avro, even when the base table is not. This is done by
decoupling the code that sets avroSchema_ from HdfsTable#loadSchema(),
as we need the partition metadata to make this decision. So we set it
once the partition information is loaded.
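
For illustration only, the decision in part 2 can be sketched as below. This
is a minimal sketch and not the code in this patch: MixedFormatSketch,
FileFormat, Partition, Table, hasAvroData() and setAvroSchemaIfNeeded() are
hypothetical stand-ins for the catalog classes, and the schema string is a
placeholder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for the catalog classes; not Impala's real API.
class MixedFormatSketch {
    enum FileFormat { TEXT, PARQUET, AVRO, RC_FILE }

    static final class Partition {
        final FileFormat format;
        Partition(FileFormat format) { this.format = format; }
    }

    static final class Table {
        final FileFormat baseFormat;
        final List<Partition> partitions;
        String avroSchema; // null until set; plays the role of avroSchema_

        Table(FileFormat baseFormat, List<Partition> partitions) {
            this.baseFormat = baseFormat;
            this.partitions = partitions;
        }

        // True if the base table or any single partition is backed by Avro.
        boolean hasAvroData() {
            if (baseFormat == FileFormat.AVRO) return true;
            for (Partition p : partitions) {
                if (p.format == FileFormat.AVRO) return true;
            }
            return false;
        }

        // Meant to run only after partition metadata is loaded, because the
        // decision depends on the partitions' file formats.
        void setAvroSchemaIfNeeded(String schemaJson) {
            if (hasAvroData()) avroSchema = schemaJson;
        }
    }

    public static void main(String[] args) {
        // A TEXT base table with one Avro partition, as in the bug report.
        Table t = new Table(FileFormat.TEXT, Arrays.asList(
            new Partition(FileFormat.TEXT),
            new Partition(FileFormat.AVRO)));
        t.setAvroSchemaIfNeeded("{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}");
        System.out.println(t.avroSchema != null);
    }
}
```

The point of the sketch is ordering: the check cannot live in a step that
runs before the partitions are known, which is why the patch moves it out of
HdfsTable#loadSchema().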
Testing: This patch adds a new table 'multifileformat_tbl' to the functional
test schema. The base table uses the TEXTFILE format and has 4 partitions
of different file formats (text, parquet, avro, rcfile). We run a
count(*) on this table to tally the row count. Without this patch,
this query deterministically crashes the impalads.
Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
7 files changed, 114 insertions(+), 19 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/45/3045/1
--
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>