Posted to dev@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2016/05/12 16:50:56 UTC

[Impala-CR](cdh5-trunk) IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables

Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3045

Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
......................................................................

IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables

Bug: Impalads crash when we query a partitioned table whose partitions
use multiple file formats, if at least one partition is avro and the
base table is non-avro.

Cause: During metadata load we don't set avroSchema_ in HdfsTable unless
the base table is backed by the AVRO file format. The schema is therefore
never propagated to the avro scanner, which has no checks to make sure
the schema is non-null before using it.

Fix: The fix has two parts.

1. Avro scanner should gracefully handle the case where the avro schema
   is not set. Appropriate null checks have been added.

2. avroSchema_ should be set in HdfsTable if any subset of partitions
   is backed by avro, even when the base table is not. This is done by
   decoupling the code that sets avroSchema_ from
   HdfsTable#loadSchema(), as we need the partition metadata to make
   this decision. So we set it once the partition information is
   loaded.
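
The two parts of the fix can be illustrated with a minimal sketch. This
is not the code from the patch: AvroSchemaSketch, FileFormat,
needsAvroSchema, resolveAvroSchema and checkScannerSchema are
hypothetical names chosen for illustration, and the real scanner check
lives in C++ rather than Java. The sketch only shows the decision logic
described above, assuming the partition formats are already loaded.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; not Impala's actual catalog classes.
final class AvroSchemaSketch {
    enum FileFormat { TEXT, PARQUET, AVRO, RC_FILE }

    // Part 2: the decision needs partition metadata, so it can only be
    // made after the partitions are loaded. True if the base table or
    // any partition is Avro-backed.
    static boolean needsAvroSchema(FileFormat baseFormat,
                                   List<FileFormat> partitionFormats) {
        return baseFormat == FileFormat.AVRO
            || partitionFormats.contains(FileFormat.AVRO);
    }

    // Returns the schema to propagate, or null when no Avro data exists.
    static String resolveAvroSchema(FileFormat baseFormat,
                                    List<FileFormat> partitionFormats,
                                    String schemaFromMetadata) {
        return needsAvroSchema(baseFormat, partitionFormats)
            ? schemaFromMetadata : null;
    }

    // Part 1: scanner-side guard. Returns an error message instead of
    // dereferencing a missing schema; null means the schema is usable.
    static String checkScannerSchema(String avroSchema) {
        if (avroSchema == null || avroSchema.isEmpty()) {
            return "Missing Avro schema for Avro scan";
        }
        return null;
    }

    public static void main(String[] args) {
        // A TEXT base table with one partition per format, as in the test.
        List<FileFormat> partitions = Arrays.asList(
            FileFormat.TEXT, FileFormat.PARQUET,
            FileFormat.AVRO, FileFormat.RC_FILE);
        String schema = resolveAvroSchema(
            FileFormat.TEXT, partitions, "{\"type\":\"record\"}");
        System.out.println("schema set: " + (schema != null));
        System.out.println("scanner error: " + checkScannerSchema(schema));
    }
}
```

With the base-format-only check, the TEXT base table in this example
would leave the schema unset and the scanner would have nothing to guard
against that. The two checks are independent on purpose: the null check
protects the scanner even if some other code path fails to set the
schema.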

Testing: This patch adds a new table 'multifileformat_tbl' to the functional
test schema. The base table uses the TEXTFILE format and has 4 partitions
of different file formats (text, parquet, avro, rcfile). We run a
count(*) on this table to check the row count. Without this patch,
this query deterministically crashes the impalads.

Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
7 files changed, 114 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/45/3045/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>