Posted to commits@impala.apache.org by mj...@apache.org on 2017/07/19 15:49:06 UTC
[2/2] incubator-impala git commit: IMPALA-5407: Fix crash in HdfsSequenceTableWriter

IMPALA-5407: Fix crash in HdfsSequenceTableWriter

The following use of the sequence file writer can lead to a crash:
> set compression_codec=gzip;
> set seq_compression_mode=record;
> set allow_unsupported_formats=1;
> create table seq_tbl like tbl stored as sequencefile;
> insert into seq_tbl select * from tbl;

This fix removes the MemPool::FreeAll() call from
HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush()
is incorrect because the compressor in the table writer caches a
memory pool buffer, and the compressor isn't reset across calls to
Flush().

If the file being written is big enough,
HdfsSequenceTableWriter::AppendRows() calls Flush() multiple times.
After the first Flush() the compressor's cached buffer has already
been freed, so continuing to use it corrupts memory.
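
The failure mode can be illustrated with a minimal, self-contained sketch.
None of the names below come from Impala: ToyPool, ToyCompressor, ToyWriter
and SecondAppendSeesValidBuffer are hypothetical stand-ins, and the pool
hands out generation-tagged handles instead of raw pointers so that a stale
buffer can be detected without touching freed memory. It only models the
pattern described above (a compressor caching a pool buffer across flushes),
not the real writer.

```cpp
#include <cassert>  // for callers that want to assert on the result
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory pool; this is NOT Impala's MemPool.
class ToyPool {
 public:
  // A handle records which "generation" of the pool it was allocated in.
  struct Handle {
    std::size_t index;
    unsigned generation;
  };

  Handle Allocate(std::size_t size) {
    chunks_.emplace_back(size);
    return Handle{chunks_.size() - 1, generation_};
  }

  // Releases every chunk at once; all outstanding handles become stale.
  void FreeAll() {
    chunks_.clear();
    ++generation_;
  }

  bool IsValid(const Handle& h) const {
    return h.generation == generation_ && h.index < chunks_.size();
  }

 private:
  std::vector<std::vector<char>> chunks_;
  unsigned generation_ = 0;
};

// Hypothetical compressor that allocates its output buffer once and then
// keeps reusing it, as the commit message describes for the real one.
class ToyCompressor {
 public:
  explicit ToyCompressor(ToyPool* pool) : pool_(pool) {}

  // Returns true if the cached buffer is still backed by live pool memory.
  bool Compress() {
    if (!has_buffer_) {
      buffer_ = pool_->Allocate(1024);
      has_buffer_ = true;
    }
    return pool_->IsValid(buffer_);
  }

 private:
  ToyPool* pool_;
  ToyPool::Handle buffer_{0, 0};
  bool has_buffer_ = false;
};

// Hypothetical writer; free_pool_in_flush selects the old (buggy) behavior.
class ToyWriter {
 public:
  explicit ToyWriter(bool free_pool_in_flush)
      : compressor_(&pool_), free_pool_in_flush_(free_pool_in_flush) {}

  bool AppendRows() { return compressor_.Compress(); }

  void Flush() {
    if (free_pool_in_flush_) pool_.FreeAll();  // the call this fix removes
  }

 private:
  ToyPool pool_;  // declared before compressor_ so it is constructed first
  ToyCompressor compressor_;
  bool free_pool_in_flush_;
};

// Appends, flushes, then appends again, as happens when a file is large
// enough to need more than one flush.
bool SecondAppendSeesValidBuffer(bool free_pool_in_flush) {
  ToyWriter writer(free_pool_in_flush);
  writer.AppendRows();
  writer.Flush();
  return writer.AppendRows();
}
```

In this sketch, SecondAppendSeesValidBuffer(true) reports a stale buffer
after the first flush, while SecondAppendSeesValidBuffer(false) does not. In
real code the stale access would be a use-after-free rather than a clean
boolean.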

Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec
Reviewed-on: http://gerrit.cloudera.org:8080/7394
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/bc56d3c4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/bc56d3c4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/bc56d3c4

Branch: refs/heads/master
Commit: bc56d3c48c3629bda989e1f6b8265bd42c1b5c63
Parents: 3bd21bc
Author: Attila Jeges <at...@cloudera.com>
Authored: Fri Jun 16 16:37:03 2017 +0200
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Wed Jul 19 06:48:06 2017 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-sequence-table-writer.cc         |  1 -
 .../queries/QueryTest/seq-writer.test             | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bc56d3c4/be/src/exec/hdfs-sequence-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-sequence-table-writer.cc b/be/src/exec/hdfs-sequence-table-writer.cc
index f8d7b4c..4a66c5e 100644
--- a/be/src/exec/hdfs-sequence-table-writer.cc
+++ b/be/src/exec/hdfs-sequence-table-writer.cc
@@ -348,7 +348,6 @@ Status HdfsSequenceTableWriter::Flush() {
   }
   out_.Clear();
   out_value_lengths_block_.Clear();
-  mem_pool_->FreeAll();
   unflushed_rows_ = 0;
   return Status::OK();
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bc56d3c4/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test b/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
index 753eb0f..7e2363f 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
@@ -288,3 +288,21 @@ select count(*) from store_sales_seq_gzip_block;
 ---- TYPES
 BIGINT
 ====
+---- QUERY
+# IMPALA-5407: Create a table containing seq files with GZIP+RECORD. If the number of
+# impalad workers is three, three files will be created, two of which are large enough
+# (> 64MB) to force multiple flushes. Make sure that the files have been created
+# successfully.
+SET COMPRESSION_CODEC=GZIP;
+SET SEQ_COMPRESSION_MODE=RECORD;
+SET ALLOW_UNSUPPORTED_FORMATS=1;
+create table catalog_sales_seq_gzip_rec like tpcds.catalog_sales stored as SEQUENCEFILE;
+insert into catalog_sales_seq_gzip_rec select * from tpcds.catalog_sales;
+====
+---- QUERY
+select count(*) from catalog_sales_seq_gzip_rec;
+---- RESULTS
+1441548
+---- TYPES
+BIGINT
+====