Posted to commits@impala.apache.org by ab...@apache.org on 2016/10/12 13:40:33 UTC

[1/4] incubator-impala git commit: IMPALA-3853: squeasel is MIT (and dual copyright) not Apache

Repository: incubator-impala
Updated Branches:
  refs/heads/master a9c405955 -> 0449b5bea


IMPALA-3853: squeasel is MIT (and dual copyright) not Apache

Squeasel was erroneously marked as Apache 2.0 in the top-level
LICENSE.txt, with no copyright notice. It is actually MIT-licensed,
with two copyright notices.

Change-Id: I9711ad60dbe00c3b8b1ce7b9ccc3ca1dd637b88c
Reviewed-on: http://gerrit.cloudera.org:8080/4646
Reviewed-by: Jim Apple <jb...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/3eb051f6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/3eb051f6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/3eb051f6

Branch: refs/heads/master
Commit: 3eb051f6f6f2dc7697380896bbec87f1db799d76
Parents: a9c4059
Author: Jim Apple <jb...@cloudera.com>
Authored: Thu Oct 6 07:54:18 2016 -0700
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Sun Oct 9 02:15:23 2016 +0000

----------------------------------------------------------------------
 LICENSE.txt                        | 26 +++++++++++++++++++++++++-
 be/src/thirdparty/squeasel/LICENSE |  3 ++-
 2 files changed, 27 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3eb051f6/LICENSE.txt
----------------------------------------------------------------------
diff --git a/LICENSE.txt b/LICENSE.txt
index 98ac0d8..eb5990d 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -548,7 +548,31 @@ shell/ext-py/sqlparse-0.1.14: 3-clause BSD
 
 --------------------------------------------------------------------------------
 
-be/src/thirdparty/squeasel: Apache 2.0 license
+be/src/thirdparty/squeasel: MIT license
+
+Some portions Copyright (c) 2004-2013 Sergey Lyubka
+Some portions Copyright (c) 2013 Cloudera Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+--------------------------------------------------------------------------------
+
 be/src/thirdparty/mustache: Apache 2.0 license
 be/src/expr/hll-bias.h: Apache 2.0 license
 shell/ext-py/sasl-0.1.1: Apache 2.0 license

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3eb051f6/be/src/thirdparty/squeasel/LICENSE
----------------------------------------------------------------------
diff --git a/be/src/thirdparty/squeasel/LICENSE b/be/src/thirdparty/squeasel/LICENSE
index edb1983..3747c51 100644
--- a/be/src/thirdparty/squeasel/LICENSE
+++ b/be/src/thirdparty/squeasel/LICENSE
@@ -1,4 +1,5 @@
-Copyright (c) 2004-2013 Sergey Lyubka
+Some portions Copyright (c) 2004-2013 Sergey Lyubka
+Some portions Copyright (c) 2013 Cloudera Inc.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal


[2/4] incubator-impala git commit: Remove unnecessary Kudu table sink BE test

Posted by ab...@apache.org.
Remove unnecessary Kudu table sink BE test

Now that we have functional tests for Kudu (IMPALA-3718), we
can remove the BE Kudu table sink test, which duplicates
existing coverage and is expensive to maintain.

Change-Id: Ice1924d525c363ee65418c3495ed56647a352a52
Reviewed-on: http://gerrit.cloudera.org:8080/4686
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/3e23e405
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/3e23e405
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/3e23e405

Branch: refs/heads/master
Commit: 3e23e40504000dd896fc1862809b659e41d468c1
Parents: 3eb051f
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Tue Oct 11 12:34:41 2016 -0700
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Wed Oct 12 00:06:51 2016 +0000

----------------------------------------------------------------------
 be/src/exec/CMakeLists.txt          |   1 -
 be/src/exec/kudu-table-sink-test.cc | 314 -------------------------------
 be/src/exec/kudu-testutil.h         | 262 --------------------------
 3 files changed, 577 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3e23e405/be/src/exec/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/src/exec/CMakeLists.txt b/be/src/exec/CMakeLists.txt
index b2d9663..571198f 100644
--- a/be/src/exec/CMakeLists.txt
+++ b/be/src/exec/CMakeLists.txt
@@ -107,5 +107,4 @@ ADD_BE_TEST(parquet-plain-test)
 ADD_BE_TEST(parquet-version-test)
 ADD_BE_TEST(row-batch-list-test)
 ADD_BE_TEST(incr-stats-util-test)
-ADD_BE_TEST(kudu-table-sink-test)
 ADD_BE_TEST(hdfs-avro-scanner-test)

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3e23e405/be/src/exec/kudu-table-sink-test.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/kudu-table-sink-test.cc b/be/src/exec/kudu-table-sink-test.cc
deleted file mode 100644
index a1cbb68..0000000
--- a/be/src/exec/kudu-table-sink-test.cc
+++ /dev/null
@@ -1,314 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#include "exec/kudu-testutil.h"
-
-#include "common/init.h"
-#include "codegen/llvm-codegen.h"
-#include "exec/kudu-table-sink.h"
-#include "exec/kudu-util.h"
-#include "gen-cpp/ImpalaInternalService_types.h"
-#include "gen-cpp/PlanNodes_types.h"
-#include "gen-cpp/Types_types.h"
-#include "gutil/strings/split.h"
-#include "gutil/stl_util.h"
-#include "kudu/client/row_result.h"
-#include "runtime/descriptors.h"
-#include "runtime/mem-tracker.h"
-#include "runtime/row-batch.h"
-#include "runtime/runtime-state.h"
-#include "runtime/tuple-row.h"
-#include "service/fe-support.h"
-#include "testutil/desc-tbl-builder.h"
-#include "testutil/test-macros.h"
-#include "util/cpu-info.h"
-#include "util/test-info.h"
-
-using apache::thrift::ThriftDebugString;
-
-namespace impala {
-
-static const char* BASE_TABLE_NAME = "TestInsertNodeTable";
-static const int FIRST_SLOT_ID = 1;
-static const int SECOND_SLOT_ID = 2;
-static const int THIRD_SLOT_ID = 3;
-
-class KuduTableSinkTest : public testing::Test {
- public:
-  KuduTableSinkTest() : runtime_state_(TExecPlanFragmentParams(), &exec_env_) {}
-
-  virtual void SetUp() {
-    // Create a Kudu client and the table (this will abort the test here
-    // if a Kudu cluster is not available).
-    kudu_test_helper_.CreateClient();
-    kudu_test_helper_.CreateTable(BASE_TABLE_NAME);
-
-    // Initialize the environment/runtime so that we can use a scan node in
-    // isolation.
-    Status s = exec_env_.InitForFeTests();
-    DCHECK(s.ok());
-    runtime_state_.InitMemTrackers(TUniqueId(), NULL, -1);
-    exec_env_.disk_io_mgr()->Init(&mem_tracker_);
-  }
-
-  void BuildRuntimeState(int num_cols_to_insert,
-      TSinkAction::type sink_action) {
-    TTableSink table_sink;
-    table_sink.__set_target_table_id(0);
-    table_sink.__set_action(sink_action);
-
-    // For tests ignore not found keys in delete test. Other paths are exercised via
-    // end-to-end tests.
-    TKuduTableSink kudu_table_sink;
-    kudu_table_sink.__set_ignore_not_found_or_duplicate(true);
-    table_sink.__set_kudu_table_sink(kudu_table_sink);
-
-    data_sink_.__set_type(TDataSinkType::TABLE_SINK);
-    data_sink_.__set_table_sink(table_sink);
-
-    kudu_test_helper_.CreateTableDescriptor(num_cols_to_insert, &desc_tbl_);
-
-    row_desc_ = obj_pool_.Add(
-        new RowDescriptor(*desc_tbl_,
-                          boost::assign::list_of(0),
-                          boost::assign::list_of(false)));
-
-    runtime_state_.set_desc_tbl(desc_tbl_);
-  }
-
-  void CreateTExprNode(int slot_id, TPrimitiveType::type type, TExpr* expr) {
-    TExprNode expr_node;
-    expr_node.node_type = TExprNodeType::SLOT_REF;
-    expr_node.type.types.push_back(TTypeNode());
-    expr_node.type.types.back().__isset.scalar_type = true;
-    expr_node.type.types.back().scalar_type.type = type;
-    expr_node.num_children = 0;
-    TSlotRef slot_ref;
-    slot_ref.slot_id = slot_id;
-    expr_node.__set_slot_ref(slot_ref);
-    expr->nodes.push_back(expr_node);
-  }
-
-  void CreateTExpr(int num_cols_to_insert, vector<TExpr>* exprs) {
-    DCHECK(num_cols_to_insert > 0 && num_cols_to_insert <= 3);
-    TExpr expr_1;
-    CreateTExprNode(FIRST_SLOT_ID, TPrimitiveType::INT, &expr_1);
-    exprs->push_back(expr_1);
-    if (num_cols_to_insert == 1) return;
-    TExpr expr_2;
-    CreateTExprNode(SECOND_SLOT_ID, TPrimitiveType::INT, &expr_2);
-    exprs->push_back(expr_2);
-    if (num_cols_to_insert == 2) return;
-    TExpr expr_3;
-    CreateTExprNode(THIRD_SLOT_ID, TPrimitiveType::STRING, &expr_3);
-    exprs->push_back(expr_3);
-  }
-
-  // Create a batch and fill it according to the tuple descriptor.
-  // Parameters:
-  //   - first_row - offset used to calculate the values to be written
-  //   - batch_size - maximum number of rows to generate
-  //   - factor - multiplier used to modify the value to be written, used in update tests
-  //   - val - free string value passed to the string column
-  //   - skip_val - skips rows where  (row_pos % skip_val) == 0
-  RowBatch* CreateRowBatch(int first_row, int batch_size, int factor, string val,
-      int skip_val) {
-    DCHECK(desc_tbl_->GetTupleDescriptor(0) != NULL);
-    DCHECK_GE(skip_val, 1);
-    TupleDescriptor* tuple_desc = desc_tbl_->GetTupleDescriptor(0);
-    RowBatch* batch = new RowBatch(*row_desc_, batch_size, &mem_tracker_);
-    int tuple_buffer_size = batch->capacity() * tuple_desc->byte_size();
-    void* tuple_buffer_ = batch->tuple_data_pool()->TryAllocate(tuple_buffer_size);
-    DCHECK(tuple_buffer_ != NULL);
-    Tuple* tuple = reinterpret_cast<Tuple*>(tuple_buffer_);
-
-    memset(tuple_buffer_, 0, tuple_buffer_size);
-    for (int i = 0; i < batch_size; ++i) {
-      if (skip_val != 1 && ((i + first_row) % skip_val) == 0) continue;
-      int idx = batch->AddRow();
-      TupleRow* row = batch->GetRow(idx);
-      row->SetTuple(0, tuple);
-
-      for (int j = 0; j < tuple_desc->slots().size(); j++) {
-        void* slot = tuple->GetSlot(tuple_desc->slots()[j]->tuple_offset());
-        DCHECK(slot != NULL);
-        switch(j) {
-          case 0: {
-            int32_t* int_slot = reinterpret_cast<int32_t*>(slot);
-            *int_slot = first_row + i;
-            break;
-          }
-          case 1: {
-            int32_t* int_slot = reinterpret_cast<int32_t*>(slot);
-            *int_slot = (first_row + i) * factor;
-            break;
-          }
-          case 2: {
-            string value = strings::Substitute("$0_$1", val, first_row + i);
-            char* buffer = reinterpret_cast<char*>(
-                batch->tuple_data_pool()->TryAllocate(value.size()));
-            DCHECK(buffer != NULL);
-            memcpy(buffer, value.data(), value.size());
-            reinterpret_cast<StringValue*>(slot)->ptr = buffer;
-            reinterpret_cast<StringValue*>(slot)->len = value.size();
-            break;
-          }
-          default:
-            DCHECK(false) << "Wrong number of slots.";
-        }
-      }
-      batch->CommitLastRow();
-      uint8_t* mem = reinterpret_cast<uint8_t*>(tuple);
-      tuple = reinterpret_cast<Tuple*>(mem + tuple_desc->byte_size());
-    }
-    return batch;
-  }
-
-  void Verify(int num_columns, int expected_num_rows, int factor, string val,
-      int skip_val) {
-    kudu::client::KuduScanner scanner(kudu_test_helper_.table().get());
-    KUDU_ASSERT_OK(scanner.SetReadMode(kudu::client::KuduScanner::READ_AT_SNAPSHOT));
-    KUDU_ASSERT_OK(scanner.SetFaultTolerant());
-    KUDU_ASSERT_OK(scanner.Open());
-    int row_idx = 0;
-    while (scanner.HasMoreRows()) {
-      vector<kudu::client::KuduRowResult> rows;
-      KUDU_ASSERT_OK(scanner.NextBatch(&rows));
-      for (const kudu::client::KuduRowResult& row: rows) {
-        switch(num_columns) {
-          case 1:
-            ASSERT_EQ(row.ToString(), strings::Substitute(
-                "(int32 key=$0, int32 int_val=NULL, string string_val=NULL)",
-                row_idx * skip_val));
-            break;
-          case 2:
-            ASSERT_EQ(row.ToString(), strings::Substitute(
-                "(int32 key=$0, int32 int_val=$1, string string_val=NULL)",
-                row_idx * skip_val, row_idx * skip_val * factor));
-            break;
-          case 3:
-            ASSERT_EQ(row.ToString(), strings::Substitute(
-                "(int32 key=$0, int32 int_val=$1, string string_val=$2_$3)",
-                row_idx * skip_val, row_idx * skip_val * factor, val, row_idx * skip_val));
-            break;
-        }
-        ++row_idx;
-      }
-    }
-    ASSERT_EQ(row_idx,
-        skip_val == 1 ? expected_num_rows : (expected_num_rows + 1) / skip_val);
-  }
-
-  void WriteAndVerify(int num_columns, TSinkAction::type type, int factor, string val,
-      int skip_val) {
-    const int kNumRowsPerBatch = 10;
-    // For deletes only populate the key column, in other cases populate all columns
-    int schema_cols = num_columns;
-    if (type == TSinkAction::DELETE) schema_cols = 1;
-    BuildRuntimeState(schema_cols, type);
-    vector<TExpr> exprs;
-    CreateTExpr(schema_cols, &exprs);
-    KuduTableSink sink(*row_desc_, exprs, data_sink_);
-    ASSERT_OK(sink.Prepare(&runtime_state_, &mem_tracker_));
-    ASSERT_OK(sink.Open(&runtime_state_));
-    vector<RowBatch*> row_batches;
-    row_batches.push_back(CreateRowBatch(0, kNumRowsPerBatch, factor, val, skip_val));
-    ASSERT_OK(sink.Send(&runtime_state_, row_batches.front()));
-    row_batches.push_back(CreateRowBatch(kNumRowsPerBatch, kNumRowsPerBatch, factor, val,
-                                         skip_val));
-    ASSERT_OK(sink.Send(&runtime_state_,row_batches.back()));
-    ASSERT_OK(sink.FlushFinal(&runtime_state_));
-    STLDeleteElements(&row_batches);
-    sink.Close(&runtime_state_);
-    Verify(num_columns, 2 * kNumRowsPerBatch, factor, val, skip_val);
-  }
-
-  void InsertAndVerify(int num_columns) {
-    WriteAndVerify(num_columns, TSinkAction::INSERT, 2, "hello", 1);
-  }
-
-  void UpdateAndVerify(int num_columns) {
-    WriteAndVerify(num_columns, TSinkAction::UPDATE, 3, "world", 1);
-  }
-
-  void DeleteAndVerify(int num_columns, int skip_val) {
-    WriteAndVerify(num_columns, TSinkAction::DELETE, 2, "hello", skip_val);
-  }
-
-  virtual void TearDown() {
-    kudu_test_helper_.DeleteTable();
-  }
-
- protected:
-  KuduTestHelper kudu_test_helper_;
-  MemTracker mem_tracker_;
-  ObjectPool obj_pool_;
-  ExecEnv exec_env_;
-  RuntimeState runtime_state_;
-  TDataSink data_sink_;
-  TTableDescriptor t_tbl_desc_;
-  DescriptorTbl* desc_tbl_;
-  RowDescriptor* row_desc_;
-};
-
-TEST_F(KuduTableSinkTest, TestInsertJustKey) {
-  InsertAndVerify(1);
-}
-
-TEST_F(KuduTableSinkTest, TestInsertTwoCols) {
-  InsertAndVerify(2);
-}
-
-TEST_F(KuduTableSinkTest, TestInsertAllCols) {
-  InsertAndVerify(3);
-}
-
-TEST_F(KuduTableSinkTest, UpdateTwoCols) {
-  InsertAndVerify(2);
-  UpdateAndVerify(2);
-}
-
-TEST_F(KuduTableSinkTest, UpdateAllCols) {
-  InsertAndVerify(3);
-  UpdateAndVerify(3);
-}
-
-TEST_F(KuduTableSinkTest, DeleteModThree) {
-  // 3 cols, delete all rows idx % 3 != 0
-  InsertAndVerify(3);
-  DeleteAndVerify(3, 3);
-}
-
-TEST_F(KuduTableSinkTest, DeleteModThreeTwice) {
-  // 3 cols, delete all rows idx % 3 != 0
-  InsertAndVerify(3);
-  DeleteAndVerify(3, 3);
-  // Deleting the same rows does not have an impact
-  DeleteAndVerify(3, 3);
-}
-
-} // namespace impala
-
-int main(int argc, char** argv) {
-  if (!impala::KuduClientIsSupported()) return 0;
-  ::testing::InitGoogleTest(&argc, argv);
-  impala::InitCommonRuntime(argc, argv, true, impala::TestInfo::BE_TEST);
-  impala::InitFeSupport();
-  impala::InitKuduLogging();
-  impala::LlvmCodeGen::InitializeLlvm();
-  return RUN_ALL_TESTS();
-}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3e23e405/be/src/exec/kudu-testutil.h
----------------------------------------------------------------------
diff --git a/be/src/exec/kudu-testutil.h b/be/src/exec/kudu-testutil.h
deleted file mode 100644
index 1fd2c41..0000000
--- a/be/src/exec/kudu-testutil.h
+++ /dev/null
@@ -1,262 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#ifndef IMPALA_EXEC_KUDU_TESTUTIL_H
-#define IMPALA_EXEC_KUDU_TESTUTIL_H
-
-#include <boost/assign/list_of.hpp>
-#include <gtest/gtest.h>
-#include <kudu/client/client.h>
-#include <kudu/util/slice.h>
-#include <kudu/util/status.h>
-#include <string>
-#include <tr1/memory>
-#include <vector>
-
-#include "common/object-pool.h"
-#include "gutil/gscoped_ptr.h"
-#include "runtime/exec-env.h"
-#include "testutil/desc-tbl-builder.h"
-
-#include "common/names.h"
-
-typedef kudu::Status KuduStatus;
-typedef impala::Status ImpalaStatus;
-
-namespace impala {
-
-using kudu::client::KuduClient;
-using kudu::client::KuduClientBuilder;
-using kudu::client::KuduColumnSchema;
-using kudu::client::KuduInsert;
-using kudu::client::KuduSchema;
-using kudu::client::KuduSchemaBuilder;
-using kudu::client::KuduSession;
-using kudu::client::KuduTable;
-using kudu::KuduPartialRow;
-using kudu::Slice;
-
-#define KUDU_ASSERT_OK(status) \
-  do { \
-    KuduStatus _s = status; \
-    if (_s.ok()) { \
-      SUCCEED(); \
-    } else { \
-      FAIL() << "Bad Kudu Status: " << _s.ToString();  \
-    } \
-  } while (0);
-
-
-// Helper class to assist in tests agains a Kudu cluster, namely with
-// table creation/deletion with insertion of rows.
-class KuduTestHelper {
- public:
-
-  void CreateClient() {
-    LOG(INFO) << "Creating Kudu client.";
-    KUDU_ASSERT_OK(KuduClientBuilder()
-                   .add_master_server_addr("127.0.0.1:7051")
-                   .Build(&client_));
-    KuduSchemaBuilder builder;
-    builder.AddColumn("key")->Type(KuduColumnSchema::INT32)->NotNull()->PrimaryKey();
-    builder.AddColumn("int_val")->Type(KuduColumnSchema::INT32)->Nullable();
-    builder.AddColumn("string_val")->Type(KuduColumnSchema::STRING)->Nullable();
-    KUDU_ASSERT_OK(builder.Build(&test_schema_));
-  }
-
-  void CreateTable(const string& table_name_prefix,
-                   vector<const KuduPartialRow*>* split_rows = NULL) {
-
-    vector<const KuduPartialRow*> splits;
-    if (split_rows != NULL) {
-      splits = *split_rows;
-    } else {
-      splits = DefaultSplitRows();
-    }
-
-    // Kudu's table delete functionality is in flux, meaning a table may reappear
-    // after being deleted. To work around this we add the time in milliseconds to
-    // the required table name, making it unique. When Kudu's delete table functionality
-    // is solid we should change this to avoid creating, and possibly leaving, many
-    // similar tables in the local Kudu test cluster. See KUDU-676
-    struct timeval tv;
-    gettimeofday(&tv, NULL);
-    int64_t millis = tv.tv_sec * 1000 + tv.tv_usec / 1000;
-    table_name_ = strings::Substitute("$0-$1", table_name_prefix, millis);
-
-    while(true) {
-      LOG(INFO) << "Creating Kudu table: " << table_name_;
-      kudu::Status s = client_->NewTableCreator()->table_name(table_name_)
-                             .schema(&test_schema_)
-                             .num_replicas(3)
-                             .split_rows(splits)
-                             .set_range_partition_columns(boost::assign::list_of("key"))
-                             .Create();
-      if (s.IsAlreadyPresent()) {
-        LOG(INFO) << "Table existed, deleting. " << table_name_;
-        KUDU_ASSERT_OK(client_->DeleteTable(table_name_));
-        sleep(1);
-        continue;
-      }
-      KUDU_CHECK_OK(s);
-      KUDU_ASSERT_OK(client_->OpenTable(table_name_, &client_table_));
-      break;
-    }
-  }
-
-  gscoped_ptr<KuduInsert> BuildTestRow(KuduTable* table, int index, int num_cols) {
-    DCHECK_GT(num_cols, 0);
-    DCHECK_LE(num_cols, 3);
-    gscoped_ptr<KuduInsert> insert(table->NewInsert());
-    KuduPartialRow* row = insert->mutable_row();
-    KUDU_CHECK_OK(row->SetInt32(0, index));
-    if (num_cols > 1) KUDU_CHECK_OK(row->SetInt32(1, index * 2));
-    if (num_cols > 2) {
-      KUDU_CHECK_OK(row->SetStringCopy(2, Slice(StringPrintf("hello_%d", index))));
-    }
-    return insert.Pass();
-  }
-
-  void InsertTestRows(KuduClient* client, KuduTable* table, int num_rows,
-      int first_row = 0, int num_cols = 3) {
-    std::tr1::shared_ptr<KuduSession> session = client->NewSession();
-    KUDU_ASSERT_OK(session->SetFlushMode(KuduSession::MANUAL_FLUSH));
-    session->SetTimeoutMillis(10000);
-    for (int i = first_row; i < num_rows + first_row; i++) {
-      KUDU_ASSERT_OK(session->Apply(BuildTestRow(table, i, num_cols).release()));
-      if (i % 1000 == 0) {
-        KUDU_ASSERT_OK(session->Flush());
-      }
-    }
-    KUDU_ASSERT_OK(session->Flush());
-    ASSERT_FALSE(session->HasPendingOperations());
-  }
-
-  void OpenTable(const string& table_name) {
-    table_name_ = table_name;
-    LOG(INFO) << "Opening Kudu table: " << table_name_;
-    KUDU_ASSERT_OK(client_->OpenTable(table_name, &client_table_));
-  }
-
-  void DeleteTable() {
-    LOG(INFO) << "Deleting Kudu table: " << table_name_;
-    KUDU_ASSERT_OK(client_->DeleteTable(table_name_));
-  }
-
-  vector<const KuduPartialRow*> DefaultSplitRows() {
-    vector<const KuduPartialRow*> keys;
-    KuduPartialRow* key = test_schema_.NewRow();
-    KUDU_CHECK_OK(key->SetInt32(0, 5));
-    keys.push_back(key);
-    return keys;
-  }
-
-  const string& table_name() const {
-    return table_name_;
-  }
-
-  const std::tr1::shared_ptr<KuduClient>& client() const {
-    return client_;
-  }
-
-  const std::tr1::shared_ptr<KuduTable>& table() const {
-    return client_table_;
-  }
-
-  const KuduSchema& test_schema() {
-    return test_schema_;
-  }
-
-  // Creates a test descriptor table based on the test schema.
-  // The returned DescriptorTbl will be allocated in this classe's object pool.
-  void CreateTableDescriptor(int num_cols_materialize, DescriptorTbl** desc_tbl) {
-    DescriptorTblBuilder desc_builder(&obj_pool_);
-    DCHECK_GE(num_cols_materialize, 0);
-    DCHECK_LE(num_cols_materialize, test_schema_.num_columns());
-
-    TKuduTable t_kudu_table;
-    t_kudu_table.__set_table_name(table_name());
-    t_kudu_table.__set_master_addresses(vector<string>(1, "0.0.0.0:7051"));
-    t_kudu_table.__set_key_columns(boost::assign::list_of("key"));
-
-    TTableDescriptor t_tbl_desc;
-    t_tbl_desc.__set_id(0);
-    t_tbl_desc.__set_tableType(::impala::TTableType::KUDU_TABLE);
-    t_tbl_desc.__set_kuduTable(t_kudu_table);
-
-    TScalarType int_scalar_type;
-    int_scalar_type.type = TPrimitiveType::INT;
-
-    TTypeNode int_type;
-    int_type.type = TTypeNodeType::SCALAR;
-    int_type.__set_scalar_type(int_scalar_type);
-
-    TColumnType int_col_type;
-    int_col_type.__set_types(vector<TTypeNode>(1, int_type));
-
-    TScalarType string_scalar_type;
-    string_scalar_type.type = TPrimitiveType::STRING;
-
-    TTypeNode string_type;
-    string_type.type = TTypeNodeType::SCALAR;
-    string_type.__set_scalar_type(string_scalar_type);
-
-    TColumnType string_col_type;
-    string_col_type.__set_types(vector<TTypeNode>(1, string_type));
-
-    vector<TColumnDescriptor> column_descriptors;
-
-    TupleDescBuilder& builder = desc_builder.DeclareTuple();
-    if (num_cols_materialize > 0) {
-      builder << TYPE_INT;
-      TColumnDescriptor key;
-      key.__set_name("key");
-      key.__set_type(int_col_type);
-      column_descriptors.push_back(key);
-    }
-    if (num_cols_materialize > 1) {
-      builder << TYPE_INT;
-      TColumnDescriptor int_val;
-      int_val.__set_name("int_val");
-      int_val.__set_type(int_col_type);
-      column_descriptors.push_back(int_val);
-    }
-    if (num_cols_materialize > 2) {
-      builder << TYPE_STRING;
-      TColumnDescriptor string_val;
-      string_val.__set_name("string_val");
-      string_val.__set_type(string_col_type);
-      column_descriptors.push_back(string_val);
-    }
-
-    t_tbl_desc.__set_columnDescriptors(column_descriptors);
-    desc_builder.SetTableDescriptor(t_tbl_desc);
-
-    *desc_tbl = desc_builder.Build();
-  }
-
- private:
-  string table_name_;
-  KuduSchema test_schema_;;
-  ObjectPool obj_pool_;
-  std::tr1::shared_ptr<KuduClient> client_;
-  std::tr1::shared_ptr<KuduTable> client_table_;
-};
-
-} // namespace impala
-
-#endif /* IMPALA_EXEC_KUDU_TESTUTIL_H */


[4/4] incubator-impala git commit: IMPALA-3943: Do not throw scan errors for empty Parquet files.

Posted by ab...@apache.org.
IMPALA-3943: Do not throw scan errors for empty Parquet files.

For Parquet files with no row groups but with num_rows=0 in the
file footer, the Parquet scanner returns an error indicating
that the file is invalid. This behavior is a regression from
previous Impala versions, which used to accept such files.

This patch restores the previous behavior and adds tests.
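
In outline, the fix makes the file-level row count authoritative in
two places: the row-group loop skips groups that belong to an empty
file, and footer validation returns early instead of flagging the
file as invalid. A minimal standalone sketch of that rule follows
(illustrative names, not the actual scanner API):

  #include <cstdint>
  #include <vector>

  // Illustrative stand-ins for the Parquet footer fields involved.
  struct RowGroup { int64_t num_rows; };
  struct FileMetaData {
    int64_t num_rows;                  // file-level row count from the footer
    std::vector<RowGroup> row_groups;  // may be empty even in a valid file
  };

  // After the patch, a footer reporting zero rows means "empty file",
  // never "invalid file", regardless of how many row groups it lists.
  bool FileIsEmpty(const FileMetaData& md) { return md.num_rows == 0; }

  // A row group is skipped if it is empty itself or if the whole file
  // is empty (an empty file may still list a row-group entry, as in
  // zero_rows_one_row_group.parquet).
  bool SkipRowGroup(const FileMetaData& md, const RowGroup& rg) {
    return rg.num_rows == 0 || md.num_rows == 0;
  }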

Change-Id: I50ac3df6ff24bc5c384ef22e0f804a5132adb62e
Reviewed-on: http://gerrit.cloudera.org:8080/4693
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/0449b5be
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/0449b5be
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/0449b5be

Branch: refs/heads/master
Commit: 0449b5beaba89b02e8bc7fe133b4dc5fbe33fe81
Parents: b28baa4
Author: Alex Behm <al...@cloudera.com>
Authored: Tue Oct 11 18:49:38 2016 -0700
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Wed Oct 12 09:22:57 2016 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-parquet-scanner.cc             |   6 ++++-
 testdata/data/README                            |   8 ++++++
 testdata/data/zero_rows_one_row_group.parquet   | Bin 0 -> 236 bytes
 testdata/data/zero_rows_zero_row_groups.parquet | Bin 0 -> 199 bytes
 .../queries/QueryTest/parquet-zero-rows.test    |  27 +++++++++++++++++++
 tests/query_test/test_scanners.py               |  25 +++++++++++++++++
 6 files changed, 65 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/be/src/exec/hdfs-parquet-scanner.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-parquet-scanner.cc b/be/src/exec/hdfs-parquet-scanner.cc
index 7d9adb4..7782e8a 100644
--- a/be/src/exec/hdfs-parquet-scanner.cc
+++ b/be/src/exec/hdfs-parquet-scanner.cc
@@ -410,7 +410,7 @@ Status HdfsParquetScanner::NextRowGroup() {
     ++row_group_idx_;
     if (row_group_idx_ >= file_metadata_.row_groups.size()) break;
     const parquet::RowGroup& row_group = file_metadata_.row_groups[row_group_idx_];
-    if (row_group.num_rows == 0) continue;
+    if (row_group.num_rows == 0 || file_metadata_.num_rows == 0) continue;
 
     const DiskIoMgr::ScanRange* split_range = static_cast<ScanRangeMetadata*>(
         metadata_range_->meta_data())->original_split;
@@ -895,6 +895,10 @@ Status HdfsParquetScanner::ProcessFooter() {
   }
 
   RETURN_IF_ERROR(ParquetMetadataUtils::ValidateFileVersion(file_metadata_, filename()));
+
+  // IMPALA-3943: Do not throw an error for empty files for backwards compatibility.
+  if (file_metadata_.num_rows == 0) return Status::OK();
+
   // Parse out the created by application version string
   if (file_metadata_.__isset.created_by) {
     file_version_ = ParquetFileVersion(file_metadata_.created_by);

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/testdata/data/README
----------------------------------------------------------------------
diff --git a/testdata/data/README b/testdata/data/README
index 3a0d5ec..fce8014 100644
--- a/testdata/data/README
+++ b/testdata/data/README
@@ -21,6 +21,14 @@ indexes are a single repeated run (and not literals), but the repeat count
is incorrectly 0 in the file to test that such data corruption is properly
 handled.
 
+zero_rows_zero_row_groups.parquet:
+Generated by hacking Impala's Parquet writer.
+The file metadata indicates zero rows and no row groups.
+
+zero_rows_one_row_group.parquet:
+Generated by hacking Impala's Parquet writer.
+The file metadata indicates zero rows but one row group.
+
 repeated_values.parquet:
 Generated with parquet-mr 1.2.5
 Contains 3 single-column rows:

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/testdata/data/zero_rows_one_row_group.parquet
----------------------------------------------------------------------
diff --git a/testdata/data/zero_rows_one_row_group.parquet b/testdata/data/zero_rows_one_row_group.parquet
new file mode 100644
index 0000000..3404a7c
Binary files /dev/null and b/testdata/data/zero_rows_one_row_group.parquet differ

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/testdata/data/zero_rows_zero_row_groups.parquet
----------------------------------------------------------------------
diff --git a/testdata/data/zero_rows_zero_row_groups.parquet b/testdata/data/zero_rows_zero_row_groups.parquet
new file mode 100644
index 0000000..9e132e3
Binary files /dev/null and b/testdata/data/zero_rows_zero_row_groups.parquet differ

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test b/testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test
new file mode 100644
index 0000000..e7de245
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test
@@ -0,0 +1,27 @@
+====
+---- QUERY
+select * from zero_rows_zero_row_groups
+---- TYPES
+int
+---- RESULTS
+====
+---- QUERY
+select count(*) from zero_rows_zero_row_groups
+---- TYPES
+bigint
+---- RESULTS
+0
+====
+---- QUERY
+select * from zero_rows_one_row_group
+---- TYPES
+int
+---- RESULTS
+====
+---- QUERY
+select count(*) from zero_rows_one_row_group
+---- TYPES
+bigint
+---- RESULTS
+0
+====

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0449b5be/tests/query_test/test_scanners.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index ba50949..6dce0af 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -245,6 +245,31 @@ class TestParquet(ImpalaTestSuite):
     vector.get_value('exec_option')['abort_on_error'] = 1
     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
 
+  def test_zero_rows(self, vector, unique_database):
+    """IMPALA-3943: Tests that scanning files with num_rows=0 in the file footer
+    succeeds without errors."""
+    # Create test table with a file that has 0 rows and 0 row groups.
+    self.client.execute("create table %s.zero_rows_zero_row_groups (c int) "
+        "stored as parquet" % unique_database)
+    zero_rows_zero_row_groups_loc = get_fs_path(
+        "/test-warehouse/%s.db/%s" % (unique_database, "zero_rows_zero_row_groups"))
+    check_call(['hdfs', 'dfs', '-copyFromLocal',
+        os.environ['IMPALA_HOME'] + "/testdata/data/zero_rows_zero_row_groups.parquet",
+        zero_rows_zero_row_groups_loc])
+    # Create test table with a file that has 0 rows and 1 row group.
+    self.client.execute("create table %s.zero_rows_one_row_group (c int) "
+        "stored as parquet" % unique_database)
+    zero_rows_one_row_group_loc = get_fs_path(
+        "/test-warehouse/%s.db/%s" % (unique_database, "zero_rows_one_row_group"))
+    check_call(['hdfs', 'dfs', '-copyFromLocal',
+        os.environ['IMPALA_HOME'] + "/testdata/data/zero_rows_one_row_group.parquet",
+        zero_rows_one_row_group_loc])
+
+    vector.get_value('exec_option')['abort_on_error'] = 0
+    self.run_test_case('QueryTest/parquet-zero-rows', vector, unique_database)
+    vector.get_value('exec_option')['abort_on_error'] = 1
+    self.run_test_case('QueryTest/parquet-zero-rows', vector, unique_database)
+
   def test_corrupt_rle_counts(self, vector, unique_database):
     """IMPALA-3646: Tests that a certain type of file corruption for plain
     dictionary encoded values is gracefully handled. Cases tested:


[3/4] incubator-impala git commit: IMPALA-4274: hang in buffered-block-mgr-test

Posted by ab...@apache.org.
IMPALA-4274: hang in buffered-block-mgr-test

We started seeing hangs in CreateDestroyMulti(), where a
thread was recursively acquiring static_block_mgrs_lock_.
This is only possible because a shared_ptr is destroyed
while the lock is held.

The fix is to reset the shared_ptr only after releasing
the lock.
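
The underlying idiom generalizes: a shared_ptr must not drop its
reference count while a non-reentrant lock is held if the pointee's
destructor can take that same lock. A minimal sketch of the safe
ordering, with illustrative names rather than Impala's actual types:

  #include <map>
  #include <memory>
  #include <mutex>
  #include <string>

  struct Mgr;
  std::mutex registry_lock;  // non-reentrant, like static_block_mgrs_lock_
  std::map<std::string, std::weak_ptr<Mgr>> registry;

  struct Mgr {
    std::string id;
    ~Mgr() {
      std::shared_ptr<Mgr> other;  // keeps any strong ref alive past the lock
      {
        std::lock_guard<std::mutex> l(registry_lock);
        auto it = registry.find(id);
        if (it != registry.end()) {
          other = it->second.lock();  // empty if that Mgr is also destructing
          if (other == nullptr) registry.erase(it);
        }
      }  // registry_lock released here
      // Dropping the last reference can run ~Mgr() for another manager on
      // this same thread; doing it outside the critical section means that
      // destructor can safely re-acquire registry_lock.
      other.reset();
    }
  };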

Testing:
I was unable to reproduce the hang locally, but the
callstack in the JIRA was a strong enough smoking gun
to feel confident that this should fix the hang.

Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604
Reviewed-on: http://gerrit.cloudera.org:8080/4690
Reviewed-by: Alex Behm <al...@cloudera.com>
Reviewed-by: Marcel Kornacker <ma...@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/b28baa4a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/b28baa4a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/b28baa4a

Branch: refs/heads/master
Commit: b28baa4a038e78b7d3ba26c50b78d9b28cf901a9
Parents: 3e23e40
Author: Tim Armstrong <ta...@cloudera.com>
Authored: Tue Oct 11 17:37:38 2016 -0700
Committer: Internal Jenkins <cl...@gerrit.cloudera.org>
Committed: Wed Oct 12 08:37:30 2016 +0000

----------------------------------------------------------------------
 be/src/runtime/buffered-block-mgr.cc | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b28baa4a/be/src/runtime/buffered-block-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/buffered-block-mgr.cc b/be/src/runtime/buffered-block-mgr.cc
index f45a93b..f1d947e 100644
--- a/be/src/runtime/buffered-block-mgr.cc
+++ b/be/src/runtime/buffered-block-mgr.cc
@@ -524,6 +524,7 @@ Status BufferedBlockMgr::TransferBuffer(Block* dst, Block* src, bool unpin) {
 }
 
 BufferedBlockMgr::~BufferedBlockMgr() {
+  shared_ptr<BufferedBlockMgr> other_mgr_ptr;
   {
     lock_guard<SpinLock> lock(static_block_mgrs_lock_);
     BlockMgrsMap::iterator it = query_to_block_mgrs_.find(query_id_);
@@ -536,16 +537,19 @@ BufferedBlockMgr::~BufferedBlockMgr() {
     // distinguish between the two expired pointers), and when the other
     // ~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_.
     if (it != query_to_block_mgrs_.end()) {
-      shared_ptr<BufferedBlockMgr> mgr = it->second.lock();
-      if (mgr.get() == NULL) {
+      other_mgr_ptr = it->second.lock();
+      if (other_mgr_ptr.get() == NULL) {
         // The BufferBlockMgr object referenced by this entry is being deconstructed.
         query_to_block_mgrs_.erase(it);
       } else {
         // The map references another (still valid) BufferedBlockMgr.
-        DCHECK_NE(mgr.get(), this);
+        DCHECK_NE(other_mgr_ptr.get(), this);
       }
     }
   }
+  // IMPALA-4274: releasing the reference count can recursively call ~BufferedBlockMgr().
+  // Do not do that with 'static_block_mgrs_lock_' held.
+  other_mgr_ptr.reset();
 
   if (io_request_context_ != NULL) io_mgr_->UnregisterContext(io_request_context_);