Posted to commits@impala.apache.org by jo...@apache.org on 2019/04/26 23:45:49 UTC

[impala] branch master updated (1147120 -> 8e97a3b)

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 1147120  IMPALA-8454 (part 1): Refactor file descriptor loading code
     new 6a70374  IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types
     new 3491426  IMPALA-8444: Fix performance regression when building privilege name
     new 8e97a3b  Configure Hive 3's HS2 to execute queries using Tez local mode

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/parquet/parquet-common.h               | 226 +++++++++++++++-----
 be/src/exec/parquet/parquet-plain-test.cc          | 229 +++++++++++++++++----
 be/src/testutil/random-vector-generators.h         | 108 ++++++++++
 bin/bootstrap_toolchain.py                         |  55 ++++-
 bin/impala-config.sh                               |  10 +
 fe/pom.xml                                         |  12 ++
 .../impala/authorization/AuthorizationPolicy.java  |  22 --
 .../sentry/SentryAuthorizationPolicy.java          |  27 +--
 .../impala/authorization/sentry/SentryProxy.java   |   2 +-
 .../java/org/apache/impala/catalog/Catalog.java    |   5 +-
 .../apache/impala/catalog/CatalogObjectCache.java  |   2 +-
 .../java/org/apache/impala/catalog/Principal.java  |  10 +-
 .../apache/impala/catalog/PrincipalPrivilege.java  | 152 +++++++-------
 fe/src/test/resources/hive-site.xml.py             |   8 +-
 testdata/bin/run-hive-server.sh                    |  15 ++
 tests/authorization/test_grant_revoke.py           |  47 ++++-
 16 files changed, 699 insertions(+), 231 deletions(-)
 create mode 100644 be/src/testutil/random-vector-generators.h


[impala] 01/03: IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6a703741d8fdc359833a0d593ca8b121cd5d890d
Author: Daniel Becker <da...@cloudera.com>
AuthorDate: Mon Apr 8 16:28:26 2019 +0200

    IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types
    
    Refactored the ParquetPlainEncoder::Decode() and
    ParquetPlainEncoder::DecodeBatch() methods to increase performance in
    batch decoding.
    
    The `Decode` and `DecodeBatch` methods retain their behaviour and
    outward interface, but the internal structure changes.
    
    We change how we split up the `Decode` template specialisations. The
    generic unspecialised template is used for numerical Parquet types
    (INT32, INT64, INT96, FLOAT and DOUBLE), and various specialisations
    are used for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY.
    
    We add a new method template, DecodeNoBoundsCheck, which does the actual
    decoding without bounds checking. It is called internally by the generic
    Decode method template. For all Parquet types except BYTE_ARRAY,
    DecodeBatch performs the bounds check once for the whole batch and then
    calls DecodeNoBoundsCheck, so we save the cost of a bounds check per
    decoded value. For BYTE_ARRAY this cannot be done, because the encoded
    size of each value is only known once its length prefix has been read,
    so we have to perform the check for every value.
    
    In the non-BYTE_ARRAY version of DecodeBatch, we explicitly unroll the
    loop in batches of 8 to increase performance.
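
    To illustrate, below is a minimal sketch of the fast path for a
    fixed-size type (simplified and with invented names: the real code
    handles an arbitrary output stride, expands the inner loop at compile
    time with BOOST_PP_REPEAT, and supports all the fixed-size Parquet
    types, not just INT32):

      #include <cstdint>
      #include <cstring>

      // Decodes 'num_values' plain-encoded INT32 values from [buf, buf_end).
      // A single bounds check covers the whole batch, so the copy loops
      // below need no per-value checks.
      int64_t DecodeBatchSketch(const uint8_t* buf, const uint8_t* buf_end,
          int64_t num_values, int32_t* out) {
        constexpr int kSize = sizeof(int32_t);
        if (buf_end - buf < num_values * kSize) return -1;
        int64_t i = 0;
        // Process full batches of 8; the fixed-count inner loop is what the
        // real code unrolls textually.
        for (; i + 8 <= num_values; i += 8) {
          for (int j = 0; j < 8; ++j) {
            std::memcpy(&out[i + j], buf + (i + j) * kSize, kSize);
          }
        }
        // Decode the remainder one value at a time.
        for (; i < num_values; ++i) {
          std::memcpy(&out[i], buf + i * kSize, kSize);
        }
        return num_values * kSize;
      }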
    
    The overall performance increase is up to 2x for small strides (8 bytes,
    INT32), but it shrinks as the stride grows and disappears at around 40
    bytes. With bigger strides there is no performance difference from the
    previous implementation.
    
    Testing:
      Added tests to parquet-plain-test.cc to test the `Decode` and the
      `DecodeBatch` methods both in single-value decoding and batch
      decoding.
    
    Change-Id: I57b7d2573bb6dfd038e581acb3bd8ea1565aa20d
    Reviewed-on: http://gerrit.cloudera.org:8080/12985
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/parquet/parquet-common.h       | 226 +++++++++++++++++++++-------
 be/src/exec/parquet/parquet-plain-test.cc  | 229 ++++++++++++++++++++++++-----
 be/src/testutil/random-vector-generators.h | 108 ++++++++++++++
 3 files changed, 468 insertions(+), 95 deletions(-)

diff --git a/be/src/exec/parquet/parquet-common.h b/be/src/exec/parquet/parquet-common.h
index d225814..a7184ba 100644
--- a/be/src/exec/parquet/parquet-common.h
+++ b/be/src/exec/parquet/parquet-common.h
@@ -87,6 +87,26 @@ class ParquetPlainEncoder {
     }
   }
 
+  /// Returns the byte size of a value of Parquet type `type`. If `type` is
+  /// FIXED_LEN_BYTE_ARRAY, the argument `fixed_len_size` is returned.
+  /// The type must be one of INT32, INT64, INT96, FLOAT, DOUBLE or FIXED_LEN_BYTE_ARRAY.
+  /// BOOLEANs are not plain encoded and BYTE_ARRAYs do not have a fixed size, therefore
+  /// they are not supported.
+  static ALWAYS_INLINE int EncodedByteSize(const parquet::Type::type type,
+      const int fixed_len_size) {
+    switch (type) {
+      case parquet::Type::type::INT32: return sizeof(int32_t);
+      case parquet::Type::type::INT64: return sizeof(int64_t);
+      case parquet::Type::type::INT96: return 12;
+      case parquet::Type::type::FLOAT: return sizeof(float);
+      case parquet::Type::type::DOUBLE: return sizeof(double);
+      case parquet::Type::type::FIXED_LEN_BYTE_ARRAY: return fixed_len_size;
+      default: DCHECK(false);
+    };
+
+    return -1;
+  }
+
   /// The minimum byte size to store decimals of with precision t.precision.
   static int DecimalSize(const ColumnType& t) {
     DCHECK(t.type == TYPE_DECIMAL);
@@ -177,27 +197,31 @@ class ParquetPlainEncoder {
     }
   }
 
-  /// Decodes t from 'buffer', reading up to the byte before 'buffer_end'. 'buffer'
-  /// need not be aligned. If PARQUET_TYPE is FIXED_LEN_BYTE_ARRAY then 'fixed_len_size'
-  /// is the size of the object. Otherwise, it is unused.
-  /// Returns the number of bytes read or -1 if the value was not decoded successfully.
-  /// This generic template function is used with the following types:
-  /// =============================
-  /// InternalType   | PARQUET_TYPE
-  /// =============================
-  /// int32_t        | INT32
-  /// int64_t        | INT64
-  /// float          | FLOAT
-  /// double         | DOUBLE
-  /// Decimal4Value  | INT32
-  /// Decimal8Value  | INT64
-  /// TimestampValue | INT96
+  /// Decodes t from 'buffer', reading up to the byte before 'buffer_end'. 'buffer' need
+  /// not be aligned. If PARQUET_TYPE is FIXED_LEN_BYTE_ARRAY then 'fixed_len_size' is the
+  /// size of the object. Otherwise, it is unused. Returns the number of bytes read or -1
+  /// if the value was not decoded successfully.
+  /// This generic template function is used when PARQUET_TYPE is one of INT32, INT64,
+  /// INT96, FLOAT, DOUBLE or FIXED_LEN_BYTE_ARRAY.
   template <typename InternalType, parquet::Type::type PARQUET_TYPE>
   static int Decode(const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size,
       InternalType* v) {
-    int byte_size = ByteSize(*v);
+    /// We cannot make it a static assert because the DecodeByParquetType template
+    /// potentially calls this function with every combination of internal and parquet
+    /// types, not only the valid ones. The invalid combinations do not occur at runtime,
+    /// but they cannot be ruled out at compile time.
+    DCHECK(PARQUET_TYPE == parquet::Type::type::INT32
+        || PARQUET_TYPE == parquet::Type::type::INT64
+        || PARQUET_TYPE == parquet::Type::type::INT96
+        || PARQUET_TYPE == parquet::Type::type::FLOAT
+        || PARQUET_TYPE == parquet::Type::type::DOUBLE
+        || PARQUET_TYPE == parquet::Type::type::FIXED_LEN_BYTE_ARRAY);
+
+    int byte_size = EncodedByteSize(PARQUET_TYPE, fixed_len_size);
+
     if (UNLIKELY(buffer_end - buffer < byte_size)) return -1;
-    memcpy(v, buffer, byte_size);
+    DecodeNoBoundsCheck<InternalType, PARQUET_TYPE>(buffer, buffer_end,
+        fixed_len_size, v);
     return byte_size;
   }
 
@@ -209,6 +233,23 @@ class ParquetPlainEncoder {
   template <typename InternalType, parquet::Type::type PARQUET_TYPE>
   static int64_t DecodeBatch(const uint8_t* buffer, const uint8_t* buffer_end,
       int fixed_len_size, int64_t num_values, int64_t stride, InternalType* v);
+
+ private:
+  /// Decode values without bounds checking. `buffer_end` is only used for DCHECKs in
+  /// DEBUG mode, it is unused in RELEASE mode.
+  template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+  static ALWAYS_INLINE inline void DecodeNoBoundsCheck(const uint8_t* buffer,
+      const uint8_t* buffer_end, int fixed_len_size, InternalType* v);
+
+  template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+  static int64_t DecodeBatchAlwaysBoundsCheck(const uint8_t* buffer,
+      const uint8_t* buffer_end, int fixed_len_size, int64_t num_values, int64_t stride,
+      InternalType* v);
+
+  template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+  static int64_t DecodeBatchOneBoundsCheck(const uint8_t* buffer,
+      const uint8_t* buffer_end, int fixed_len_size, int64_t num_values, int64_t stride,
+      InternalType* v);
 };
 
 /// Calling this with arguments of type ColumnType is certainly a programmer error, so we
@@ -258,46 +299,55 @@ inline int ParquetPlainEncoder::ByteSize(const TimestampValue& v) {
 template <typename From, typename To>
 inline int DecodeWithConversion(const uint8_t* buffer, const uint8_t* buffer_end, To* v) {
   int byte_size = sizeof(From);
-  if (UNLIKELY(buffer_end - buffer < byte_size)) return -1;
+  DCHECK_GE(buffer_end - buffer, byte_size);
   From dest;
   memcpy(&dest, buffer, byte_size);
   *v = dest;
   return byte_size;
 }
 
-template <>
-inline int ParquetPlainEncoder::Decode<int64_t, parquet::Type::INT32>(
-    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, int64_t* v) {
-  return DecodeWithConversion<int32_t, int64_t>(buffer, buffer_end, v);
+/// Decodes a value without bounds checking.
+/// This generic template function is used with the following types:
+/// =============================
+/// InternalType   | PARQUET_TYPE
+/// =============================
+/// int8_t         | INT32
+/// int16_t        | INT32
+/// int32_t        | INT32
+/// int64_t        | INT64
+/// float          | FLOAT
+/// double         | DOUBLE
+/// Decimal4Value  | INT32
+/// Decimal8Value  | INT64
+/// TimestampValue | INT96
+template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+void ParquetPlainEncoder::DecodeNoBoundsCheck(const uint8_t* buffer,
+    const uint8_t* buffer_end, int fixed_len_size, InternalType* v) {
+  int byte_size = EncodedByteSize(PARQUET_TYPE, -1);
+  DCHECK_GE(buffer_end - buffer, byte_size);
+
+  /// This generic template is only used when either no conversion is needed or with
+  /// narrowing integer conversions (e.g. int32_t to int16_t or int8_t) where copying the
+  /// lower bytes is the correct conversion.
+  memcpy(v, buffer, sizeof(InternalType));
 }
 
 template <>
-inline int ParquetPlainEncoder::Decode<double, parquet::Type::INT32>(
-    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, double* v) {
-  return DecodeWithConversion<int32_t, double>(buffer, buffer_end, v);
+inline void ParquetPlainEncoder::DecodeNoBoundsCheck<int64_t, parquet::Type::INT32>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, int64_t* v) {
+  DecodeWithConversion<int32_t, int64_t>(buffer, buffer_end, v);
 }
 
 template <>
-inline int ParquetPlainEncoder::Decode<double, parquet::Type::FLOAT>(
+inline void ParquetPlainEncoder::DecodeNoBoundsCheck<double, parquet::Type::INT32>(
     const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, double* v) {
-  return DecodeWithConversion<float, double>(buffer, buffer_end, v);
+  DecodeWithConversion<int32_t, double>(buffer, buffer_end, v);
 }
 
 template <>
-inline int ParquetPlainEncoder::Decode<int8_t, parquet::Type::INT32>(
-    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, int8_t* v) {
-  int byte_size = ByteSize(*v);
-  if (UNLIKELY(buffer_end - buffer < byte_size)) return -1;
-  *v = *buffer;
-  return byte_size;
-}
-template <>
-inline int ParquetPlainEncoder::Decode<int16_t, parquet::Type::INT32>(
-    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, int16_t* v) {
-  int byte_size = ByteSize(*v);
-  if (UNLIKELY(buffer_end - buffer < byte_size)) return -1;
-  memcpy(v, buffer, sizeof(int16_t));
-  return byte_size;
+inline void ParquetPlainEncoder::DecodeNoBoundsCheck<double, parquet::Type::FLOAT>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, double* v) {
+  DecodeWithConversion<float, double>(buffer, buffer_end, v);
 }
 
 template<typename T>
@@ -371,26 +421,85 @@ inline int ParquetPlainEncoder::Encode(
 }
 
 template <typename InternalType, parquet::Type::type PARQUET_TYPE>
-inline int64_t ParquetPlainEncoder::DecodeBatch(const uint8_t* buffer,
+inline int64_t ParquetPlainEncoder::DecodeBatchAlwaysBoundsCheck(const uint8_t* buffer,
     const uint8_t* buffer_end, int fixed_len_size, int64_t num_values, int64_t stride,
     InternalType* v) {
   const uint8_t* buffer_pos = buffer;
   StrideWriter<InternalType> out(v, stride);
+
   for (int64_t i = 0; i < num_values; ++i) {
     int encoded_len = Decode<InternalType, PARQUET_TYPE>(
         buffer_pos, buffer_end, fixed_len_size, out.Advance());
     if (UNLIKELY(encoded_len < 0)) return -1;
     buffer_pos += encoded_len;
   }
+
   return buffer_pos - buffer;
 }
 
+template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+inline int64_t ParquetPlainEncoder::DecodeBatchOneBoundsCheck(const uint8_t* buffer,
+    const uint8_t* buffer_end, int fixed_len_size, int64_t num_values, int64_t stride,
+    InternalType* v) {
+  const uint8_t* buffer_pos = buffer;
+  uint8_t* output = reinterpret_cast<uint8_t*>(v);
+  const int byte_size_of_element = EncodedByteSize(PARQUET_TYPE, fixed_len_size);
+
+  if (UNLIKELY(buffer_end - buffer < num_values * byte_size_of_element)) return -1;
+
+  /// We unroll the loop manually in batches of 8.
+  constexpr int batch = 8;
+  const int full_batches = num_values / batch;
+  const int remainder = num_values % batch;
+
+  for (int b = 0; b < full_batches; b++) {
+#pragma push_macro("DECODE_NO_CHECK_UNROLL")
+#define DECODE_NO_CHECK_UNROLL(ignore1, i, ignore2) \
+    DecodeNoBoundsCheck<InternalType, PARQUET_TYPE>( \
+        buffer_pos + i * byte_size_of_element, buffer_end, fixed_len_size, \
+        reinterpret_cast<InternalType*>(output + i * stride));
+
+    BOOST_PP_REPEAT_FROM_TO(0, 8 /* The value of `batch` */,
+        DECODE_NO_CHECK_UNROLL, ignore);
+#pragma pop_macro("DECODE_NO_CHECK_UNROLL")
+
+    output += batch * stride;
+    buffer_pos += batch * byte_size_of_element;
+  }
+
+  StrideWriter<InternalType> out(reinterpret_cast<InternalType*>(output), stride);
+  for (int i = 0; i < remainder; i++) {
+    DecodeNoBoundsCheck<InternalType, PARQUET_TYPE>(
+        buffer_pos, buffer_end, fixed_len_size, out.Advance());
+    buffer_pos += byte_size_of_element;
+  }
+
+  DCHECK_EQ(buffer_pos - buffer, num_values * byte_size_of_element);
+  return buffer_pos - buffer;
+}
+
+template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+inline int64_t ParquetPlainEncoder::DecodeBatch(const uint8_t* buffer,
+    const uint8_t* buffer_end, int fixed_len_size, int64_t num_values, int64_t stride,
+    InternalType* v) {
+  /// Whether bounds checking needs to be done for every element or we can check the whole
+  /// batch at the same time.
+  constexpr bool has_variable_length =
+      PARQUET_TYPE == parquet::Type::type::BYTE_ARRAY;
+  if (has_variable_length) {
+    return DecodeBatchAlwaysBoundsCheck<InternalType, PARQUET_TYPE>(buffer, buffer_end,
+        fixed_len_size, num_values, stride, v);
+  } else {
+    return DecodeBatchOneBoundsCheck<InternalType, PARQUET_TYPE>(buffer, buffer_end,
+        fixed_len_size, num_values, stride, v);
+  }
+}
+
 template <typename T>
-inline int DecodeDecimalFixedLen(
+inline void DecodeDecimalFixedLen(
     const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, T* v) {
-  if (UNLIKELY(buffer_end - buffer < fixed_len_size)) return -1;
+  DCHECK_GE(buffer_end - buffer, fixed_len_size);
   DecimalUtil::DecodeFromFixedLenByteArray(buffer, fixed_len_size, v);
-  return fixed_len_size;
 }
 
 template <>
@@ -402,24 +511,27 @@ Decode<bool, parquet::Type::BOOLEAN>(const uint8_t* buffer,
 }
 
 template <>
-inline int ParquetPlainEncoder::
-Decode<Decimal4Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(const uint8_t* buffer,
-    const uint8_t* buffer_end, int fixed_len_size, Decimal4Value* v) {
-  return DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
+inline void ParquetPlainEncoder::
+DecodeNoBoundsCheck<Decimal4Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size,
+    Decimal4Value* v) {
+  DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
 }
 
 template <>
-inline int ParquetPlainEncoder::
-Decode<Decimal8Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(const uint8_t* buffer,
-    const uint8_t* buffer_end, int fixed_len_size, Decimal8Value* v) {
-  return DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
+inline void ParquetPlainEncoder::
+DecodeNoBoundsCheck<Decimal8Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size,
+    Decimal8Value* v) {
+  DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
 }
 
 template <>
-inline int ParquetPlainEncoder::
-Decode<Decimal16Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(const uint8_t* buffer,
-    const uint8_t* buffer_end, int fixed_len_size, Decimal16Value* v) {
-  return DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
+inline void ParquetPlainEncoder::
+DecodeNoBoundsCheck<Decimal16Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size,
+    Decimal16Value* v) {
+  DecodeDecimalFixedLen(buffer, buffer_end, fixed_len_size, v);
 }
 
 /// Helper method to decode Decimal type stored as variable length byte array.
diff --git a/be/src/exec/parquet/parquet-plain-test.cc b/be/src/exec/parquet/parquet-plain-test.cc
index 6eb880f..9fbe540 100644
--- a/be/src/exec/parquet/parquet-plain-test.cc
+++ b/be/src/exec/parquet/parquet-plain-test.cc
@@ -15,15 +15,17 @@
 // specific language governing permissions and limitations
 // under the License.
 
-#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <algorithm>
 #include <iostream>
 #include "exec/parquet/parquet-common.h"
 #include "runtime/decimal-value.h"
 #include "runtime/string-value.inline.h"
 #include "runtime/timestamp-value.h"
 #include "testutil/gtest-util.h"
+#include "testutil/random-vector-generators.h"
+#include "testutil/rand-util.h"
 
 #include "common/names.h"
 
@@ -78,25 +80,8 @@ int Encode(const Decimal16Value& v, int encoded_byte_size, uint8_t* buffer,
 }
 
 /// Test that the decoder fails when asked to decode a truncated value.
-template <typename InternalType, parquet::Type::type PARQUET_TYPE>
-void TestTruncate(const InternalType& v, int expected_byte_size) {
-  uint8_t buffer[expected_byte_size];
-  int encoded_size = Encode(v, expected_byte_size, buffer, PARQUET_TYPE);
-  EXPECT_EQ(encoded_size, expected_byte_size);
-
-  // Check all possible truncations of the buffer.
-  for (int truncated_size = encoded_size - 1; truncated_size >= 0; --truncated_size) {
-    InternalType result;
-    /// Copy to heap-allocated buffer so that ASAN can detect buffer overruns.
-    uint8_t* truncated_buffer = new uint8_t[truncated_size];
-    memcpy(truncated_buffer, buffer, truncated_size);
-    int decoded_size = ParquetPlainEncoder::Decode<InternalType, PARQUET_TYPE>(
-        truncated_buffer, truncated_buffer + truncated_size, expected_byte_size, &result);
-    EXPECT_EQ(-1, decoded_size);
-    delete[] truncated_buffer;
-  }
-}
-
+/// This function can be used for type widening tests but also for tests without
+/// type widening, in which case `WidenInternalType` is the same as `InternalType`.
 template <typename InternalType, typename WidenInternalType,
     parquet::Type::type PARQUET_TYPE>
 void TestTruncate(const InternalType& v, int expected_byte_size) {
@@ -111,28 +96,14 @@ void TestTruncate(const InternalType& v, int expected_byte_size) {
     uint8_t* truncated_buffer = new uint8_t[truncated_size];
     memcpy(truncated_buffer, buffer, truncated_size);
     int decoded_size = ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>(
-        truncated_buffer, truncated_buffer + truncated_size, expected_byte_size,
-        &result);
+        truncated_buffer, truncated_buffer + truncated_size, expected_byte_size, &result);
     EXPECT_EQ(-1, decoded_size);
     delete[] truncated_buffer;
   }
 }
 
-template <typename InternalType, parquet::Type::type PARQUET_TYPE>
-void TestType(const InternalType& v, int expected_byte_size) {
-  uint8_t buffer[expected_byte_size];
-  int encoded_size = Encode(v, expected_byte_size, buffer, PARQUET_TYPE);
-  EXPECT_EQ(encoded_size, expected_byte_size);
-
-  InternalType result;
-  int decoded_size = ParquetPlainEncoder::Decode<InternalType, PARQUET_TYPE>(buffer,
-      buffer + expected_byte_size, expected_byte_size, &result);
-  EXPECT_EQ(decoded_size, expected_byte_size);
-  EXPECT_EQ(result, v);
-
-  TestTruncate<InternalType, PARQUET_TYPE>(v, expected_byte_size);
-}
-
+/// This function can be used for type widening tests but also for tests without
+/// type widening, in which case `WidenInternalType` is the same as `InternalType`.
 template <typename InternalType, typename WidenInternalType,
     parquet::Type::type PARQUET_TYPE>
 void TestTypeWidening(const InternalType& v, int expected_byte_size) {
@@ -143,13 +114,27 @@ void TestTypeWidening(const InternalType& v, int expected_byte_size) {
   WidenInternalType result;
   int decoded_size = ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>(
       buffer, buffer + expected_byte_size, expected_byte_size, &result);
-  EXPECT_EQ(decoded_size, expected_byte_size);
+  EXPECT_EQ(expected_byte_size, decoded_size);
   EXPECT_EQ(v, result);
 
+  WidenInternalType batch_result;
+  int batch_decoded_size
+      = ParquetPlainEncoder::DecodeBatch<WidenInternalType, PARQUET_TYPE>(
+          buffer, buffer + expected_byte_size, expected_byte_size, 1,
+          sizeof(WidenInternalType), &batch_result);
+  EXPECT_EQ(expected_byte_size, batch_decoded_size);
+  EXPECT_EQ(v, batch_result);
+
   TestTruncate<InternalType, WidenInternalType, PARQUET_TYPE>(
       v, expected_byte_size);
 }
 
+template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+void TestType(const InternalType& v, int expected_byte_size) {
+  return TestTypeWidening<InternalType, InternalType, PARQUET_TYPE>(
+      v, expected_byte_size);
+}
+
 TEST(PlainEncoding, Basic) {
   int8_t i8 = 12;
   int16_t i16 = 123;
@@ -286,6 +271,174 @@ TEST(PlainEncoding, Basic) {
   }
 }
 
+template <typename InputType, typename OutputType>
+void ExpectEqualWithStride(const std::vector<InputType>& input,
+    const std::vector<uint8_t>& output, int stride) {
+  ASSERT_EQ(input.size() * stride, output.size());
+
+  for (int i = 0; i < input.size(); i++) {
+    const InputType& input_value = input[i];
+    OutputType output_value;
+
+    memcpy(&output_value, &output[i * stride], sizeof(OutputType));
+    EXPECT_EQ(input_value, output_value);
+  }
+}
+
+/// This function can be used for type widening tests but also for tests without
+/// type widening, in which case `WidenInternalType` is the same as `InternalType`.
+template <typename InternalType, typename WidenInternalType,
+         parquet::Type::type PARQUET_TYPE>
+void TestTypeWideningBatch(const std::vector<InternalType>& values,
+    int expected_byte_size, int stride) {
+  ASSERT_GE(stride, sizeof(WidenInternalType));
+
+  constexpr bool var_length = PARQUET_TYPE == parquet::Type::BYTE_ARRAY;
+
+  std::vector<uint8_t> buffer(values.size() * expected_byte_size, 0);
+  uint8_t* output_pos = buffer.data();
+  for (int i = 0; i < values.size(); i++) {
+    int encoded_size = Encode(values[i], expected_byte_size, output_pos, PARQUET_TYPE);
+    if (var_length) {
+      /// For variable length types, the size is variable and `expected_byte_size` should
+      /// be the maximum.
+      EXPECT_GE(expected_byte_size, encoded_size);
+    } else {
+      EXPECT_EQ(expected_byte_size, encoded_size);
+    }
+
+    output_pos += encoded_size;
+  }
+
+  /// Decode one by one.
+  std::vector<uint8_t> output_1by1(values.size() * stride);
+  uint8_t* input_pos = buffer.data();
+  for (int i = 0; i < values.size(); i++) {
+    WidenInternalType* dest = reinterpret_cast<WidenInternalType*>(
+        &output_1by1[i * stride]);
+    int decoded_size = ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>(
+        input_pos, buffer.data() + buffer.size(), expected_byte_size, dest);
+    if (var_length) {
+      EXPECT_GE(expected_byte_size, decoded_size);
+    } else {
+      EXPECT_EQ(expected_byte_size, decoded_size);
+    }
+
+    input_pos += decoded_size;
+  }
+
+  ExpectEqualWithStride<InternalType, WidenInternalType>(values, output_1by1, stride);
+
+  /// Decode in batch.
+  std::vector<uint8_t> output_batch(values.size() * stride);
+  int decoded_size = ParquetPlainEncoder::DecodeBatch<WidenInternalType, PARQUET_TYPE>(
+      buffer.data(), buffer.data() + buffer.size(), expected_byte_size, values.size(),
+      stride, reinterpret_cast<WidenInternalType*>(output_batch.data()));
+  if (var_length) {
+    EXPECT_GE(buffer.size(), decoded_size);
+  } else {
+    EXPECT_EQ(buffer.size(), decoded_size);
+  }
+
+  ExpectEqualWithStride<InternalType, WidenInternalType>(values, output_batch, stride);
+}
+
+template <typename InternalType, parquet::Type::type PARQUET_TYPE>
+void TestTypeBatch(const std::vector<InternalType>& values, int expected_byte_size,
+    int stride) {
+  return TestTypeWideningBatch<InternalType, InternalType, PARQUET_TYPE>(values,
+      expected_byte_size, stride);
+}
+
+TEST(PlainEncoding, Batch) {
+  std::mt19937 gen;
+  RandTestUtil::SeedRng("PARQUET_PLAIN_ENCODING_TEST_RANDOM_SEED", &gen);
+
+  constexpr int NUM_ELEMENTS = 1024 * 5 + 10;
+  constexpr int stride = 20;
+
+  const std::vector<int8_t> int8_vec = RandomNumberVec<int8_t>(gen, NUM_ELEMENTS);
+  TestTypeBatch<int8_t, parquet::Type::INT32>(int8_vec, sizeof(int32_t), stride);
+
+  const std::vector<int16_t> int16_vec = RandomNumberVec<int16_t>(gen, NUM_ELEMENTS);
+  TestTypeBatch<int16_t, parquet::Type::INT32>(int16_vec, sizeof(int32_t), stride);
+
+  const std::vector<int32_t> int32_vec = RandomNumberVec<int32_t>(gen, NUM_ELEMENTS);
+  TestTypeBatch<int32_t, parquet::Type::INT32>(int32_vec, sizeof(int32_t), stride);
+
+  const std::vector<int64_t> int64_vec = RandomNumberVec<int64_t>(gen, NUM_ELEMENTS);
+  TestTypeBatch<int64_t, parquet::Type::INT64>(int64_vec, sizeof(int64_t), stride);
+
+  const std::vector<float> float_vec = RandomNumberVec<float>(gen, NUM_ELEMENTS);
+  TestTypeBatch<float, parquet::Type::FLOAT>(float_vec, sizeof(float), stride);
+
+  const std::vector<double> double_vec = RandomNumberVec<double>(gen, NUM_ELEMENTS);
+  TestTypeBatch<double, parquet::Type::DOUBLE>(double_vec, sizeof(double), stride);
+
+  constexpr int max_str_length = 100;
+  const std::vector<std::string> str_vec = RandomStrVec(gen, NUM_ELEMENTS,
+      max_str_length);
+  std::vector<StringValue> sv_vec(str_vec.size());
+  std::transform(str_vec.begin(), str_vec.end(), sv_vec.begin(),
+      [] (const std::string& s) { return StringValue(s); });
+  TestTypeBatch<StringValue, parquet::Type::BYTE_ARRAY>(sv_vec,
+      sizeof(int32_t) + max_str_length, stride);
+
+  const std::vector<TimestampValue> tv_vec = RandomTimestampVec(gen, NUM_ELEMENTS);
+  TestTypeBatch<TimestampValue, parquet::Type::INT96>(tv_vec, 12, stride);
+
+  // Test type widening.
+  TestTypeWideningBatch<int32_t, int64_t, parquet::Type::INT32>(int32_vec,
+      sizeof(int32_t), stride);
+  TestTypeWideningBatch<int32_t, double, parquet::Type::INT32>(int32_vec, sizeof(int32_t),
+      stride);
+  TestTypeWideningBatch<float, double, parquet::Type::FLOAT>(float_vec, sizeof(float),
+      stride);
+
+  // In the Decimal batch tests, when writing the decimals as BYTE_ARRAYs, we always use
+  // the size of the underlying type as the array length for simplicity.
+  // The non-batch tests take care of storing them in as many bytes as needed.
+
+  // BYTE_ARRAYs store the length of the array on 4 bytes.
+  constexpr int decimal_size_bytes = sizeof(int32_t);
+
+  // Decimal4Value
+  std::vector<Decimal4Value> decimal4_vec(int32_vec.size());
+  std::transform(int32_vec.begin(), int32_vec.end(), decimal4_vec.begin(),
+      [] (const int32_t i) { return Decimal4Value(i); });
+
+  TestTypeBatch<Decimal4Value, parquet::Type::BYTE_ARRAY>(decimal4_vec,
+      decimal_size_bytes + sizeof(Decimal4Value::StorageType), stride);
+  TestTypeBatch<Decimal4Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(decimal4_vec,
+      sizeof(Decimal4Value::StorageType), stride);
+  TestTypeBatch<Decimal4Value, parquet::Type::INT32>(decimal4_vec,
+      sizeof(Decimal4Value::StorageType), stride);
+
+  // Decimal8Value
+  std::vector<Decimal8Value> decimal8_vec(int64_vec.size());
+  std::transform(int64_vec.begin(), int64_vec.end(), decimal8_vec.begin(),
+      [] (const int64_t i) { return Decimal8Value(i); });
+
+  TestTypeBatch<Decimal8Value, parquet::Type::BYTE_ARRAY>(decimal8_vec,
+      decimal_size_bytes + sizeof(Decimal8Value::StorageType), stride);
+  TestTypeBatch<Decimal8Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(decimal8_vec,
+      sizeof(Decimal8Value::StorageType), stride);
+  TestTypeBatch<Decimal8Value, parquet::Type::INT64>(decimal8_vec,
+      sizeof(Decimal8Value::StorageType), stride);
+
+  // Decimal16Value
+  // We do not test the whole 16 byte range as generating random int128_t values is
+  // complicated.
+  std::vector<Decimal16Value> decimal16_vec(int64_vec.size());
+  std::transform(int64_vec.begin(), int64_vec.end(), decimal16_vec.begin(),
+      [] (const int64_t i) { return Decimal16Value(i); });
+
+  TestTypeBatch<Decimal16Value, parquet::Type::BYTE_ARRAY>(decimal16_vec,
+      decimal_size_bytes + sizeof(Decimal16Value::StorageType), stride);
+  TestTypeBatch<Decimal16Value, parquet::Type::FIXED_LEN_BYTE_ARRAY>(decimal16_vec,
+      sizeof(Decimal16Value::StorageType), stride);
+}
+
 TEST(PlainEncoding, DecimalBigEndian) {
   // Test Basic can pass if we make the same error in encode and decode.
   // Verify the bytes are actually big endian.
diff --git a/be/src/testutil/random-vector-generators.h b/be/src/testutil/random-vector-generators.h
new file mode 100644
index 0000000..b26820e
--- /dev/null
+++ b/be/src/testutil/random-vector-generators.h
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <algorithm>
+#include <limits.h>
+#include <random>
+#include <type_traits>
+
+namespace impala {
+
+/// Generates a vector of numbers with the given length, consisting of random elements.
+/// `NUM_T` must either be an integral or a floating point type.
+/// The distribution is uniform across the range [min, max] (both inclusive).
+template <typename NUM_T, typename Generator = std::mt19937>
+std::vector<NUM_T> RandomNumberVecMinMax(Generator& gen, const int length,
+    const NUM_T min, const NUM_T max) {
+  static_assert(std::is_integral<NUM_T>::value || std::is_floating_point<NUM_T>::value,
+      "An integral or floating point type is needed.");
+
+  using Dist = std::conditional_t<std::is_integral<NUM_T>::value,
+    std::uniform_int_distribution<NUM_T>,
+    std::uniform_real_distribution<NUM_T>>;
+
+  Dist dist(min, max);
+
+  std::vector<NUM_T> vec(length);
+  std::generate(vec.begin(), vec.end(), [&gen, &dist] () {
+      return dist(gen);
+      });
+
+  return vec;
+}
+
+/// Generates a vector of numbers with the given length, consisting of random elements.
+/// `NUM_T` must either be an integral or a floating point type.
+/// The distribution is uniform across all values of NUM_T.
+template <typename NUM_T, typename Generator = std::mt19937>
+std::vector<NUM_T> RandomNumberVec(Generator& gen, const int length) {
+  return RandomNumberVecMinMax<NUM_T, Generator>(gen, length,
+      std::numeric_limits<NUM_T>::min(), std::numeric_limits<NUM_T>::max());
+}
+
+/// Generates a vector of strings with the given length. The elements are randomly
+/// generated strings. The length of the strings is a random number between
+/// `min_str_length` and `max_str_length` (both inclusive).
+template <typename Generator = std::mt19937>
+std::vector<std::string> RandomStrVec(Generator& gen, const int length,
+    const int max_str_length, const int min_str_length = 0) {
+  std::uniform_int_distribution<int> length_dist(min_str_length, max_str_length);
+  std::uniform_int_distribution<char> letter_dist('a', 'z');
+
+  std::vector<std::string> vec(length);
+  std::generate(vec.begin(), vec.end(), [&] () {
+      int str_length = length_dist(gen);
+      std::string s;
+
+      for (int i = 0; i < str_length; i++) {
+        s.push_back(letter_dist(gen));
+      }
+
+      return s;
+      });
+
+  return vec;
+}
+
+/// Generates a vector of `TimestampValue`s with the given length. The elements are random
+/// timestamps with uniform distribution on the valid range.
+template <typename Generator = std::mt19937>
+std::vector<TimestampValue> RandomTimestampVec(Generator& gen, const int length) {
+  const TimestampValue min_date =
+      TimestampValue::Parse("1400-01-01 00:00:00");
+  int64_t min_millis;
+  bool min_success = min_date.FloorUtcToUnixTimeMillis(&min_millis);
+  DCHECK(min_success);
+
+  const TimestampValue max_date =
+      TimestampValue::Parse("9999-12-31 23:59:59");
+  int64_t max_millis;
+  bool max_success = max_date.FloorUtcToUnixTimeMillis(&max_millis);
+  DCHECK(max_success);
+
+  const std::vector<int64_t> unix_time_millis = RandomNumberVecMinMax<int64_t, Generator>(
+      gen, length, min_millis, max_millis);
+  std::vector<TimestampValue> timestamps(length);
+  std::transform(unix_time_millis.begin(), unix_time_millis.end(), timestamps.begin(),
+     TimestampValue::UtcFromUnixTimeMillis);
+
+  return timestamps;
+}
+
+} // namespace impala


[impala] 03/03: Configure Hive 3's HS2 to execute queries using Tez local mode

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 8e97a3b5f68e55bd68b25a6d7966c8eb0d57e6d0
Author: Todd Lipcon <to...@apache.org>
AuthorDate: Wed Apr 3 21:02:59 2019 -0700

    Configure Hive 3's HS2 to execute queries using Tez local mode
    
    Hive 3 no longer supports MR execution, so this sets up the appropriate
    configuration and classpath so that HS2 can run queries using Tez.
    
    The bulk of this patch is toolchain changes to download Tez itself. The
    Tez tarball is slightly odd in that it has no top-level directory, so
    the patch reworks bootstrap_toolchain a bit to support creating a
    top-level directory for a component.
    
    The remainder of the patch is some classpath setup and hive-site changes
    when Hive 3 is enabled.
    
    So far I tested this manually by setting up a metastore and
    impala-config with USE_CDP_HIVE=true, and then connecting to HS2 using
    
      hive beeline -u 'jdbc:hive2://localhost:11050'
    
    I was able to insert and query data, and was able to verify that queries
    like 'select count(*)' were executing via Tez local mode.
    
    NOTE: this patch relies on a custom build of Tez, based on a private
    branch. I've submitted a PR to Tez upstream, referenced in the commits
    here. Will remove this hack once the PR is accepted and makes its way
    into an official build.
    
    Change-Id: I76e47fbd1d6ff5103d81a8de430d5465dba284cd
    Reviewed-on: http://gerrit.cloudera.org:8080/12931
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
 bin/bootstrap_toolchain.py             | 55 ++++++++++++++++++++++++++++------
 bin/impala-config.sh                   | 10 +++++++
 fe/pom.xml                             | 12 ++++++++
 fe/src/test/resources/hive-site.xml.py |  8 ++++-
 testdata/bin/run-hive-server.sh        | 15 ++++++++++
 5 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py
index 7c44902..d51b5cf 100755
--- a/bin/bootstrap_toolchain.py
+++ b/bin/bootstrap_toolchain.py
@@ -103,6 +103,20 @@ class Package(object):
       self.url = os.environ.get(url_env_var)
 
 
+class CdpComponent(object):
+  def __init__(self, basename, makedir=False):
+    """
+    basename: the name of the file to be downloaded, without its .tar.gz suffix
+    makedir: if false, it is assumed that the downloaded tarball will expand
+             into a directory with the same name as 'basename'. If True, we
+             assume that the tarball doesn't have any top-level directory,
+             and so we need to manually create a directory within which to
+             expand the tarball.
+    """
+    self.basename = basename
+    self.makedir = makedir
+
+
 def try_get_platform_release_label():
   """Gets the right package label from the OS version. Returns an OsMapping with both
      'toolchain' and 'cdh' labels. Return None if not found.
@@ -418,8 +432,12 @@ def download_cdh_components(toolchain_root, cdh_components, url_prefix):
 
 
 def download_cdp_components(cdp_components, url_prefix):
-  """Downloads and unpacks the CDP components for a given URL prefix into
-  $CDP_COMPONENTS_HOME if not found."""
+  """
+  Downloads and unpacks the CDP components for a given URL prefix into
+  $CDP_COMPONENTS_HOME if not found.
+
+  cdp_components: list of CdpComponent instances
+  """
   cdp_components_home = os.environ.get("CDP_COMPONENTS_HOME")
   if not cdp_components_home:
     logging.error("Impala environment not set up correctly, make sure "
@@ -430,12 +448,27 @@ def download_cdp_components(cdp_components, url_prefix):
   if not os.path.exists(cdp_components_home):
     os.makedirs(cdp_components_home)
 
-  def download(component_name):
-    pkg_directory = "{0}/{1}".format(cdp_components_home, component_name)
+  def download(component):
+    pkg_directory = "{0}/{1}".format(cdp_components_home, component.basename)
     if os.path.isdir(pkg_directory): return
-    file_name = "{0}.tar.gz".format(component_name)
+    file_name = "{0}.tar.gz".format(component.basename)
     download_path = "{0}/{1}".format(url_prefix, file_name)
-    wget_and_unpack_package(download_path, file_name, cdp_components_home, False)
+    dst = cdp_components_home
+    if component.makedir:
+      # Download and unpack in a temp directory, which we'll later move into place
+      dst = tempfile.mkdtemp(dir=cdp_components_home)
+    try:
+      wget_and_unpack_package(download_path, file_name, dst, False)
+    except:  # noqa
+      # Clean up any partially-unpacked result.
+      if os.path.isdir(pkg_directory):
+        shutil.rmtree(pkg_directory)
+      # Clean up any temp directory if we made one
+      if component.makedir:
+        shutil.rmtree(dst)
+      raise
+    if component.makedir:
+      os.rename(dst, pkg_directory)
 
   execute_many(download, cdp_components)
 
@@ -533,11 +566,15 @@ if __name__ == "__main__":
 
   cdp_build_number = os.environ["CDP_BUILD_NUMBER"]
   cdp_components = [
-    "ranger-{0}-admin".format(os.environ.get("IMPALA_RANGER_VERSION")),
+    CdpComponent("ranger-{0}-admin".format(os.environ.get("IMPALA_RANGER_VERSION"))),
   ]
+  use_cdp_hive = os.getenv("USE_CDP_HIVE") == "true"
   if use_cdp_hive:
-    cdp_components.append("apache-hive-{0}-bin"
-                          .format(os.environ.get("IMPALA_HIVE_VERSION")))
+    cdp_components.append(CdpComponent("apache-hive-{0}-bin"
+                          .format(os.environ.get("IMPALA_HIVE_VERSION"))))
+    cdp_components.append(CdpComponent(
+        "tez-{0}-minimal".format(os.environ.get("IMPALA_TEZ_VERSION")),
+        makedir=True))
   download_path_prefix = \
     "https://{0}/build/cdp_components/{1}/tarballs".format(toolchain_host,
                                                            cdp_build_number)
diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 87b5216..709cfcb 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -174,6 +174,16 @@ export KUDU_JAVA_VERSION=1.10.0-cdh6.x-SNAPSHOT
 export USE_CDP_HIVE=${USE_CDP_HIVE-false}
 if $USE_CDP_HIVE; then
   export IMPALA_HIVE_VERSION=3.1.0.6.0.99.0-45
+  # Temporary version of Tez, patched with the fix for TEZ-1348:
+  # https://github.com/apache/tez/pull/40
+  # We'll switch to a non-"todd" version of Tez once that fix is integrated.
+  # For now, if you're bumping the CDP build number, you'll need to download
+  # this tarball from an earlier build and re-upload it to the new directory
+  # in the toolchain bucket.
+  #
+  # TODO(todd) switch to an official build.
+  export IMPALA_TEZ_VERSION=0.10.0-todd-6fcc41e5798b.1
+  export TEZ_HOME="$CDP_COMPONENTS_HOME/tez-${IMPALA_TEZ_VERSION}-minimal"
 else
   export IMPALA_HIVE_VERSION=2.1.1-cdh6.x-SNAPSHOT
 fi
diff --git a/fe/pom.xml b/fe/pom.xml
index bac4555..920b7e9 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -313,6 +313,10 @@ under the License.
           <groupId>net.minidev</groupId>
           <artifactId>json-smart</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.calcite.avatica</groupId>
+          <artifactId>avatica</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
 
@@ -349,6 +353,10 @@ under the License.
           <groupId>net.minidev</groupId>
           <artifactId>json-smart</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.calcite.avatica</groupId>
+          <artifactId>avatica</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
 
@@ -408,6 +416,10 @@ under the License.
           <groupId>net.minidev</groupId>
           <artifactId>json-smart</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.calcite.avatica</groupId>
+          <artifactId>avatica</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
 
diff --git a/fe/src/test/resources/hive-site.xml.py b/fe/src/test/resources/hive-site.xml.py
index fb64374..9c0ca7a 100644
--- a/fe/src/test/resources/hive-site.xml.py
+++ b/fe/src/test/resources/hive-site.xml.py
@@ -76,7 +76,13 @@ if kerberize:
   #   hive.metastore.kerberos.keytab.file
   #   hive.metastore.kerberos.principal
 
-if hive_major_version < 3:
+# Enable Tez and ACID for Hive 3
+if hive_major_version >= 3:
+  CONFIG.update({
+   'hive.tez.container.size': '512',
+   'hive.txn.manager': 'org.apache.hadoop.hive.ql.lockmgr.DbTxnManager',
+   'tez.local.mode': 'true'})
+else:
   CONFIG.update({
    # TODO(vihang) Disabled for HMS3.
    'hive.metastore.event.listeners': 'org.apache.sentry.binding.metastore.SentrySyncHMSNotificationsPostEventListener',
diff --git a/testdata/bin/run-hive-server.sh b/testdata/bin/run-hive-server.sh
index 8a6a1ca..6bdaaee 100755
--- a/testdata/bin/run-hive-server.sh
+++ b/testdata/bin/run-hive-server.sh
@@ -90,6 +90,21 @@ HADOOP_CLIENT_OPTS="-Xmx2024m -Dhive.log.file=hive-metastore.log" hive \
 ${CLUSTER_BIN}/wait-for-metastore.py --transport=${METASTORE_TRANSPORT}
 
 if [ ${ONLY_METASTORE} -eq 0 ]; then
+  # For Hive 3, we use Tez for execution. We have to add it to the HS2 classpath.
+  if $USE_CDP_HIVE; then
+    export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_HOME}/*
+    # This is a little hacky, but Tez bundles a bunch of junk into lib/, such
+    # as extra copies of the hadoop libraries, etc, and we want to avoid conflicts.
+    # So, we'll be a bit choosy about what we add to the classpath here.
+    for jar in $TEZ_HOME/lib/* ; do
+      case $(basename $jar) in
+        commons-*|RoaringBitmap*)
+          export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
+          ;;
+      esac
+    done
+  fi
+
   # Starts a HiveServer2 instance on the port specified by the HIVE_SERVER2_THRIFT_PORT
   # environment variable. HADOOP_HEAPSIZE should be set to at least 2048 to avoid OOM
   # when loading ORC tables like widerow.


[impala] 02/03: IMPALA-8444: Fix performance regression when building privilege name

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 349142635250f92412003a1e62829fc4b441dc75
Author: Fredy Wijaya <fw...@cloudera.com>
AuthorDate: Tue Apr 23 15:35:01 2019 -0700

    IMPALA-8444: Fix performance regression when building privilege name
    
    This patch fixes the performance regression when building privilege names
    by rewriting PrincipalPrivilege.buildPrivilegeName() to use simple string
    concatenation instead of building a list that then gets converted into a
    string.
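
    A rough sketch of the before/after (shown in C++ for illustration; the
    actual change is in Java, and the function and parameter names below
    are invented):

      #include <string>

      // Before: each "key=value" pair was pushed into a list and the list
      // was then joined with "->", paying for the container and an extra
      // pass. After: append each part directly into one output string.
      std::string BuildPrivilegeNameSketch(const std::string& server,
          const std::string& db, const std::string& table) {
        std::string name;
        name.reserve(64);  // One up-front allocation for typical names.
        name.append("server=").append(server);
        name.append("->db=").append(db);
        name.append("->table=").append(table);
        return name;
      }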
    
    Below is the result of running a benchmark using JMH comparing the old
    and new implementations:
    
    Result "org.apache.impala.BuildPrivilegeNameBenchmark.fast":
      0.344 ±(99.9%) 0.004 us/op [Average]
      (min, avg, max) = (0.336, 0.344, 0.355), stdev = 0.005
      CI (99.9%): [0.339, 0.348] (assumes normal distribution)
    
    Result "org.apache.impala.BuildPrivilegeNameBenchmark.slow":
      0.831 ±(99.9%) 0.011 us/op [Average]
      (min, avg, max) = (0.807, 0.831, 0.856), stdev = 0.015
      CI (99.9%): [0.820, 0.842] (assumes normal distribution)
    
    Benchmark                         Mode  Cnt  Score   Error Units
    BuildPrivilegeNameBenchmark.fast  avgt   25  0.344 ± 0.004 us/op
    BuildPrivilegeNameBenchmark.slow  avgt   25  0.831 ± 0.011 us/op
    
    This patch also updates SentryAuthorizationPolicy.listPrivileges() to
    reuse the privilege names that have already been built instead of
    building them again. While fixing this, I found a bug: Principal stores
    PrincipalPrivilege objects in a case insensitive way. That is fine for
    all privilege scopes except URI, since URIs are case sensitive. This
    patch fixes the issue by making privilege names case sensitive instead.
    
    This patch removes incorrect synchronization in
    SentryAuthorizationPolicy.listPrivileges() that could force the
    operation to run serially under a highly concurrent workload.
    
    Testing:
    - Ran all FE tests
    - Ran all E2E authorization tests
    - Added E2E test for privilege name case sensitivity bug
    
    Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5
    Reviewed-on: http://gerrit.cloudera.org:8080/13095
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../impala/authorization/AuthorizationPolicy.java  |  22 ---
 .../sentry/SentryAuthorizationPolicy.java          |  27 +---
 .../impala/authorization/sentry/SentryProxy.java   |   2 +-
 .../java/org/apache/impala/catalog/Catalog.java    |   5 +-
 .../apache/impala/catalog/CatalogObjectCache.java  |   2 +-
 .../java/org/apache/impala/catalog/Principal.java  |  10 +-
 .../apache/impala/catalog/PrincipalPrivilege.java  | 152 +++++++++++----------
 tests/authorization/test_grant_revoke.py           |  47 ++++++-
 8 files changed, 141 insertions(+), 126 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java b/fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java
index adbc004..33b7c51 100644
--- a/fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java
+++ b/fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java
@@ -162,28 +162,6 @@ public class AuthorizationPolicy {
   }
 
   /**
-   * Removes a privilege from the policy mapping to the role specified by the principal ID
-   * in the privilege. Throws a CatalogException if no role with a corresponding ID exists
-   * in the catalog. Returns null if no matching privilege is found in this principal.
-   */
-  public synchronized PrincipalPrivilege removePrivilege(PrincipalPrivilege privilege)
-      throws CatalogException {
-    Principal principal = getPrincipal(privilege.getPrincipalId(),
-        privilege.getPrincipalType());
-    if (principal == null) {
-      throw new CatalogException(String.format("Error removing privilege: %s. %s ID " +
-          "'%d' does not exist.", privilege.getName(),
-          Principal.toString(privilege.getPrincipalType()), privilege.getPrincipalId()));
-    }
-    if (LOG.isTraceEnabled()) {
-      LOG.trace("Removing privilege: " + privilege.getName() + " from " +
-          Principal.toString(privilege.getPrincipalType()).toLowerCase() + ": " +
-          principal.getName() + " with ID: " + principal.getId());
-    }
-    return principal.removePrivilege(privilege.getName());
-  }
-
-  /**
    * Returns all roles in the policy. Returns an empty list if no roles exist.
    */
   public synchronized List<Role> getAllRoles() {
diff --git a/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java b/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java
index 9c31421..1adfec8 100644
--- a/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java
+++ b/fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java
@@ -20,7 +20,6 @@ package org.apache.impala.authorization.sentry;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Sets;
 import org.apache.impala.authorization.AuthorizationPolicy;
-import org.apache.impala.catalog.PrincipalPrivilege;
 import org.apache.impala.catalog.Role;
 import org.apache.impala.catalog.User;
 import org.apache.sentry.core.common.ActiveRoleSet;
@@ -53,7 +52,7 @@ public class SentryAuthorizationPolicy implements PrivilegeCache {
    * Returns a set of privilege strings in Sentry format.
    */
   @Override
-  public synchronized Set<String> listPrivileges(Set<String> groups,
+  public Set<String> listPrivileges(Set<String> groups,
       ActiveRoleSet roleSet) {
     Set<String> privileges = Sets.newHashSet();
     if (roleSet != ActiveRoleSet.ALL) {
@@ -64,16 +63,7 @@ public class SentryAuthorizationPolicy implements PrivilegeCache {
     for (String groupName: groups) {
       List<Role> grantedRoles = authzPolicy_.getGrantedRoles(groupName);
       for (Role role: grantedRoles) {
-        for (PrincipalPrivilege privilege: role.getPrivileges()) {
-          String authorizable = privilege.getName();
-          if (authorizable == null) {
-            if (LOG.isTraceEnabled()) {
-              LOG.trace("Ignoring invalid privilege: " + privilege.getName());
-            }
-            continue;
-          }
-          privileges.add(authorizable);
-        }
+        privileges.addAll(role.getPrivilegeNames());
       }
     }
     return privileges;
@@ -83,22 +73,13 @@ public class SentryAuthorizationPolicy implements PrivilegeCache {
    * Returns a set of privilege strings in Sentry format.
    */
   @Override
-  public synchronized Set<String> listPrivileges(Set<String> groups, Set<String> users,
+  public Set<String> listPrivileges(Set<String> groups, Set<String> users,
       ActiveRoleSet roleSet) {
     Set<String> privileges = listPrivileges(groups, roleSet);
     for (String userName: users) {
       User user = authzPolicy_.getUser(userName);
       if (user != null) {
-        for (PrincipalPrivilege privilege: user.getPrivileges()) {
-          String authorizable = privilege.getName();
-          if (authorizable == null) {
-            if (LOG.isTraceEnabled()) {
-              LOG.trace("Ignoring invalid privilege: " + privilege.getName());
-            }
-            continue;
-          }
-          privileges.add(authorizable);
-        }
+        privileges.addAll(user.getPrivilegeNames());
       }
     }
     return privileges;
diff --git a/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java b/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
index d2f7c9a..fa6265d 100644
--- a/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
+++ b/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
@@ -337,7 +337,7 @@ public class SentryProxy {
       TPrivilege thriftPriv =
           SentryPolicyService.sentryPrivilegeToTPrivilege(sentryPriv, principal);
       String privilegeName = PrincipalPrivilege.buildPrivilegeName(thriftPriv);
-      privilegesToRemove.remove(privilegeName.toLowerCase());
+      privilegesToRemove.remove(privilegeName);
       PrincipalPrivilege existingPrincipalPriv = principal.getPrivilege(privilegeName);
       // We already know about this privilege (privileges cannot be modified).
       if (existingPrincipalPriv != null &&
diff --git a/fe/src/main/java/org/apache/impala/catalog/Catalog.java b/fe/src/main/java/org/apache/impala/catalog/Catalog.java
index a7615af..3b22865 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Catalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Catalog.java
@@ -579,9 +579,8 @@ public abstract class Catalog implements AutoCloseable {
         // The combination of privilege name + principal ID + principal type is
         // guaranteed to be unique.
         return "PRIVILEGE:" +
-            PrincipalPrivilege.buildPrivilegeName(catalogObject.getPrivilege())
-                .toLowerCase() + "." +
-            Integer.toString(catalogObject.getPrivilege().getPrincipal_id()) + "." +
+            PrincipalPrivilege.buildPrivilegeName(catalogObject.getPrivilege()) + "." +
+            catalogObject.getPrivilege().getPrincipal_id() + "." +
             catalogObject.getPrivilege().getPrincipal_type();
       case HDFS_CACHE_POOL:
         return "HDFS_CACHE_POOL:" +
diff --git a/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java b/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
index d613cf9..d9bfc14 100644
--- a/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
+++ b/fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
@@ -155,4 +155,4 @@ public class CatalogObjectCache<T extends CatalogObject> implements Iterable<T>
   public Iterator<T> iterator() {
     return metadataCache_.values().iterator();
   }
-}
+}
\ No newline at end of file
diff --git a/fe/src/main/java/org/apache/impala/catalog/Principal.java b/fe/src/main/java/org/apache/impala/catalog/Principal.java
index 650eac4..1012fb2 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Principal.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Principal.java
@@ -17,6 +17,8 @@
 
 package org.apache.impala.catalog;
 
+import java.util.ArrayList;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Set;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -37,9 +39,9 @@ public abstract class Principal extends CatalogObjectImpl {
   private final TPrincipal principal_;
   // The last principal ID assigned, starts at 0.
   private static AtomicInteger principalId_ = new AtomicInteger(0);
-
+  // URIs are case sensitive, so we need to store privilege names in a case sensitive way.
   private final CatalogObjectCache<PrincipalPrivilege> principalPrivileges_ =
-      new CatalogObjectCache<>();
+      new CatalogObjectCache<>(false);
 
   protected Principal(String principalName, TPrincipalType type,
       Set<String> grantGroups) {
@@ -68,7 +70,7 @@ public abstract class Principal extends CatalogObjectImpl {
    * principal, an empty list is returned.
    */
   public List<PrincipalPrivilege> getPrivileges() {
-    return Lists.newArrayList(principalPrivileges_.getValues());
+    return new ArrayList<>(principalPrivileges_.getValues());
   }
 
   /**
@@ -76,7 +78,7 @@ public abstract class Principal extends CatalogObjectImpl {
    * granted to the principal.
    */
   public Set<String> getPrivilegeNames() {
-    return Sets.newHashSet(principalPrivileges_.keySet());
+    return new HashSet<>(principalPrivileges_.keySet());
   }
 
   /**
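
A short sketch of the intended effect of the constructor change above, assuming
the boolean argument toggles case-insensitive key handling as the new comment
implies: URI privileges that differ only in case now occupy two distinct cache
entries instead of collapsing into one.

    // Hypothetical keys; before this change both lowercased to the same key.
    //   server=server1->uri=/test-warehouse/FOO->grantoption=false
    //   server=server1->uri=/test-warehouse/foo->grantoption=false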
diff --git a/fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java b/fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java
index 2b97dc4..1b2da26 100644
--- a/fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java
+++ b/fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java
@@ -17,8 +17,6 @@
 
 package org.apache.impala.catalog;
 
-import java.util.List;
-
 import org.apache.impala.authorization.AuthorizationPolicy;
 import org.apache.impala.thrift.TCatalogObject;
 import org.apache.impala.thrift.TCatalogObjectType;
@@ -28,9 +26,7 @@ import org.apache.impala.thrift.TPrivilegeLevel;
 import org.apache.impala.thrift.TPrivilegeScope;
 import org.apache.log4j.Logger;
 
-import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
-import com.google.common.collect.Lists;
 
 /**
  * Represents a privilege that has been granted to a principal in an authorization policy.
@@ -38,11 +34,8 @@ import com.google.common.collect.Lists;
  */
 public class PrincipalPrivilege extends CatalogObjectImpl {
   private static final Logger LOG = Logger.getLogger(AuthorizationPolicy.class);
-  // These Joiners are used to build principal names. For simplicity, the principal name
-  // we use can also be sent to the Sentry library to perform authorization checks
-  // so we build them in the same format.
-  private static final Joiner AUTHORIZABLE_JOINER = Joiner.on("->");
-  private static final Joiner KV_JOINER = Joiner.on("=");
+  private static final String AUTHORIZABLE_SEPARATOR = "->";
+  private static final String KV_SEPARATOR = "=";
   private final TPrivilege privilege_;
 
   private PrincipalPrivilege(TPrivilege privilege) {
@@ -61,70 +54,89 @@ public class PrincipalPrivilege extends CatalogObjectImpl {
    * [ServerName=value]->[DbName=value]->[TableName=value]->[ColumnName=value]->[Action Granted=value]->[Grant Option=value]
    */
   public static String buildPrivilegeName(TPrivilege privilege) {
-    List<String> authorizable = Lists.newArrayListWithExpectedSize(4);
-    try {
-      Preconditions.checkNotNull(privilege);
-      TPrivilegeScope scope = privilege.getScope();
-      Preconditions.checkNotNull(scope);
-      switch (scope) {
-        case SERVER: {
-          authorizable.add(KV_JOINER.join("server", privilege.getServer_name().
-              toLowerCase()));
-          break;
-        }
-        case URI: {
-          authorizable.add(KV_JOINER.join("server", privilege.getServer_name().
-              toLowerCase()));
-          // (IMPALA-2695) URIs are case sensitive
-          authorizable.add(KV_JOINER.join("uri", privilege.getUri()));
-          break;
-        }
-        case DATABASE: {
-          authorizable.add(KV_JOINER.join("server", privilege.getServer_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("db", privilege.getDb_name().
-              toLowerCase()));
-          break;
-        }
-        case TABLE: {
-          authorizable.add(KV_JOINER.join("server", privilege.getServer_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("db", privilege.getDb_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("table", privilege.getTable_name().
-              toLowerCase()));
-          break;
-        }
-        case COLUMN: {
-          authorizable.add(KV_JOINER.join("server", privilege.getServer_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("db", privilege.getDb_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("table", privilege.getTable_name().
-              toLowerCase()));
-          authorizable.add(KV_JOINER.join("column", privilege.getColumn_name().
-              toLowerCase()));
-          break;
-        }
-        default: {
-          throw new UnsupportedOperationException(
-              "Unknown privilege scope: " + scope.toString());
-        }
+    StringBuilder privilegeName = new StringBuilder();
+    Preconditions.checkNotNull(privilege);
+    TPrivilegeScope scope = privilege.getScope();
+    Preconditions.checkNotNull(scope);
+    switch (scope) {
+      case SERVER: {
+        privilegeName.append("server")
+            .append(KV_SEPARATOR)
+            .append(privilege.getServer_name().toLowerCase());
+        break;
       }
-
-      // The ALL privilege is always implied and does not need to be included as part
-      // of the name.
-      if (privilege.getPrivilege_level() != TPrivilegeLevel.ALL) {
-        authorizable.add(KV_JOINER.join("action",
-            privilege.getPrivilege_level().toString()));
+      case URI: {
+        privilegeName.append("server")
+            .append(KV_SEPARATOR)
+            .append(privilege.getServer_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        // (IMPALA-2695) URIs are case sensitive
+        privilegeName.append("uri")
+            .append(KV_SEPARATOR)
+            .append(privilege.getUri());
+        break;
+      }
+      case DATABASE: {
+        privilegeName.append("server")
+            .append(KV_SEPARATOR)
+            .append(privilege.getServer_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("db")
+            .append(KV_SEPARATOR)
+            .append(privilege.getDb_name().toLowerCase());
+        break;
+      }
+      case TABLE: {
+        privilegeName.append("server")
+            .append(KV_SEPARATOR)
+            .append(privilege.getServer_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("db")
+            .append(KV_SEPARATOR)
+            .append(privilege.getDb_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("table")
+            .append(KV_SEPARATOR)
+            .append(privilege.getTable_name().toLowerCase());
+        break;
       }
-      authorizable.add(KV_JOINER.join("grantoption", privilege.isHas_grant_opt()));
-      return AUTHORIZABLE_JOINER.join(authorizable);
-    } catch (Exception e) {
-      // Should never make it here unless the privilege is malformed.
-      LOG.error("ERROR: ", e);
-      return null;
+      case COLUMN: {
+        privilegeName.append("server")
+            .append(KV_SEPARATOR)
+            .append(privilege.getServer_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("db")
+            .append(KV_SEPARATOR)
+            .append(privilege.getDb_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("table")
+            .append(KV_SEPARATOR)
+            .append(privilege.getTable_name().toLowerCase());
+        privilegeName.append(AUTHORIZABLE_SEPARATOR);
+        privilegeName.append("column")
+            .append(KV_SEPARATOR)
+            .append(privilege.getColumn_name().toLowerCase());
+        break;
+      }
+      default: {
+        throw new UnsupportedOperationException(
+            "Unknown privilege scope: " + scope.toString());
+      }
+    }
+
+    // The ALL privilege is always implied and does not need to be included as part
+    // of the name.
+    if (privilege.getPrivilege_level() != TPrivilegeLevel.ALL) {
+      privilegeName.append(AUTHORIZABLE_SEPARATOR);
+      privilegeName.append("action")
+          .append(KV_SEPARATOR)
+          .append(privilege.getPrivilege_level().toString().toLowerCase());
     }
+    privilegeName.append(AUTHORIZABLE_SEPARATOR);
+    privilegeName.append("grantoption")
+        .append(KV_SEPARATOR)
+        .append(privilege.isHas_grant_opt());
+    return privilegeName.toString();
   }
 
   /**
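
For reference, a minimal usage sketch of the rewritten builder (field values are
hypothetical and the setters are assumed to be the usual Thrift-generated ones):

    TPrivilege priv = new TPrivilege();
    priv.setScope(TPrivilegeScope.TABLE);
    priv.setServer_name("Server1");    // lowercased by buildPrivilegeName()
    priv.setDb_name("functional");
    priv.setTable_name("AllTypes");    // lowercased as well
    priv.setPrivilege_level(TPrivilegeLevel.SELECT);
    priv.setHas_grant_opt(false);
    // Expected name:
    // server=server1->db=functional->table=alltypes->action=select->grantoption=false
    String name = PrincipalPrivilege.buildPrivilegeName(priv);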
diff --git a/tests/authorization/test_grant_revoke.py b/tests/authorization/test_grant_revoke.py
index 0dcd423..542f749 100644
--- a/tests/authorization/test_grant_revoke.py
+++ b/tests/authorization/test_grant_revoke.py
@@ -347,10 +347,10 @@ class TestGrantRevoke(SentryCacheTestSuite):
                     "org.apache.impala.service.CustomClusterResourceAuthorizationProvider"
                     .format(SENTRY_CONFIG_FILE_OO),
       sentry_config=SENTRY_CONFIG_FILE_OO)
-  def test_same_name_for_role_and_user_invalidate_metadata(self, testid_checksum):
+  def test_same_name_for_role_and_user_invalidate_metadata(self, unique_name):
     """IMPALA-7729: Tests running invalidate metadata with for the same name for both
     user and role should not cause Impala query to hang."""
-    db_prefix = testid_checksum
+    db_prefix = unique_name
     role_name = "foobar"
     # Use two different clients so that the sessions will use two different user names.
     foobar_impalad_client = self.create_impala_client()
@@ -380,3 +380,48 @@
       self.client.execute("drop database {0}_db1".format(db_prefix))
       self.client.execute("drop database {0}_db2".format(db_prefix))
       self.client.execute("drop role {0}".format(role_name))
+
+  @pytest.mark.execute_serially
+  @SentryCacheTestSuite.with_args(
+      impalad_args="--server_name=server1",
+      catalogd_args="--sentry_config={0}".format(SENTRY_CONFIG_FILE),
+      sentry_config=SENTRY_CONFIG_FILE)
+  def test_uri_privilege_case_sensitive(self, unique_role):
+    """Tests that revoking on a granted URI with a different case should not be
+       allowed."""
+    try:
+      self.client.execute("create role {0}".format(unique_role))
+      self.client.execute("grant role {0} to group `{1}`"
+                          .format(unique_role, grp.getgrnam(getuser()).gr_name))
+      self.client.execute("grant refresh on server to {0}".format(unique_role))
+      self.client.execute("grant all on uri '/test-warehouse/FOO' to {0}"
+                          .format(unique_role))
+      self.client.execute("grant all on uri '/test-warehouse/foo' to {0}"
+                          .format(unique_role))
+      result = self.client.execute("show grant role {0}".format(unique_role))
+
+      # Both URIs should exist.
+      assert any("/test-warehouse/FOO" in x for x in result.data)
+      assert any("/test-warehouse/foo" in x for x in result.data)
+
+      self.client.execute("refresh authorization")
+      # After refresh authorization, we should expect both URIs to still exist.
+      assert any("/test-warehouse/FOO" in x for x in result.data)
+      assert any("/test-warehouse/foo" in x for x in result.data)
+
+      self.client.execute("revoke all on uri '/test-warehouse/foo' from {0}"
+                          .format(unique_role))
+
+      result = self.client.execute("show grant role {0}".format(unique_role))
+      # Only one URI should exist.
+      assert any("/test-warehouse/FOO" in x for x in result.data)
+      assert not any("/test-warehouse/foo" in x for x in result.data)
+
+      self.client.execute("refresh authorization")
+      # After refresh authorization, the one URI should still exist.
+      assert any("/test-warehouse/FOO" in x for x in result.data)
+      assert not any("/test-warehouse/foo" in x for x in result.data)
+    finally:
+      self.client.execute("drop role {0}".format(unique_role))