Posted to commits@impala.apache.org by ta...@apache.org on 2018/01/25 21:46:19 UTC

[01/11] impala git commit: [DOCS] IMPALA-6105 unix_timestamp returns a number of seconds

Repository: impala
Updated Branches:
  refs/heads/2.x 1bb3547e4 -> 42ce1f959


[DOCS] IMPALA-6105 unix_timestamp returns a number of seconds

Added a note that the unix_timestamp() function returns the number of
seconds elapsed since the Unix epoch.
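
As a quick illustration of the documented behavior, the two call forms
look roughly like this (the result shown is only an example and assumes
the string is interpreted as UTC; the no-argument result depends on when
the query runs):

  -- No argument: the current date and time as seconds since the Unix epoch.
  select unix_timestamp();
  -- With a TIMESTAMP or STRING argument: convert that value to its Unix time.
  select unix_timestamp('2018-01-01 00:00:00');  -- 1514764800 (UTC)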

Change-Id: Ie502322368c0052d4767191cf8f7ecb7ac5e7a16
Reviewed-on: http://gerrit.cloudera.org:8080/9084
Reviewed-by: John Russell <jr...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/725aeac6
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/725aeac6
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/725aeac6

Branch: refs/heads/2.x
Commit: 725aeac64420dfc82cd793e0849eb02afbeea2c4
Parents: 7103eac
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri Jan 19 15:20:57 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 docs/topics/impala_datetime_functions.xml | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/725aeac6/docs/topics/impala_datetime_functions.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_datetime_functions.xml b/docs/topics/impala_datetime_functions.xml
index f4d062a..d9c77dc 100644
--- a/docs/topics/impala_datetime_functions.xml
+++ b/docs/topics/impala_datetime_functions.xml
@@ -2526,9 +2526,12 @@ select now() + interval 2 weeks as 2_weeks_from_now,
 
         <dd>
           <indexterm audience="hidden">unix_timestamp() function</indexterm>
-          <b>Purpose:</b> Returns an integer value representing the current date and time as a delta from the Unix
-          epoch, or converts from a specified date and time value represented as a <codeph>TIMESTAMP</codeph> or
-          <codeph>STRING</codeph>.
+          <b>Purpose:</b> Returns a Unix time, which is a number of seconds
+          elapsed since '1970-01-01 00:00:00' UTC. If called with no
+          argument, the current date and time is converted to its Unix time. If
+          called with arguments, the first argument represented as the
+          <codeph>TIMESTAMP</codeph> or <codeph>STRING</codeph> is converted to
+          its Unix time.
           <p>
             <b>Return type:</b> <codeph rev="2.2.0">bigint</codeph>
           </p>


[09/11] impala git commit: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by ta...@apache.org.
IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes the ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])
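
For example, the new two-argument forms can be used as follows (results
mirror the ExprTest cases added in this patch; the two-argument btrim()
predates this change and is shown for comparison):

  select ltrim('%%%%%abcdefg%%%%%', '%');  -- returns 'abcdefg%%%%%'
  select rtrim('%%%%%abcdefg%%%%%', '%');  -- returns '%%%%%abcdefg'
  select btrim('%%%%%abcdefg%%%%%', '%');  -- returns 'abcdefg'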

A common string-trimming method has been added to StringFunctions; it is
called from the generalized ltrim()/rtrim()/btrim() functions, which also
share common prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.
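
For example (taken from the ExprTest additions below), trimming operates
byte by byte, so the following happens to give the intended result only
because every byte of the leading 'æ' appears in the trim set:

  select ltrim('æeioü', 'æü');  -- returns 'eioü'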

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Reviewed-on: http://gerrit.cloudera.org:8080/8349
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/cf2e4828
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/cf2e4828
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/cf2e4828

Branch: refs/heads/2.x
Commit: cf2e48287c1cad88411f9841ead15eae2a046504
Parents: 725aeac
Author: Zoram Thanga <zo...@cloudera.com>
Authored: Fri Oct 20 14:20:26 2017 -0700
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:57 2018 -0800

----------------------------------------------------------------------
 be/src/exprs/expr-test.cc                    |  43 ++++++
 be/src/exprs/string-functions-ir.cc          | 153 +++++++++++-----------
 be/src/exprs/string-functions.h              |  44 +++++--
 common/function-registry/impala_functions.py |  26 +++-
 4 files changed, 178 insertions(+), 88 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/cf2e4828/be/src/exprs/expr-test.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index 2df1433..c6e81f1 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -3744,6 +3744,46 @@ TEST_F(ExprTest, StringFunctions) {
   TestStringValue("rtrim('abc  defg')", "abc  defg");
   TestIsNull("rtrim(NULL)", TYPE_STRING);
 
+  TestStringValue("ltrim('%%%%%abcdefg%%%%%', '%')", "abcdefg%%%%%");
+  TestStringValue("ltrim('%%%%%abcdefg', '%')", "abcdefg");
+  TestStringValue("ltrim('abcdefg%%%%%', '%')", "abcdefg%%%%%");
+  TestStringValue("ltrim('%%%%%abc%%defg', '%')", "abc%%defg");
+  TestStringValue("ltrim('abcdefg', 'abc')", "defg");
+  TestStringValue("ltrim('    abcdefg', ' ')", "abcdefg");
+  TestStringValue("ltrim('abacdefg', 'abc')", "defg");
+  TestStringValue("ltrim('abacdefgcab', 'abc')", "defgcab");
+  TestStringValue("ltrim('abcacbbacbcacabcba', 'abc')", "");
+  TestStringValue("ltrim('', 'abc')", "");
+  TestStringValue("ltrim('     ', 'abc')", "     ");
+  TestIsNull("ltrim(NULL, 'abc')", TYPE_STRING);
+  TestStringValue("ltrim('abcdefg', NULL)", "abcdefg");
+  TestStringValue("ltrim('abcdefg', '')", "abcdefg");
+  TestStringValue("ltrim('abcdabcdabc', 'abc')", "dabcdabc");
+  TestStringValue("ltrim('aaaaaaaaa', 'a')", "");
+  TestStringValue("ltrim('abcdefg', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabg')", "cdefg");
+  TestStringValue("ltrim('æeioü','æü')", "eioü");
+  TestStringValue("ltrim('\\\\abcdefg', 'a\\\\')", "bcdefg");
+
+  TestStringValue("rtrim('%%%%%abcdefg%%%%%', '%')", "%%%%%abcdefg");
+  TestStringValue("rtrim('%%%%%abcdefg', '%')", "%%%%%abcdefg");
+  TestStringValue("rtrim('abcdefg%%%%%', '%')", "abcdefg");
+  TestStringValue("rtrim('abc%%defg%%%%%', '%')", "abc%%defg");
+  TestStringValue("rtrim('abcdefg', 'abc')", "abcdefg");
+  TestStringValue("rtrim('abcdefg    ', ' ')", "abcdefg");
+  TestStringValue("rtrim('abacdefg', 'efg')", "abacd");
+  TestStringValue("rtrim('abacdefgcab', 'abc')", "abacdefg");
+  TestStringValue("rtrim('abcacbbacbcacabcba', 'abc')", "");
+  TestStringValue("rtrim('', 'abc')", "");
+  TestStringValue("rtrim('     ', 'abc')", "     ");
+  TestIsNull("rtrim(NULL, 'abc')", TYPE_STRING);
+  TestStringValue("rtrim('abcdefg', NULL)", "abcdefg");
+  TestStringValue("rtrim('abcdefg', '')", "abcdefg");
+  TestStringValue("rtrim('abcdabcdabc', 'abc')", "abcdabcd");
+  TestStringValue("rtrim('aaaaaaaaa', 'a')", "");
+  TestStringValue("rtrim('abcdefg', 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefg')", "abcd");
+  TestStringValue("rtrim('æeioü','æü')", "æeio");
+  TestStringValue("rtrim('abcdefg\\\\', 'g\\\\')", "abcdef");
+
   TestStringValue("btrim('     abcdefg   ')", "abcdefg");
   TestStringValue("btrim('     abcdefg')", "abcdefg");
   TestStringValue("btrim('abcdefg      ')", "abcdefg");
@@ -3762,11 +3802,14 @@ TEST_F(ExprTest, StringFunctions) {
   TestStringValue("btrim('abacdefgcab', 'abc')", "defg");
   TestStringValue("btrim('abcacbbacbcacabcba', 'abc')", "");
   TestStringValue("btrim('', 'abc')", "");
+  TestStringValue("btrim('     ', 'abc')", "     ");
+  TestIsNull("btrim(NULL, 'abc')", TYPE_STRING);
   TestStringValue("btrim('abcdefg', NULL)", "abcdefg");
   TestStringValue("btrim('abcdabcdabc', 'abc')", "dabcd");
   TestStringValue("btrim('aaaaaaaaa', 'a')", "");
   TestStringValue("btrim('abcdefg', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabg')", "cdef");
   TestStringValue("btrim('æeioü','æü')", "eio");
+  TestStringValue("btrim('\\\\abcdefg\\\\', 'ag\\\\')", "bcdef");
 
   TestStringValue("space(0)", "");
   TestStringValue("space(-1)", "");

http://git-wip-us.apache.org/repos/asf/impala/blob/cf2e4828/be/src/exprs/string-functions-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/string-functions-ir.cc b/be/src/exprs/string-functions-ir.cc
index 49bc8c1..50378bd 100644
--- a/be/src/exprs/string-functions-ir.cc
+++ b/be/src/exprs/string-functions-ir.cc
@@ -21,7 +21,6 @@
 #include <stdint.h>
 #include <re2/re2.h>
 #include <re2/stringpiece.h>
-#include <bitset>
 
 #include <boost/static_assert.hpp>
 
@@ -400,41 +399,97 @@ StringVal StringFunctions::Translate(FunctionContext* context, const StringVal&
   return result;
 }
 
-StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) {
+void StringFunctions::TrimPrepare(
+    FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+  if (scope != FunctionContext::THREAD_LOCAL) return;
+  // Create a bitset to hold the unique characters to trim.
+  bitset<256>* unique_chars = new bitset<256>();
+  context->SetFunctionState(scope, unique_chars);
+  // If the caller didn't specify the set of characters to trim, it means
+  // that we're only trimming whitespace. Return early in that case.
+  // There can be either 1 or 2 arguments.
+  DCHECK(context->GetNumArgs() == 1 || context->GetNumArgs() == 2);
+  if (context->GetNumArgs() == 1) {
+    unique_chars->set(static_cast<int>(' '), true);
+    return;
+  }
+  if (!context->IsArgConstant(1)) return;
+  DCHECK_EQ(context->GetArgType(1)->type, FunctionContext::TYPE_STRING);
+  StringVal* chars_to_trim = reinterpret_cast<StringVal*>(context->GetConstantArg(1));
+  if (chars_to_trim->is_null) return; // We shouldn't peek into Null StringVals
+  for (int32_t i = 0; i < chars_to_trim->len; ++i) {
+    unique_chars->set(static_cast<int>(chars_to_trim->ptr[i]), true);
+  }
+}
+
+void StringFunctions::TrimClose(
+    FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+  if (scope != FunctionContext::THREAD_LOCAL) return;
+  bitset<256>* unique_chars = reinterpret_cast<bitset<256>*>(
+      context->GetFunctionState(scope));
+  delete unique_chars;
+  context->SetFunctionState(scope, nullptr);
+}
+
+template <StringFunctions::TrimPosition D, bool IS_IMPLICIT_WHITESPACE>
+StringVal StringFunctions::DoTrimString(FunctionContext* ctx,
+    const StringVal& str, const StringVal& chars_to_trim) {
   if (str.is_null) return StringVal::null();
+  bitset<256>* unique_chars = reinterpret_cast<bitset<256>*>(
+      ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
+  // When 'chars_to_trim' is unique for each element (e.g. when 'chars_to_trim'
+  // is each element of a table column), we need to prepare a bitset of unique
+  // characters here instead of using the bitset from function context.
+  if (!IS_IMPLICIT_WHITESPACE && !ctx->IsArgConstant(1)) {
+    if (chars_to_trim.is_null) return str;
+    unique_chars->reset();
+    for (int32_t i = 0; i < chars_to_trim.len; ++i) {
+      unique_chars->set(static_cast<int>(chars_to_trim.ptr[i]), true);
+    }
+  }
   // Find new starting position.
   int32_t begin = 0;
-  while (begin < str.len && str.ptr[begin] == ' ') {
-    ++begin;
+  int32_t end = str.len - 1;
+  if (D == LEADING || D == BOTH) {
+    while (begin < str.len &&
+        unique_chars->test(static_cast<int>(str.ptr[begin]))) {
+      ++begin;
+    }
   }
   // Find new ending position.
-  int32_t end = str.len - 1;
-  while (end > begin && str.ptr[end] == ' ') {
-    --end;
+  if (D == TRAILING || D == BOTH) {
+    while (end >= begin && unique_chars->test(static_cast<int>(str.ptr[end]))) {
+      --end;
+    }
   }
   return StringVal(str.ptr + begin, end - begin + 1);
 }
 
+StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) {
+  return DoTrimString<BOTH, true>(context, str, StringVal(" "));
+}
+
 StringVal StringFunctions::Ltrim(FunctionContext* context, const StringVal& str) {
-  if (str.is_null) return StringVal::null();
-  // Find new starting position.
-  int32_t begin = 0;
-  while (begin < str.len && str.ptr[begin] == ' ') {
-    ++begin;
-  }
-  return StringVal(str.ptr + begin, str.len - begin);
+  return DoTrimString<LEADING, true>(context, str, StringVal(" "));
 }
 
 StringVal StringFunctions::Rtrim(FunctionContext* context, const StringVal& str) {
-  if (str.is_null) return StringVal::null();
-  if (str.len == 0) return str;
-  // Find new ending position.
-  int32_t end = str.len - 1;
-  while (end > 0 && str.ptr[end] == ' ') {
-    --end;
-  }
-  DCHECK_GE(end, 0);
-  return StringVal(str.ptr, (str.ptr[end] == ' ') ? end : end + 1);
+  return DoTrimString<TRAILING, true>(context, str, StringVal(" "));
+}
+
+StringVal StringFunctions::LTrimString(FunctionContext* ctx,
+    const StringVal& str, const StringVal& chars_to_trim) {
+  return DoTrimString<LEADING, false>(ctx, str, chars_to_trim);
+}
+
+StringVal StringFunctions::RTrimString(FunctionContext* ctx,
+    const StringVal& str, const StringVal& chars_to_trim) {
+  return DoTrimString<TRAILING, false>(ctx, str, chars_to_trim);
+}
+
+StringVal StringFunctions::BTrimString(FunctionContext* ctx,
+    const StringVal& str, const StringVal& chars_to_trim) {
+  return DoTrimString<BOTH, false>(ctx, str, chars_to_trim);
 }
 
 IntVal StringFunctions::Ascii(FunctionContext* context, const StringVal& str) {
@@ -923,58 +978,6 @@ StringVal StringFunctions::Chr(FunctionContext* ctx, const IntVal& val) {
   return AnyValUtil::FromBuffer(ctx, &c, 1);
 }
 
-void StringFunctions::BTrimPrepare(
-    FunctionContext* context, FunctionContext::FunctionStateScope scope) {
-  if (scope != FunctionContext::THREAD_LOCAL) return;
-  // Create a bitset to hold the unique characters to trim.
-  bitset<256>* unique_chars = new bitset<256>();
-  context->SetFunctionState(scope, unique_chars);
-  if (!context->IsArgConstant(1)) return;
-  DCHECK_EQ(context->GetArgType(1)->type, FunctionContext::TYPE_STRING);
-  StringVal* chars_to_trim = reinterpret_cast<StringVal*>(context->GetConstantArg(1));
-  for (int32_t i = 0; i < chars_to_trim->len; ++i) {
-    unique_chars->set(static_cast<int>(chars_to_trim->ptr[i]), true);
-  }
-}
-
-void StringFunctions::BTrimClose(
-    FunctionContext* context, FunctionContext::FunctionStateScope scope) {
-  if (scope != FunctionContext::THREAD_LOCAL) return;
-  bitset<256>* unique_chars = reinterpret_cast<bitset<256>*>(
-      context->GetFunctionState(scope));
-  delete unique_chars;
-  context->SetFunctionState(scope, nullptr);
-}
-
-StringVal StringFunctions::BTrimString(FunctionContext* ctx,
-    const StringVal& str, const StringVal& chars_to_trim) {
-  if (str.is_null) return StringVal::null();
-  bitset<256>* unique_chars = reinterpret_cast<bitset<256>*>(
-      ctx->GetFunctionState(FunctionContext::THREAD_LOCAL));
-  // When 'chars_to_trim' is unique for each element (e.g. when 'chars_to_trim'
-  // is each element of a table column), we need to prepare a bitset of unique
-  // characters here instead of using the bitset from function context.
-  if (!ctx->IsArgConstant(1)) {
-    unique_chars->reset();
-    DCHECK(chars_to_trim.len != 0 || chars_to_trim.is_null);
-    for (int32_t i = 0; i < chars_to_trim.len; ++i) {
-      unique_chars->set(static_cast<int>(chars_to_trim.ptr[i]), true);
-    }
-  }
-  // Find new starting position.
-  int32_t begin = 0;
-  while (begin < str.len &&
-      unique_chars->test(static_cast<int>(str.ptr[begin]))) {
-    ++begin;
-  }
-  // Find new ending position.
-  int32_t end = str.len - 1;
-  while (end > begin && unique_chars->test(static_cast<int>(str.ptr[end]))) {
-    --end;
-  }
-  return StringVal(str.ptr + begin, end - begin + 1);
-}
-
 // Similar to strstr() except that the strings are not null-terminated
 static char* LocateSubstring(char* haystack, int hay_len, const char* needle, int needle_len) {
   DCHECK_GT(needle_len, 0);

http://git-wip-us.apache.org/repos/asf/impala/blob/cf2e4828/be/src/exprs/string-functions.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/string-functions.h b/be/src/exprs/string-functions.h
index 86fc547..91ad2cc 100644
--- a/be/src/exprs/string-functions.h
+++ b/be/src/exprs/string-functions.h
@@ -20,6 +20,7 @@
 #define IMPALA_EXPRS_STRING_FUNCTIONS_H
 
 #include <re2/re2.h>
+#include <bitset>
 
 #include "runtime/string-value.h"
 #include "runtime/string-search.h"
@@ -47,6 +48,12 @@ class TupleRow;
 
 class StringFunctions {
  public:
+  // String trimming position or direction
+  enum TrimPosition {
+    LEADING, // Trim from the begining, or leading end
+    TRAILING, // Trim from the right, or trailing end
+    BOTH // Trim from both ends of string
+  };
   static StringVal Substring(FunctionContext*, const StringVal& str, const BigIntVal& pos,
       const BigIntVal& len);
   static StringVal Substring(FunctionContext*, const StringVal& str,
@@ -76,6 +83,25 @@ class StringFunctions {
   static StringVal Trim(FunctionContext*, const StringVal& str);
   static StringVal Ltrim(FunctionContext*, const StringVal& str);
   static StringVal Rtrim(FunctionContext*, const StringVal& str);
+
+  /// Sets up arguments and function context for the *TrimString functions below.
+  static void TrimPrepare(FunctionContext*, FunctionContext::FunctionStateScope);
+  /// Cleans up the work done by TrimPrepare above.
+  static void TrimClose(FunctionContext*, FunctionContext::FunctionStateScope);
+
+  /// Trims occurrences of the characters in 'chars_to_trim' string from
+  /// the beginning of string 'str'.
+  static StringVal LTrimString(FunctionContext* ctx, const StringVal& str,
+      const StringVal& chars_to_trim);
+  /// Trims occurrences of the characters in 'chars_to_trim' string from
+  /// the end of string 'str'.
+  static StringVal RTrimString(FunctionContext* ctx, const StringVal& str,
+      const StringVal& chars_to_trim);
+  /// Trims occurrences of the characters in 'chars_to_trim' string from
+  /// both ends of string 'str'.
+  static StringVal BTrimString(FunctionContext* ctx, const StringVal& str,
+      const StringVal& chars_to_trim);
+
   static IntVal Ascii(FunctionContext*, const StringVal& str);
   static IntVal Instr(FunctionContext*, const StringVal& str, const StringVal& substr,
       const BigIntVal& start_position, const BigIntVal& occurrence);
@@ -118,16 +144,18 @@ class StringFunctions {
   /// Converts ASCII 'val' to corresponding character.
   static StringVal Chr(FunctionContext* context, const IntVal& val);
 
-  static void BTrimPrepare(FunctionContext*, FunctionContext::FunctionStateScope);
-  static void BTrimClose(FunctionContext*, FunctionContext::FunctionStateScope);
-
-  /// Trims occurrences of the characters in 'chars_to_trim' string from
-  /// both ends of string 'str'.
-  static StringVal BTrimString(FunctionContext* ctx, const StringVal& str,
-    const StringVal& chars_to_trim);
-
   static StringVal Base64Encode(FunctionContext* ctx, const StringVal& str);
   static StringVal Base64Decode(FunctionContext* ctx, const StringVal& str);
+
+ private:
+  /// Templatized implementation of the actual string trimming function.
+  /// The first parameter, 'D', is one of StringFunctions::TrimPosition values.
+  /// The second parameter, 'IS_IMPLICIT_WHITESPACE', is true when the set of characters
+  /// to trim is implicitly set to ' ', as a result of calling the one-arg
+  /// forms of trim/ltrim/rtrim.
+  template <TrimPosition D, bool IS_IMPLICIT_WHITESPACE>
+  static StringVal DoTrimString(FunctionContext* ctx, const StringVal& str,
+      const StringVal& chars_to_trim);
 };
 }
 #endif

http://git-wip-us.apache.org/repos/asf/impala/blob/cf2e4828/common/function-registry/impala_functions.py
----------------------------------------------------------------------
diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index aa9bb49..b78062b 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -439,9 +439,21 @@ visible_functions = [
   [['reverse'], 'STRING', ['STRING'], 'impala::StringFunctions::Reverse'],
   [['translate'], 'STRING', ['STRING', 'STRING', 'STRING'],
    'impala::StringFunctions::Translate'],
-  [['trim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim'],
-  [['ltrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Ltrim'],
-  [['rtrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Rtrim'],
+  [['trim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['ltrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Ltrim',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['rtrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Rtrim',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['ltrim'], 'STRING', ['STRING', 'STRING'], 'impala::StringFunctions::LTrimString',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['rtrim'], 'STRING', ['STRING', 'STRING'], 'impala::StringFunctions::RTrimString',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
   [['ascii'], 'INT', ['STRING'], 'impala::StringFunctions::Ascii'],
   [['instr'], 'INT', ['STRING', 'STRING'], 'impala::StringFunctions::Instr'],
   [['instr'], 'INT', ['STRING', 'STRING', 'BIGINT'], 'impala::StringFunctions::Instr'],
@@ -486,8 +498,12 @@ visible_functions = [
    '_ZN6impala15StringFunctions13ParseUrlCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
 # Netezza compatibility char functions
   [['chr'], 'STRING', ['INT'], 'impala::StringFunctions::Chr'],
-  [['btrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim'],
-  [['btrim'], 'STRING', ['STRING', 'STRING'], 'impala::StringFunctions::BTrimString', '_ZN6impala15StringFunctions12BTrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE', '_ZN6impala15StringFunctions10BTrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['btrim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
+  [['btrim'], 'STRING', ['STRING', 'STRING'], 'impala::StringFunctions::BTrimString',
+   '_ZN6impala15StringFunctions11TrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',
+   '_ZN6impala15StringFunctions9TrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],
 
   # Conditional Functions
   # Some of these have empty symbols because the BE special-cases them based on the


[06/11] impala git commit: IMPALA-6399: Fix timeout logic in test_query_profile_thrift_timestamps

Posted by ta...@apache.org.
IMPALA-6399: Fix timeout logic in test_query_profile_thrift_timestamps

In the previous fix for IMPALA-6399 we increased the number of retries.
However, there was a bug in the retry logic: If the profile was missing,
we would call 'continue' without sleeping, thus eating through the
maximum number of retries in a short period of time.

This change switches to a time-based wait loop instead of a fixed
number of retries. It also moves the sleep() to the beginning of the
loop so that it is less likely to be forgotten when the code changes in
the future. Calling sleep() before trying to fetch the profile for the
first time also reduces the likelihood of hitting a warning when the
profile is not yet available.

This change also increases the timeout to 300 seconds and marks the test
for serial execution to decrease the likelihood of races.

Change-Id: I2f0f3db75c52b8b2081dd15ed7e7d2a1d6b22389
Reviewed-on: http://gerrit.cloudera.org:8080/9079
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Lars Volker <lv...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/07ab8772
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/07ab8772
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/07ab8772

Branch: refs/heads/2.x
Commit: 07ab8772e5f1be7232f0e94bc197009469195349
Parents: e32b230
Author: Lars Volker <lv...@cloudera.com>
Authored: Fri Jan 19 10:57:43 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 tests/query_test/test_observability.py | 34 +++++++++++++++++------------
 1 file changed, 20 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/07ab8772/tests/query_test/test_observability.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_observability.py b/tests/query_test/test_observability.py
index d0d4ecd..e8599b5 100644
--- a/tests/query_test/test_observability.py
+++ b/tests/query_test/test_observability.py
@@ -19,6 +19,7 @@ from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfLocal
 from tests.common.impala_cluster import ImpalaCluster
 import logging
+import pytest
 import time
 
 class TestObservability(ImpalaTestSuite):
@@ -132,6 +133,9 @@ class TestObservability(ImpalaTestSuite):
     assert results.runtime_profile.count("AGGREGATION_NODE") == 2
     assert results.runtime_profile.count("PLAN_ROOT_SINK") == 2
 
+  # IMPALA-6399: Run this test serially to avoid a delay over the wait time in fetching
+  # the profile.
+  @pytest.mark.execute_serially
   def test_query_profile_thrift_timestamps(self):
     """Test that the query profile start and end time date-time strings have
     nanosecond precision. Nanosecond precision is expected by management API clients
@@ -142,17 +146,18 @@ class TestObservability(ImpalaTestSuite):
     results = self.client.fetch(query, handle)
     self.client.close()
 
-    start_time_sub_sec_str = ""
-    end_time_sub_sec_str = ""
-    start_time = ""
-    end_time = ""
-
-    MAX_RETRIES = 300
-    for retries in xrange(MAX_RETRIES):
+    MAX_WAIT = 300
+    start = time.time()
+    end = start + MAX_WAIT
+    while time.time() <= end:
+      # Sleep before trying to fetch the profile. This helps to prevent a warning when the
+      # profile is not yet available immediately. It also makes it less likely to
+      # introduce an error below in future changes by forgetting to sleep.
+      time.sleep(1)
       tree = self.impalad_test_service.get_thrift_profile(query_id)
-
-      if tree is None:
+      if not tree:
         continue
+
       # tree.nodes[1] corresponds to ClientRequestState::summary_profile_
       # See be/src/service/client-request-state.[h|cc].
       start_time = tree.nodes[1].info_strings["Start Time"]
@@ -161,9 +166,10 @@ class TestObservability(ImpalaTestSuite):
       start_time_sub_sec_str = start_time.split('.')[-1]
       end_time_sub_sec_str = end_time.split('.')[-1]
       if len(end_time_sub_sec_str) == 0:
-        logging.info('end_time_sub_sec_str hasn\'t shown up yet, retries=%d', retries)
-        time.sleep(1)
+        elapsed = time.time() - start
+        logging.info("end_time_sub_sec_str hasn't shown up yet, elapsed=%d", elapsed)
         continue
+
       assert len(end_time_sub_sec_str) == 9, end_time
       assert len(start_time_sub_sec_str) == 9, start_time
       return True
@@ -171,7 +177,7 @@ class TestObservability(ImpalaTestSuite):
     # If we're here, we didn't get the final thrift profile from the debug web page.
     # This could happen due to heavy system load. The test is then inconclusive.
     # Log a message and fail this run.
-    dbg_str = 'Debug thrift profile for query ' + str(query_id) + ' not available in '
-    dbg_str += str(MAX_RETRIES) + ' seconds, '
-    dbg_str += '(' + start_time + ', ' + end_time + ').'
+
+    dbg_str = "Debug thrift profile for query {0} not available in {1} seconds".format(
+      query_id, MAX_WAIT)
     assert False, dbg_str


[04/11] impala git commit: IMPALA-3942: Fix wrongly escaped string literal in front-end

Posted by ta...@apache.org.
IMPALA-3942: Fix wrongly escaped string literal in front-end

A string literal can be wrapped in either single or double quotes.
There are some holes in the escaping of string literals:
- toSql() always returns single-quoted strings, resulting in improper
escaping in the output if the original string was actually double
quoted.
- It is not always possible to determine whether a string "should" be
single or double quoted, e.g. concat('a', "b").

The solution is to normalize any string, whether it comes directly from
the user's SQL or is generated (e.g. by constant folding in a rewrite
rule), to single quotes.
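
As a sketch of the resulting behavior (derived from the ToSqlTest cases
added below), a literal written with either quote style is rendered back
as a single-quoted literal:

  -- Both statements are accepted; toSql() emits SELECT '\'' for each.
  select "'";
  select '\'';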

Testing:
Added unit tests to expr-test and ToSqlTest.

Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec
Reviewed-on: http://gerrit.cloudera.org:8080/8818
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/2ad80c5e
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/2ad80c5e
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/2ad80c5e

Branch: refs/heads/2.x
Commit: 2ad80c5eb192ed0acac77a2b065387de7bd5029f
Parents: 07ab877
Author: Jinchul <ji...@gmail.com>
Authored: Tue Dec 12 16:34:39 2017 +0900
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 be/src/exprs/expr-test.cc                       | 36 ++++++++++++++++
 .../apache/impala/analysis/StringLiteral.java   | 45 ++++++++++++++++++--
 .../org/apache/impala/analysis/ToSqlTest.java   | 16 +++++++
 3 files changed, 93 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/2ad80c5e/be/src/exprs/expr-test.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index 7a2a65a..2df1433 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -1339,6 +1339,42 @@ TEST_F(ExprTest, LiteralExprs) {
   TestIsNull("null", TYPE_NULL);
 }
 
+// IMPALA-3942: Test escaping string literal for single/double quotes
+TEST_F(ExprTest, EscapeStringLiteral) {
+  TestStringValue(R"('"')", R"(")");
+  TestStringValue(R"("'")", R"(')");
+  TestStringValue(R"("\"")", R"(")");
+  TestStringValue(R"('\'')", R"(')");
+  TestStringValue(R"('\\"')", R"(\")");
+  TestStringValue(R"("\\'")", R"(\')");
+  TestStringValue(R"("\\\"")", R"(\")");
+  TestStringValue(R"('\\\'')", R"(\')");
+  TestStringValue(R"("\\\"\\'\\\"")", R"(\"\'\")");
+  TestStringValue(R"('\\\'\\"\\\'')", R"(\'\"\')");
+  TestStringValue(R"("\\")", R"(\)");
+  TestStringValue(R"('\\')", R"(\)");
+  TestStringValue(R"("\\\\")", R"(\\)");
+  TestStringValue(R"('\\\\')", R"(\\)");
+  TestStringValue(R"('a"b')", R"(a"b)");
+  TestStringValue(R"("a'b")", R"(a'b)");
+  TestStringValue(R"('a\"b')", R"(a"b)");
+  TestStringValue(R"('a\'b')", R"(a'b)");
+  TestStringValue(R"("a\"b")", R"(a"b)");
+  TestStringValue(R"("a\'b")", R"(a'b)");
+  TestStringValue(R"('a\\"b')", R"(a\"b)");
+  TestStringValue(R"("a\\'b")", R"(a\'b)");
+  TestStringValue(R"('a\\\'b')", R"(a\'b)");
+  TestStringValue(R"("a\\\"b")", R"(a\"b)");
+  TestStringValue(R"(concat("a'b", "c'd"))", R"(a'bc'd)");
+  TestStringValue(R"(concat('a"b', 'c"d'))", R"(a"bc"d)");
+  TestStringValue(R"(concat("a'b", 'c"d'))", R"(a'bc"d)");
+  TestStringValue(R"(concat('a"b', "c'd"))", R"(a"bc'd)");
+  TestStringValue(R"(concat("a\"b", 'c\'d'))", R"(a"bc'd)");
+  TestStringValue(R"(concat('a\'b', "c\"d"))", R"(a'bc"d)");
+  TestStringValue(R"(concat("a\\\"b", 'c\\\'d'))", R"(a\"bc\'d)");
+  TestStringValue(R"(concat('a\\\'b', "c\\\"d"))", R"(a\'bc\"d)");
+}
+
 TEST_F(ExprTest, ArithmeticExprs) {
   // Test float ops.
   TestFixedResultTypeOps<float, float, double>(min_float_values_[TYPE_FLOAT],

http://git-wip-us.apache.org/repos/asf/impala/blob/2ad80c5e/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java b/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
index 72a5ca2..5c51c45 100644
--- a/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
+++ b/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
@@ -70,7 +70,7 @@ public class StringLiteral extends LiteralExpr {
   public int hashCode() { return value_.hashCode(); }
 
   @Override
-  public String toSqlImpl() { return "'" + value_ + "'"; }
+  public String toSqlImpl() { return "'" + getNormalizedValue() + "'"; }
 
   @Override
   protected void toThrift(TExprNode msg) {
@@ -84,7 +84,44 @@ public class StringLiteral extends LiteralExpr {
   public String getUnescapedValue() {
     // Unescape string exactly like Hive does. Hive's method assumes
     // quotes so we add them here to reuse Hive's code.
-    return BaseSemanticAnalyzer.unescapeSQLString("'" + value_ + "'");
+    return BaseSemanticAnalyzer.unescapeSQLString("'" + getNormalizedValue()
+        + "'");
+  }
+
+  /**
+   *  String literals can come directly from the SQL of a query or from rewrites like
+   *  constant folding. So this value normalization to a single-quoted string is necessary
+   *  because we do not know whether single or double quotes are appropriate.
+   *
+   *  @return a normalized representation of the string value suitable for embedding in
+   *          SQL as a single-quoted string literal.
+   */
+  private String getNormalizedValue() {
+    final int len = value_.length();
+    final StringBuilder sb = new StringBuilder(len);
+    for (int i = 0; i < len; ++i) {
+      final char currentChar = value_.charAt(i);
+      if (currentChar == '\\' && (i + 1) < len) {
+        final char nextChar = value_.charAt(i + 1);
+        // unescape an escaped double quote: remove back-slash in front of the quote.
+        if (nextChar == '"' || nextChar == '\'' || nextChar == '\\') {
+          if (nextChar != '"') {
+            sb.append(currentChar);
+          }
+          sb.append(nextChar);
+          ++i;
+          continue;
+        }
+
+        sb.append(currentChar);
+      } else if (currentChar == '\'') {
+        // escape a single quote: add back-slash in front of the quote.
+        sb.append("\\\'");
+      } else {
+        sb.append(currentChar);
+      }
+    }
+    return sb.toString();
   }
 
   @Override
@@ -159,8 +196,8 @@ public class StringLiteral extends LiteralExpr {
       return new NumericLiteral(val);
     }
     // Symbol is not an integer or floating point literal.
-    throw new AnalysisException(
-        "Failed to convert string literal '" + value_ + "' to number.");
+    throw new AnalysisException("Failed to convert string literal '"
+        + value_ + "' to number.");
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/impala/blob/2ad80c5e/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java b/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
index bb6a39b..2b9da03 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
@@ -462,6 +462,22 @@ public class ToSqlTest extends FrontendTestBase {
         "SELECT `1 + 10`, `trim('abc')` FROM (SELECT 1 + 10, trim('abc')) t");
   }
 
+  @Test
+  public void normalizeStringLiteralTest() {
+    testToSql("select \"'\"", "SELECT '\\''");
+    testToSql("select \"\\'\"", "SELECT '\\''");
+    testToSql("select \"\\\\'\"", "SELECT '\\\\\\''");
+    testToSql("select '\"'", "SELECT '\"'");
+    testToSql("select '\\\"'", "SELECT '\"'");
+    testToSql("select '\\''", "SELECT '\\''");
+    testToSql("select '\\\\\\''", "SELECT '\\\\\\''");
+    testToSql("select regexp_replace(string_col, \"\\\\'\", \"'\") from " +
+        "functional.alltypes", "SELECT regexp_replace(string_col, '\\\\\\'', '\\'') " +
+        "FROM functional.alltypes");
+    testToSql("select * from functional.alltypes where '123' = \"123\"",
+        "SELECT * FROM functional.alltypes WHERE '123' = '123'");
+  }
+
   // Test the toSql() output of the where clause.
   @Test
   public void whereTest() {


[10/11] impala git commit: IMPALA-6435: Disable codegen for CHAR literals.

Posted by ta...@apache.org.
IMPALA-6435: Disable codegen for CHAR literals.

Currently we do not codegen CHAR types. This change checks for CHAR
literals in an expr and disables codegen.
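
For reference, the affected expressions are CHAR-typed literals; the
regression test added below exercises them in a UNION ALL over
functional.alltypestiny, along the lines of:

  select cast('a' as char(4)) from functional.alltypestiny;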

Change-Id: I7e4e27350c53bc69ce412a004e392e7480214f73
Reviewed-on: http://gerrit.cloudera.org:8080/9102
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/f4b6818d
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/f4b6818d
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/f4b6818d

Branch: refs/heads/2.x
Commit: f4b6818df7e06c609e072f5d2b1afa329b1e1bd0
Parents: 14e170a
Author: aphadke <ap...@cloudera.com>
Authored: Tue Jan 23 09:58:44 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:57 2018 -0800

----------------------------------------------------------------------
 be/src/exprs/literal.cc                             |  5 ++++-
 .../queries/QueryTest/disable-codegen.test          | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/f4b6818d/be/src/exprs/literal.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/literal.cc b/be/src/exprs/literal.cc
index 92bc0c1..3b7caf8 100644
--- a/be/src/exprs/literal.cc
+++ b/be/src/exprs/literal.cc
@@ -356,6 +356,10 @@ Status Literal::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function** fn)
     return Status::OK();
   }
 
+  if (type_.type == TYPE_CHAR) {
+    return Status::Expected("Codegen not supported for CHAR");
+  }
+
   DCHECK_EQ(GetNumChildren(), 0);
   llvm::Value* args[2];
   *fn = CreateIrFunctionPrototype("Literal", codegen, &args);
@@ -388,7 +392,6 @@ Status Literal::GetCodegendComputeFn(LlvmCodeGen* codegen, llvm::Function** fn)
       break;
     case TYPE_STRING:
     case TYPE_VARCHAR:
-    case TYPE_CHAR:
       v.SetLen(builder.getInt32(value_.string_val.len));
       v.SetPtr(codegen->GetStringConstant(
           &builder, value_.string_val.ptr, value_.string_val.len));

http://git-wip-us.apache.org/repos/asf/impala/blob/f4b6818d/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test b/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
index dee0126..9f609d4 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
@@ -28,3 +28,19 @@ bigint
 # Verify that codegen was disabled
 row_regex: .*Codegen Disabled: disabled due to optimization hints.*
 ====
+---- QUERY
+# IMPALA-6435: We do not codegen char columns. This fix checks for a
+# CHAR type literal in the expr and disables codegen. This query will crash
+# impala without the fix.
+select count(*) from (
+  select cast('a' as char(4)) as s from functional.alltypestiny
+  union all
+  select cast('a' as char(4)) as s from functional.alltypestiny
+  union all
+  select cast(NULL as char(4)) as s from functional.alltypestiny
+) t
+---- RESULTS
+24
+---- TYPES
+bigint
+====


[03/11] impala git commit: IMPALA-6318: Revert "Adjustment for hanging query cancellation test"

Posted by ta...@apache.org.
IMPALA-6318: Revert "Adjustment for hanging query cancellation test"

Jenkins jobs occasionally hang on test_query_cancellation_during_fetch.
A workaround was submitted under this Jira ID; however, jobs apparently
still hang on this test at random. This change reverts the workaround
and skips the test until a further fix is provided.

This reverts commit 7810d1f9a2c7d59b4b916d4d1793672cd8c33143.

Change-Id: I51acee49b5a17c4852410b7568fd1d092b114a6d
Reviewed-on: http://gerrit.cloudera.org:8080/8972
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/2a27b4f4
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/2a27b4f4
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/2a27b4f4

Branch: refs/heads/2.x
Commit: 2a27b4f4788a6cacdf3b69aa60cf179dcfd2d682
Parents: 1bb3547
Author: Gabor Kaszab <ga...@cloudera.com>
Authored: Mon Jan 22 14:21:29 2018 +0100
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 tests/shell/test_shell_commandline.py |  4 +++-
 tests/shell/util.py                   | 20 ++++++++------------
 2 files changed, 11 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/2a27b4f4/tests/shell/test_shell_commandline.py
----------------------------------------------------------------------
diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index e4caa99..513abdc 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -329,6 +329,8 @@ class TestImpalaShell(ImpalaTestSuite):
   def test_query_cancellation_during_fetch(self):
     """IMPALA-1144: Test cancellation (CTRL+C) while results are being
     fetched"""
+    pytest.skip("""Skipping as it occasionally gets stuck in Jenkins builds
+                resulting the build to timeout.""")
     # A select query where fetch takes several seconds
     stmt = "with v as (values (1 as x), (2), (3), (4)) " + \
         "select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, " + \
@@ -373,7 +375,7 @@ class TestImpalaShell(ImpalaTestSuite):
     execution in fact starts and then cancels it. Expects the query
     cancellation to succeed."""
     args = "-q \"" + stmt + ";\""
-    p = ImpalaShell(args, omit_stdout=True)
+    p = ImpalaShell(args)
 
     self.wait_for_query_state(stmt, cancel_at_state)
 

http://git-wip-us.apache.org/repos/asf/impala/blob/2a27b4f4/tests/shell/util.py
----------------------------------------------------------------------
diff --git a/tests/shell/util.py b/tests/shell/util.py
index 1fd7601..22ffa1a 100755
--- a/tests/shell/util.py
+++ b/tests/shell/util.py
@@ -130,14 +130,11 @@ class ImpalaShellResult(object):
     self.stderr = str()
 
 class ImpalaShell(object):
-  """A single instance of the Impala shell. The process is started when this object is
+  """A single instance of the Impala shell. The proces is started when this object is
      constructed, and then users should repeatedly call send_cmd(), followed eventually by
      get_result() to retrieve the process output."""
-  def __init__(self, args=None, env=None, omit_stdout=False):
-    self.args = args
-    self.env = env
-    self.omit_stdout = omit_stdout
-    self.shell_process = self._start_new_shell_process()
+  def __init__(self, args=None, env=None):
+    self.shell_process = self._start_new_shell_process(args, env=env)
 
   def pid(self):
     return self.shell_process.pid
@@ -161,12 +158,11 @@ class ImpalaShell(object):
     result.rc = self.shell_process.returncode
     return result
 
-  def _start_new_shell_process(self):
+  def _start_new_shell_process(self, args=None, env=None):
     """Starts a shell process and returns the process handle"""
     shell_args = SHELL_CMD
-    if self.args is not None: shell_args = "%s %s" % (SHELL_CMD, self.args)
+    if args is not None: shell_args = "%s %s" % (SHELL_CMD, args)
     lex = shlex.split(shell_args)
-    if not self.env: self.env = os.environ
-    stdout = open(os.devnull, 'w') if self.omit_stdout else PIPE
-    return Popen(lex, shell=False, stdout=stdout, stdin=PIPE, stderr=PIPE,
-                 env=self.env)
+    if not env: env = os.environ
+    return Popen(lex, shell=False, stdout=PIPE, stdin=PIPE, stderr=PIPE,
+                 env=env)


[02/11] impala git commit: IMPALA-4132: Use -fno-omit-frame-pointer

Posted by ta...@apache.org.
IMPALA-4132: Use -fno-omit-frame-pointer

Using -fno-omit-frame-pointer keeps the frame pointer for each function
in a register. As a result we expect more useful stack traces to be
produced.

For testing, performance benchmarks were executed on TPC-H and TPC-DS
without any significant discrepancy from the baseline results. (For the
specific measurements, see the attachments in the Jira.)

Change-Id: Ib7d0d88ba015a847356ed0274fe91017b98cb941
Reviewed-on: http://gerrit.cloudera.org:8080/8612
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/7103eac7
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/7103eac7
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/7103eac7

Branch: refs/heads/2.x
Commit: 7103eac7f82f129af56ac3f263f38937d10183e9
Parents: 2ad80c5
Author: Gabor Kaszab <ga...@cloudera.com>
Authored: Tue Nov 21 02:12:51 2017 +0100
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 be/CMakeLists.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/7103eac7/be/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 163567a..56c8dbd 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -39,8 +39,9 @@ PROJECT(ASSEMBLER)
 #  -Wno-vla: we use C99-style variable-length arrays
 #  -pthread: enable multithreaded malloc
 #  -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG: enable nanosecond precision for boost
+#  -fno-omit-frame-pointers: Keep frame pointer for functions in register
 SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall -Wno-sign-compare -Wno-unknown-pragmas -pthread")
-SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -fno-strict-aliasing")
+SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -fno-strict-aliasing -fno-omit-frame-pointer")
 SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -std=c++14")
 SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-deprecated -Wno-vla")
 SET(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG")


[07/11] impala git commit: IMPALA-6395: Add a flag for data stream sender's buffer size

Posted by ta...@apache.org.
IMPALA-6395: Add a flag for data stream sender's buffer size

This change adds a flag to control the maximum size, in bytes, that a
row batch in a data stream sender's channel can accumulate before the
row batch is sent over the wire. Increasing this value better amortizes
the per-row-batch cost of compression and RPC. The default value is
16 KB per channel.

Change-Id: I385f4b7a0671bb2d7872bee60d476c375680b5c2
Reviewed-on: http://gerrit.cloudera.org:8080/9026
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/42ce1f95
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/42ce1f95
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/42ce1f95

Branch: refs/heads/2.x
Commit: 42ce1f959cb1bf53ae1b76bf2779628892038679
Parents: f4b6818
Author: Michael Ho <kw...@cloudera.com>
Authored: Fri Jan 12 23:19:35 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:57 2018 -0800

----------------------------------------------------------------------
 be/src/exec/data-sink.cc | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/42ce1f95/be/src/exec/data-sink.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/data-sink.cc b/be/src/exec/data-sink.cc
index d0e7f4a..f8f068e 100644
--- a/be/src/exec/data-sink.cc
+++ b/be/src/exec/data-sink.cc
@@ -39,6 +39,9 @@
 #include "common/names.h"
 
 DECLARE_bool(use_krpc);
+DEFINE_int64(data_stream_sender_buffer_size, 16 * 1024,
+    "(Advanced) Max size in bytes which a row batch in a data stream sender's channel "
+    "can accumulate before the row batch is sent over the wire.");
 
 using strings::Substitute;
 
@@ -65,12 +68,13 @@ Status DataSink::Create(const TPlanFragmentCtx& fragment_ctx,
 
       if (FLAGS_use_krpc) {
         *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id,
-            row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
-            state));
+            row_desc, thrift_sink.stream_sink, fragment_ctx.destinations,
+            FLAGS_data_stream_sender_buffer_size, state));
       } else {
         // TODO: figure out good buffer size based on size of output row
         *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id, row_desc,
-            thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
+            thrift_sink.stream_sink, fragment_ctx.destinations,
+            FLAGS_data_stream_sender_buffer_size, state));
       }
       break;
     case TDataSinkType::TABLE_SINK:


[11/11] impala git commit: IMPALA-6190/6246: Add instances tab and event sequence

Posted by ta...@apache.org.
IMPALA-6190/6246: Add instances tab and event sequence

This change adds tracking of the current state during the execution of a
fragment instance. The current state is then reported back to the
coordinator and exposed to users via a new tab in the query detail debug
webpage.

This change also adds an event timeline to fragment instances in the
query profile. The timeline measures the time since backend-local query
start at which particular events complete. Events are derived from the
current state of the execution of a fragment instance. For example:

    - Prepare Finished: 13.436ms (13.436ms)
    - First Batch Produced: 1s022ms (1s008ms)
    - First Batch Sent: 1s022ms (455.558us)
    - ExecInternal Finished: 2s783ms (1s760ms)

I added automated tests for both extensions and additionally verified
the change by manual inspection.

Here are the TPCH performance comparison results between this change and
the previous commit on a 16 node cluster.

+------------+-----------------------+---------+------------+------------+----------------+
| Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 18.47   | -0.94%     | 9.72       | -1.08%         |
+------------+-----------------------+---------+------------+------------+----------------+

+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload   | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_300) | TPCH-Q5  | parquet / none / none | 48.88  | 46.93       |   +4.15%   |   0.14%    |   3.61%        | 1           | 3     |
| TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.64  | 21.15       |   +2.29%   |   2.06%    |   1.84%        | 1           | 3     |
| TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.71   | 1.70        |   +1.12%   |   0.54%    |   2.51%        | 1           | 3     |
| TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.15  | 32.79       |   +1.09%   |   0.13%    |   2.03%        | 1           | 3     |
| TPCH(_300) | TPCH-Q14 | parquet / none / none | 5.95   | 5.90        |   +0.82%   |   2.19%    |   0.49%        | 1           | 3     |
| TPCH(_300) | TPCH-Q1  | parquet / none / none | 13.99  | 13.90       |   +0.63%   |   0.25%    |   1.39%        | 1           | 3     |
| TPCH(_300) | TPCH-Q2  | parquet / none / none | 3.44   | 3.44        |   +0.00%   | * 20.29% * | * 20.76% *     | 1           | 3     |
| TPCH(_300) | TPCH-Q6  | parquet / none / none | 1.21   | 1.22        |   -0.01%   |   0.06%    |   0.06%        | 1           | 3     |
| TPCH(_300) | TPCH-Q20 | parquet / none / none | 3.51   | 3.51        |   -0.11%   |   7.15%    |   7.30%        | 1           | 3     |
| TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.89   | 6.91        |   -0.21%   |   0.65%    |   0.55%        | 1           | 3     |
| TPCH(_300) | TPCH-Q4  | parquet / none / none | 4.78   | 4.80        |   -0.38%   |   0.06%    |   0.59%        | 1           | 3     |
| TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.78  | 31.04       |   -0.83%   |   0.45%    |   1.03%        | 1           | 3     |
| TPCH(_300) | TPCH-Q22 | parquet / none / none | 6.06   | 6.12        |   -1.02%   |   1.51%    |   2.12%        | 1           | 3     |
| TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.43   | 9.58        |   -1.54%   |   0.69%    |   3.30%        | 1           | 3     |
| TPCH(_300) | TPCH-Q21 | parquet / none / none | 93.41  | 95.18       |   -1.86%   |   0.08%    |   0.81%        | 1           | 3     |
| TPCH(_300) | TPCH-Q15 | parquet / none / none | 3.40   | 3.47        |   -1.99%   |   0.72%    |   1.27%        | 1           | 3     |
| TPCH(_300) | TPCH-Q7  | parquet / none / none | 44.98  | 46.24       |   -2.71%   |   1.83%    |   1.27%        | 1           | 3     |
| TPCH(_300) | TPCH-Q3  | parquet / none / none | 28.06  | 29.11       |   -3.61%   |   1.62%    |   1.23%        | 1           | 3     |
| TPCH(_300) | TPCH-Q12 | parquet / none / none | 3.15   | 3.28        |   -3.80%   |   0.96%    |   1.32%        | 1           | 3     |
| TPCH(_300) | TPCH-Q9  | parquet / none / none | 29.47  | 30.80       |   -4.30%   |   0.29%    |   0.34%        | 1           | 3     |
| TPCH(_300) | TPCH-Q17 | parquet / none / none | 4.37   | 4.62        |   -5.33%   |   0.63%    |   0.54%        | 1           | 3     |
| TPCH(_300) | TPCH-Q8  | parquet / none / none | 7.99   | 8.46        |   -5.53%   |   7.95%    |   1.11%        | 1           | 3     |
+------------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

Here are the TPCDS performance comparison results between this change
and the previous commit on a 16-node cluster. I inspected the Q2 results
and concluded that the variability is unrelated to this change.

+--------------+-----------------------+---------+------------+------------+----------------+
| Workload     | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+--------------+-----------------------+---------+------------+------------+----------------+
| TPCDS(_1000) | parquet / none / none | 13.07   | +0.51%     | 4.27       | +1.83%         |
+--------------+-----------------------+---------+------------+------------+----------------+

+--------------+------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload     | Query      | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+--------------+------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCDS(_1000) | TPCDS-Q2   | parquet / none / none | 8.36   | 4.25        | R +96.81%  | * 48.88% * |   0.42%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q8   | parquet / none / none | 1.59   | 1.35        |   +17.86%  | * 13.91% * |   4.01%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q73  | parquet / none / none | 1.81   | 1.71        |   +5.92%   |   5.53%    |   0.15%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q28  | parquet / none / none | 7.26   | 6.95        |   +4.47%   |   1.09%    |   1.11%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q46  | parquet / none / none | 2.36   | 2.30        |   +2.62%   |   1.45%    |   0.40%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q7   | parquet / none / none | 2.78   | 2.73        |   +1.98%   |   1.21%    |   2.23%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q55  | parquet / none / none | 1.05   | 1.03        |   +1.91%   |   1.16%    |   2.20%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q42  | parquet / none / none | 1.05   | 1.04        |   +1.71%   |   0.90%    |   2.63%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q19  | parquet / none / none | 1.67   | 1.65        |   +1.55%   |   1.12%    |   1.96%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q23  | parquet / none / none | 151.75 | 149.94      |   +1.20%   |   3.23%    |   1.83%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q64  | parquet / none / none | 40.25  | 39.79       |   +1.16%   |   0.43%    |   0.28%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q96  | parquet / none / none | 2.25   | 2.22        |   +1.05%   |   1.00%    |   0.11%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q53  | parquet / none / none | 1.60   | 1.58        |   +1.01%   |   1.28%    |   0.04%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q79  | parquet / none / none | 4.17   | 4.13        |   +0.94%   |   0.89%    |   0.06%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q59  | parquet / none / none | 5.74   | 5.71        |   +0.60%   |   1.22%    |   2.56%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q52  | parquet / none / none | 0.89   | 0.89        |   +0.14%   |   0.03%    |   0.63%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q88  | parquet / none / none | 7.10   | 7.12        |   -0.23%   |   0.43%    |   0.47%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q3   | parquet / none / none | 1.10   | 1.11        |   -0.40%   |   0.58%    |   0.36%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q98  | parquet / none / none | 2.30   | 2.31        |   -0.49%   |   3.58%    |   1.04%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q61  | parquet / none / none | 1.87   | 1.89        |   -1.08%   |   1.68%    |   0.14%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q27a | parquet / none / none | 2.93   | 2.96        |   -1.18%   |   1.74%    |   1.54%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q34  | parquet / none / none | 2.23   | 2.27        |   -1.73%   |   1.91%    |   1.32%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q63  | parquet / none / none | 1.56   | 1.60        |   -1.96%   |   1.91%    |   3.33%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q89  | parquet / none / none | 2.64   | 2.70        |   -2.20%   |   1.93%    |   1.88%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q47  | parquet / none / none | 30.41  | 31.17       |   -2.41%   |   1.09%    |   1.52%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q1   | parquet / none / none | 3.77   | 3.86        |   -2.46%   |   1.91%    |   0.61%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q6   | parquet / none / none | 61.67  | 63.34       |   -2.65%   |   3.77%    |   0.31%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q4   | parquet / none / none | 31.11  | 31.96       |   -2.66%   |   0.61%    |   0.77%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q43  | parquet / none / none | 4.10   | 4.22        |   -2.87%   |   1.40%    |   2.85%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q5   | parquet / none / none | 8.30   | 8.56        |   -3.13%   |   1.55%    |   0.47%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q27  | parquet / none / none | 2.28   | 2.35        |   -3.13%   |   1.17%    |   1.56%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q65  | parquet / none / none | 31.74  | 32.77       |   -3.15%   |   1.47%    |   1.11%        | 1           | 3     |
| TPCDS(_1000) | TPCDS-Q68  | parquet / none / none | 1.56   | 1.62        |   -3.58%   |   9.37%    | * 11.93% *     | 1           | 3     |
+--------------+------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

(R) Regression: TPCDS(_1000) TPCDS-Q2 [parquet / none / none] (4.25s -> 8.36s [+96.81%])
+---------------------+------------+----------+----------+------------+------------+----------+----------+------------+--------+---------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%)  | Max      | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |
+---------------------+------------+----------+----------+------------+------------+----------+----------+------------+--------+---------+-----------+
| 27:MERGING-EXCHANGE | 22.48%     | 6.97s    | 2.85s    | +144.40%   | * 58.44% * | 11.05s   | 2.86s    | +286.33%   | 1      | 2.51K   | 2.56K     |
| 26:EXCHANGE         | 7.65%      | 2.37s    | 2.43s    | -2.16%     |   1.82%    | 2.46s    | 2.50s    | -1.65%     | 14     | 365     | 2.56K     |
| 23:EXCHANGE         | 8.58%      | 2.66s    | 2.70s    | -1.46%     |   1.67%    | 2.74s    | 2.78s    | -1.47%     | 14     | 516     | 10.64K    |
| 13:AGGREGATE        | 4.21%      | 1.31s    | 1.30s    | +0.65%     |   0.06%    | 1.47s    | 1.43s    | +2.38%     | 14     | 516     | 10.64K    |
| 12:HASH JOIN        | 2.89%      | 896.20ms | 885.79ms | +1.17%     |   1.43%    | 1.06s    | 1.01s    | +4.77%     | 14     | 433.27M | 2.16B     |
| 06:SCAN HDFS        | 2.83%      | 877.34ms | 886.93ms | -1.08%     |   1.23%    | 888.16ms | 906.88ms | -2.06%     | 1      | 365     | 373       |
| 19:EXCHANGE         | 23.20%     | 7.20s    | 3.12s    | +130.58%   | * 56.73% * | 11.33s   | 3.17s    | +256.92%   | 14     | 520     | 10.64K    |
| 05:AGGREGATE        | 12.06%     | 3.74s    | 1.34s    | +178.49%   | * 64.53% * | 6.33s    | 1.53s    | +314.84%   | 14     | 520     | 10.64K    |
| 04:HASH JOIN        | 7.71%      | 2.39s    | 956.81ms | +149.90%   | * 60.36% * | 4.04s    | 1.13s    | +256.75%   | 14     | 442.29M | 2.16B     |
| 03:SCAN HDFS        | 2.83%      | 878.97ms | 894.11ms | -1.69%     |   1.34%    | 890.78ms | 910.22ms | -2.14%     | 1      | 371     | 73.05K    |
+---------------------+------------+----------+----------+------------+------------+----------+----------+------------+--------+---------+-----------+

Change-Id: I626456b6afa9101eeeeffd5cda10c4096d63d7f9
Reviewed-on: http://gerrit.cloudera.org:8080/8758
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/057cc51b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/057cc51b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/057cc51b

Branch: refs/heads/2.x
Commit: 057cc51b54fffbeb800a8f3f819fed1f5bbf32d1
Parents: cf2e482
Author: Lars Volker <lv...@cloudera.com>
Authored: Tue Nov 28 15:08:29 2017 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:57 2018 -0800

----------------------------------------------------------------------
 be/src/common/atomic.h                      |  17 +++
 be/src/runtime/coordinator-backend-state.cc |  46 ++++++++
 be/src/runtime/coordinator-backend-state.h  |  26 ++++-
 be/src/runtime/coordinator.cc               |  29 +++--
 be/src/runtime/coordinator.h                |   4 +
 be/src/runtime/fragment-instance-state.cc   | 130 ++++++++++++++++++++---
 be/src/runtime/fragment-instance-state.h    |  54 +++++++++-
 be/src/runtime/query-state.cc               |   2 +
 be/src/runtime/query-state.h                |   7 +-
 be/src/service/impala-http-handler.cc       |  22 ++++
 be/src/service/impala-http-handler.h        |   5 +
 be/src/util/runtime-profile-counters.h      |  69 ++++++++----
 be/src/util/runtime-profile.cc              |  21 +++-
 be/src/util/stopwatch.h                     |  30 +++---
 common/thrift/ImpalaInternalService.thrift  |  21 +++-
 tests/query_test/test_observability.py      |  13 +++
 tests/webserver/test_web_pages.py           |  49 ++++++---
 www/query_detail_tabs.tmpl                  |   1 +
 www/query_finstances.tmpl                   | 129 ++++++++++++++++++++++
 19 files changed, 594 insertions(+), 81 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/common/atomic.h
----------------------------------------------------------------------
diff --git a/be/src/common/atomic.h b/be/src/common/atomic.h
index 791f3b2..3925137 100644
--- a/be/src/common/atomic.h
+++ b/be/src/common/atomic.h
@@ -132,6 +132,23 @@ class AtomicPtr {
   internal::AtomicInt<intptr_t> ptr_;
 };
 
+/// Atomic enum. Operations have the same semantics as AtomicInt.
+template<typename T>
+class AtomicEnum {
+  static_assert(std::is_enum<T>::value, "Type must be enum");
+  static_assert(sizeof(typename std::underlying_type<T>::type) <= sizeof(int32_t),
+      "Underlying enum type must fit into 4 bytes");
+
+ public:
+  /// Atomic load with "acquire" memory-ordering semantic.
+  ALWAYS_INLINE T Load() const { return static_cast<T>(enum_.Load()); }
+
+  /// Atomic store with "release" memory-ordering semantic.
+  ALWAYS_INLINE void Store(T val) { enum_.Store(val); }
+
+ private:
+  internal::AtomicInt<int32_t> enum_;
+};
 
 }
 

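As a rough usage sketch of the new AtomicEnum wrapper (hypothetical code, not part of the commit; the enum name and values are made up purely to illustrate the Load()/Store() semantics described above):

    // Assumes "common/atomic.h" is included and namespace impala is open.
    enum class ScanPhase : int32_t { INIT, RUNNING, DONE };

    AtomicEnum<ScanPhase> phase;
    phase.Store(ScanPhase::RUNNING);             // "release" store by the writer thread
    if (phase.Load() == ScanPhase::RUNNING) {    // "acquire" load by an observer thread
      // the observer also sees all writes the writer made before the Store()
    }

This is the pattern used further down for FragmentInstanceState::current_state_, which is written only by the fragment instance thread and read by the profile reporting thread.
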
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/coordinator-backend-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index 9d89086..914a3e4 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -453,6 +453,7 @@ void Coordinator::BackendState::InstanceStats::InitCounters() {
 void Coordinator::BackendState::InstanceStats::Update(
     const TFragmentInstanceExecStatus& exec_status,
     ExecSummary* exec_summary, ProgressUpdater* scan_range_progress) {
+  last_report_time_ms_ = MonotonicMillis();
   if (exec_status.done) stopwatch_.Stop();
   profile_->Update(exec_status.profile);
   if (!profile_created_) {
@@ -496,6 +497,32 @@ void Coordinator::BackendState::InstanceStats::Update(
   int64_t delta = total - total_ranges_complete_;
   total_ranges_complete_ = total;
   scan_range_progress->Update(delta);
+
+  // extract the current execution state of this instance
+  current_state_ = exec_status.current_state;
+}
+
+void Coordinator::BackendState::InstanceStats::ToJson(Value* value, Document* document) {
+  Value instance_id_val(PrintId(exec_params_.instance_id).c_str(),
+      document->GetAllocator());
+  value->AddMember("instance_id", instance_id_val, document->GetAllocator());
+
+  // We send 'done' explicitly so we don't have to infer it by comparison with a string
+  // constant in the debug page JS code.
+  value->AddMember("done", done_, document->GetAllocator());
+
+  Value state_val(FragmentInstanceState::ExecStateToString(current_state_).c_str(),
+      document->GetAllocator());
+  value->AddMember("current_state", state_val, document->GetAllocator());
+
+  Value fragment_name_val(exec_params_.fragment().display_name.c_str(),
+      document->GetAllocator());
+  value->AddMember("fragment_name", fragment_name_val, document->GetAllocator());
+
+  value->AddMember("first_status_update_received", last_report_time_ms_ > 0,
+      document->GetAllocator());
+  value->AddMember("time_since_last_heard_from", MonotonicMillis() - last_report_time_ms_,
+      document->GetAllocator());
 }
 
 Coordinator::FragmentStats::FragmentStats(const string& avg_profile_name,
@@ -583,3 +610,22 @@ void Coordinator::BackendState::ToJson(Value* value, Document* document) {
   value->AddMember(
       "num_remaining_instances", num_remaining_instances_, document->GetAllocator());
 }
+
+void Coordinator::BackendState::InstanceStatsToJson(Value* value, Document* document) {
+  Value instance_stats(kArrayType);
+  {
+    lock_guard<mutex> l(lock_);
+    for (const auto& elem : instance_stats_map_) {
+      Value val(kObjectType);
+      elem.second->ToJson(&val, document);
+      instance_stats.PushBack(val, document->GetAllocator());
+    }
+    DCHECK_EQ(instance_stats.Size(), fragments_.size());
+  }
+  value->AddMember("instance_stats", instance_stats, document->GetAllocator());
+
+  // impalad_address is not protected by lock_. The lifetime of the backend state is
+  // protected by Coordinator::lock_.
+  Value val(TNetworkAddressToString(impalad_address()).c_str(), document->GetAllocator());
+  value->AddMember("host", val, document->GetAllocator());
+}

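Putting the two ToJson() methods above together, each element that InstanceStatsToJson() contributes to the debug page is shaped roughly as follows (all values here are invented for illustration; 'current_state' carries one of the labels produced by FragmentInstanceState::ExecStateToString() further down in this diff):

    {
      "host": "impalad-host-1:22000",
      "instance_stats": [
        {
          "instance_id": "d4479a1bd1a7f1c3:8e7462db00000001",
          "done": false,
          "current_state": "Producing Data",
          "fragment_name": "F01",
          "first_status_update_received": true,
          "time_since_last_heard_from": 612
        }
      ]
    }

Coordinator::FInstanceStatsToJson() (next file) then collects one such object per backend into a 'backend_instances' array.
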
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/coordinator-backend-state.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator-backend-state.h b/be/src/runtime/coordinator-backend-state.h
index 73acef9..0973ca3 100644
--- a/be/src/runtime/coordinator-backend-state.h
+++ b/be/src/runtime/coordinator-backend-state.h
@@ -124,6 +124,10 @@ class Coordinator::BackendState {
   /// number of instances, peak memory consumption, host and status amongst others.
   void ToJson(rapidjson::Value* value, rapidjson::Document* doc);
 
+  /// Serializes the InstanceStats of all instances of this backend state to JSON by
+  /// adding members to 'value', including the remote host name.
+  void InstanceStatsToJson(rapidjson::Value* value, rapidjson::Document* doc);
+
  private:
   /// Execution stats for a single fragment instance.
   /// Not thread-safe.
@@ -132,9 +136,9 @@ class Coordinator::BackendState {
     InstanceStats(const FInstanceExecParams& exec_params, FragmentStats* fragment_stats,
         ObjectPool* obj_pool);
 
-    /// Update 'this' with exec_status, the fragment instances' TExecStats in
-    /// exec_summary, and 'progress_updater' with the number of
-    /// newly completed scan ranges. Also updates the instance's avg profile.
+    /// Updates 'this' with exec_status, the fragment instances' TExecStats in
+    /// exec_summary, and 'progress_updater' with the number of newly completed scan
+    /// ranges. Also updates the instance's avg profile.
     void Update(const TFragmentInstanceExecStatus& exec_status,
         ExecSummary* exec_summary, ProgressUpdater* scan_range_progress);
 
@@ -142,12 +146,20 @@ class Coordinator::BackendState {
       return exec_params_.per_fragment_instance_idx;
     }
 
+    /// Serializes instance stats to JSON by adding members to 'value', including its
+    /// instance id, plan fragment name, and the last event that was recorded during
+    /// execution of the instance.
+    void ToJson(rapidjson::Value* value, rapidjson::Document* doc);
+
    private:
     friend class BackendState;
 
     /// query lifetime
     const FInstanceExecParams& exec_params_;
 
+    /// Set in Update(). Uses MonotonicMillis().
+    int64_t last_report_time_ms_ = 0;
+
     /// owned by coordinator object pool provided in the c'tor, created in Update()
     RuntimeProfile* profile_;
 
@@ -172,9 +184,13 @@ class Coordinator::BackendState {
     std::vector<RuntimeProfile::Counter*> scan_ranges_complete_counters_;
 
     /// PER_HOST_PEAK_MEM_COUNTER
-    RuntimeProfile::Counter* peak_mem_counter_;
+    RuntimeProfile::Counter* peak_mem_counter_ = nullptr;
+
+    /// The current state of this fragment instance's execution. This gets serialized in
+    /// ToJson() and is displayed in the debug webpages.
+    TFInstanceExecState::type current_state_ = TFInstanceExecState::WAITING_FOR_EXEC;
 
-    /// Extract scan_ranges_complete_counters_ and peak_mem_counter_ from profile_.
+    /// Extracts scan_ranges_complete_counters_ and peak_mem_counter_ from profile_.
     void InitCounters();
   };
 

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/coordinator.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc
index 85ff810..7973775 100644
--- a/be/src/runtime/coordinator.cc
+++ b/be/src/runtime/coordinator.cc
@@ -459,8 +459,8 @@ Status Coordinator::GetStatus() {
   return query_status_;
 }
 
-  Status Coordinator::UpdateStatus(const Status& status, const string& backend_hostname,
-     bool is_fragment_failure, const TUniqueId& instance_id) {
+Status Coordinator::UpdateStatus(const Status& status, const string& backend_hostname,
+    bool is_fragment_failure, const TUniqueId& instance_id) {
   {
     lock_guard<mutex> l(lock_);
 
@@ -1258,13 +1258,28 @@ MemTracker* Coordinator::query_mem_tracker() const {
 }
 
 void Coordinator::BackendsToJson(Document* doc) {
-  lock_guard<mutex> l(lock_);
   Value states(kArrayType);
-  for (BackendState* state : backend_states_) {
-    Value val(kObjectType);
-    state->ToJson(&val, doc);
-    states.PushBack(val, doc->GetAllocator());
+  {
+    lock_guard<mutex> l(lock_);
+    for (BackendState* state : backend_states_) {
+      Value val(kObjectType);
+      state->ToJson(&val, doc);
+      states.PushBack(val, doc->GetAllocator());
+    }
   }
   doc->AddMember("backend_states", states, doc->GetAllocator());
 }
+
+void Coordinator::FInstanceStatsToJson(Document* doc) {
+  Value states(kArrayType);
+  {
+    lock_guard<mutex> l(lock_);
+    for (BackendState* state : backend_states_) {
+      Value val(kObjectType);
+      state->InstanceStatsToJson(&val, doc);
+      states.PushBack(val, doc->GetAllocator());
+    }
+  }
+  doc->AddMember("backend_instances", states, doc->GetAllocator());
+}
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/coordinator.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator.h b/be/src/runtime/coordinator.h
index e7ddee9..d630b9a 100644
--- a/be/src/runtime/coordinator.h
+++ b/be/src/runtime/coordinator.h
@@ -183,6 +183,10 @@ class Coordinator { // NOLINT: The member variables could be re-ordered to save
   /// 'backend_states'.
   void BackendsToJson(rapidjson::Document* document);
 
+  /// Adds to 'document' a serialized array of all backend names and stats of all fragment
+  /// instances running on each backend in a member named 'backend_instances'.
+  void FInstanceStatsToJson(rapidjson::Document* document);
+
  private:
   class BackendState;
   struct FilterTarget;

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/fragment-instance-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index 5c9deb8..16b4a7e 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -61,7 +61,6 @@ static const string OPEN_TIMER_NAME = "OpenTime";
 static const string PREPARE_TIMER_NAME = "PrepareTime";
 static const string EXEC_TIMER_NAME = "ExecTime";
 
-
 FragmentInstanceState::FragmentInstanceState(
     QueryState* query_state, const TPlanFragmentCtx& fragment_ctx,
     const TPlanFragmentInstanceCtx& instance_ctx)
@@ -91,6 +90,7 @@ Status FragmentInstanceState::Exec() {
   }
 
 done:
+  UpdateState(StateEvent::EXEC_END);
   // call this before Close() to make sure the thread token got released
   Finalize(status);
   Close();
@@ -114,7 +114,6 @@ void FragmentInstanceState::Cancel() {
 
 Status FragmentInstanceState::Prepare() {
   DCHECK(!prepared_promise_.IsSet());
-
   VLOG(2) << "fragment_instance_ctx:\n" << ThriftDebugString(instance_ctx_);
 
   // Do not call RETURN_IF_ERROR or explicitly return before this line,
@@ -129,11 +128,17 @@ Status FragmentInstanceState::Prepare() {
   profile()->AddChild(timings_profile_);
   SCOPED_TIMER(ADD_TIMER(timings_profile_, PREPARE_TIMER_NAME));
 
+  // Events that are tracked in a separate timeline for each fragment instance, relative
+  // to the startup of the query state.
+  event_sequence_ =
+      profile()->AddEventSequence("Fragment Instance Lifecycle Event Timeline");
+  event_sequence_->Start(query_state_->fragment_events_start_time());
+  UpdateState(StateEvent::PREPARE_START);
+
   runtime_state_->InitFilterBank();
 
   // Reserve one main thread from the pool
   runtime_state_->resource_pool()->AcquireThreadToken();
-
   avg_thread_tokens_ = profile()->AddSamplingCounter("AverageThreadTokens",
       bind<int64_t>(mem_fn(&ThreadResourceMgr::ResourcePool::num_threads),
           runtime_state_->resource_pool()));
@@ -201,15 +206,6 @@ Status FragmentInstanceState::Prepare() {
     ReleaseThreadToken();
   }
 
-  if (runtime_state_->ShouldCodegen()) {
-    RETURN_IF_ERROR(runtime_state_->CreateCodegen());
-    exec_tree_->Codegen(runtime_state_);
-    // It shouldn't be fatal to fail codegen. However, until IMPALA-4233 is fixed,
-    // ScalarFnCall has no fall back to interpretation when codegen fails so propagates
-    // the error status for now.
-    RETURN_IF_ERROR(runtime_state_->CodegenScalarFns());
-  }
-
   // set up profile counters
   profile()->AddChild(exec_tree_->runtime_profile());
   rows_produced_counter_ =
@@ -247,14 +243,22 @@ Status FragmentInstanceState::Open() {
   SCOPED_TIMER(ADD_TIMER(timings_profile_, OPEN_TIMER_NAME));
   SCOPED_THREAD_COUNTER_MEASUREMENT(runtime_state_->total_thread_statistics());
 
-  // codegen prior to exec_tree_->Open()
   if (runtime_state_->ShouldCodegen()) {
+    UpdateState(StateEvent::CODEGEN_START);
+    RETURN_IF_ERROR(runtime_state_->CreateCodegen());
+    exec_tree_->Codegen(runtime_state_);
+    // It shouldn't be fatal to fail codegen. However, until IMPALA-4233 is fixed,
+    // ScalarFnCall has no fall back to interpretation when codegen fails so propagates
+    // the error status for now.
+    RETURN_IF_ERROR(runtime_state_->CodegenScalarFns());
+
     LlvmCodeGen* codegen = runtime_state_->codegen();
     DCHECK(codegen != nullptr);
     RETURN_IF_ERROR(codegen->FinalizeModule());
   }
 
   {
+    UpdateState(StateEvent::OPEN_START);
     SCOPED_TIMER(ADD_CHILD_TIMER(timings_profile_, "ExecTreeOpenTime", OPEN_TIMER_NAME));
     RETURN_IF_ERROR(exec_tree_->Open(runtime_state_));
   }
@@ -266,6 +270,7 @@ Status FragmentInstanceState::ExecInternal() {
       ADD_CHILD_TIMER(timings_profile_, "ExecTreeExecTime", EXEC_TIMER_NAME);
   SCOPED_THREAD_COUNTER_MEASUREMENT(runtime_state_->total_thread_statistics());
   bool exec_tree_complete = false;
+  UpdateState(StateEvent::WAITING_FOR_FIRST_BATCH);
   do {
     Status status;
     row_batch_->Reset();
@@ -274,11 +279,14 @@ Status FragmentInstanceState::ExecInternal() {
       RETURN_IF_ERROR(
           exec_tree_->GetNext(runtime_state_, row_batch_.get(), &exec_tree_complete));
     }
+    UpdateState(StateEvent::BATCH_PRODUCED);
     if (VLOG_ROW_IS_ON) row_batch_->VLogRows("FragmentInstanceState::ExecInternal()");
     COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows());
     RETURN_IF_ERROR(sink_->Send(runtime_state_, row_batch_.get()));
+    UpdateState(StateEvent::BATCH_SENT);
   } while (!exec_tree_complete);
 
+  UpdateState(StateEvent::LAST_BATCH_SENT);
   // Flush the sink *before* stopping the report thread. Flush may need to add some
   // important information to the last report that gets sent. (e.g. table sinks record the
   // files they have written to in this method)
@@ -374,6 +382,79 @@ void FragmentInstanceState::SendReport(bool done, const Status& status) {
   query_state_->ReportExecStatus(done, status, this);
 }
 
+void FragmentInstanceState::UpdateState(const StateEvent event)
+{
+  TFInstanceExecState::type current_state = current_state_.Load();
+  TFInstanceExecState::type next_state = current_state;
+  switch (event) {
+    case StateEvent::PREPARE_START:
+      DCHECK_EQ(current_state, TFInstanceExecState::WAITING_FOR_EXEC);
+      next_state = TFInstanceExecState::WAITING_FOR_PREPARE;
+      break;
+
+    case StateEvent::CODEGEN_START:
+      DCHECK_EQ(current_state, TFInstanceExecState::WAITING_FOR_PREPARE);
+      event_sequence_->MarkEvent("Prepare Finished");
+      next_state = TFInstanceExecState::WAITING_FOR_CODEGEN;
+      break;
+
+    case StateEvent::OPEN_START:
+      if (current_state == TFInstanceExecState::WAITING_FOR_PREPARE) {
+        event_sequence_->MarkEvent("Prepare Finished");
+      } else {
+        DCHECK_EQ(current_state, TFInstanceExecState::WAITING_FOR_CODEGEN);
+      }
+      next_state = TFInstanceExecState::WAITING_FOR_OPEN;
+      break;
+
+    case StateEvent::WAITING_FOR_FIRST_BATCH:
+      DCHECK_EQ(current_state, TFInstanceExecState::WAITING_FOR_OPEN);
+      next_state = TFInstanceExecState::WAITING_FOR_FIRST_BATCH;
+      break;
+
+    case StateEvent::BATCH_PRODUCED:
+      if (UNLIKELY(current_state == TFInstanceExecState::WAITING_FOR_FIRST_BATCH)) {
+        event_sequence_->MarkEvent("First Batch Produced");
+        next_state = TFInstanceExecState::FIRST_BATCH_PRODUCED;
+      } else {
+        DCHECK_EQ(current_state, TFInstanceExecState::PRODUCING_DATA);
+      }
+      break;
+
+    case StateEvent::BATCH_SENT:
+      if (UNLIKELY(current_state == TFInstanceExecState::FIRST_BATCH_PRODUCED)) {
+        event_sequence_->MarkEvent("First Batch Sent");
+        next_state = TFInstanceExecState::PRODUCING_DATA;
+      } else {
+        DCHECK_EQ(current_state, TFInstanceExecState::PRODUCING_DATA);
+      }
+      break;
+
+    case StateEvent::LAST_BATCH_SENT:
+      if (UNLIKELY(current_state == TFInstanceExecState::WAITING_FOR_OPEN)) {
+        event_sequence_->MarkEvent("Open Finished");
+      } else {
+        DCHECK_EQ(current_state, TFInstanceExecState::PRODUCING_DATA);
+      }
+      next_state = TFInstanceExecState::LAST_BATCH_SENT;
+      break;
+
+    case StateEvent::EXEC_END:
+      // Allow abort in all states to make error handling easier.
+      event_sequence_->MarkEvent("ExecInternal Finished");
+      next_state = TFInstanceExecState::FINISHED;
+      break;
+
+    default:
+      DCHECK(false) << "Unexpected Event: " << static_cast<int>(event);
+      break;
+  }
+  // current_state_ is an AtomicEnum to add memory barriers for concurrent reads by the
+  // profile reporting thread. This method is the only one updating it and is not
+  // meant to be thread safe.
+  if (next_state != current_state) current_state_.Store(next_state);
+}
+
 void FragmentInstanceState::StopReportThread() {
   if (!report_thread_active_) return;
   {
@@ -426,6 +507,29 @@ void FragmentInstanceState::PublishFilter(const TPublishFilterParams& params) {
   runtime_state_->filter_bank()->PublishGlobalFilter(params);
 }
 
+string FragmentInstanceState::ExecStateToString(const TFInstanceExecState::type state) {
+  // Labels to send to the debug webpages to display the current state to the user.
+  static const string finstance_state_labels[] = {
+      "Waiting for Exec",         // WAITING_FOR_EXEC
+      "Waiting for Codegen",      // WAITING_FOR_CODEGEN
+      "Waiting for Prepare",      // WAITING_FOR_PREPARE
+      "Waiting for First Batch",  // WAITING_FOR_OPEN
+      "Waiting for First Batch",  // WAITING_FOR_FIRST_BATCH
+      "First batch produced",     // FIRST_BATCH_PRODUCED
+      "Producing Data",           // PRODUCING_DATA
+      "Last batch sent",          // LAST_BATCH_SENT
+      "Finished"                  // FINISHED
+  };
+  /// Make sure we have a label for every possible state.
+  static_assert(
+      sizeof(finstance_state_labels) / sizeof(char*) == TFInstanceExecState::FINISHED + 1,
+      "");
+
+  DCHECK_LT(state, sizeof(finstance_state_labels) / sizeof(char*))
+      << "Unknown instance state";
+  return finstance_state_labels[state];
+}
+
 const TQueryCtx& FragmentInstanceState::query_ctx() const {
   return query_state_->query_ctx();
 }

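To summarize the state machine implemented by UpdateState() above: for an instance that performs codegen, the happy-path sequence of events (left) and resulting TFInstanceExecState values (right) is

    PREPARE_START             -> WAITING_FOR_PREPARE
    CODEGEN_START             -> WAITING_FOR_CODEGEN      (marks "Prepare Finished")
    OPEN_START                -> WAITING_FOR_OPEN
    WAITING_FOR_FIRST_BATCH   -> WAITING_FOR_FIRST_BATCH
    BATCH_PRODUCED            -> FIRST_BATCH_PRODUCED     (marks "First Batch Produced")
    BATCH_SENT                -> PRODUCING_DATA           (marks "First Batch Sent")
    BATCH_PRODUCED/BATCH_SENT -> PRODUCING_DATA           (subsequent batches, no marker)
    LAST_BATCH_SENT           -> LAST_BATCH_SENT
    EXEC_END                  -> FINISHED                 (marks "ExecInternal Finished")

When codegen is skipped, OPEN_START arrives while the instance is still in WAITING_FOR_PREPARE and marks "Prepare Finished" itself; EXEC_END is accepted from any state so that error paths do not trip the DCHECKs.
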
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/fragment-instance-state.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/fragment-instance-state.h b/be/src/runtime/fragment-instance-state.h
index 4e832f6..292b93c 100644
--- a/be/src/runtime/fragment-instance-state.h
+++ b/be/src/runtime/fragment-instance-state.h
@@ -23,6 +23,7 @@
 #include <boost/scoped_ptr.hpp>
 #include <boost/thread/mutex.hpp>
 
+#include "common/atomic.h"
 #include "common/status.h"
 #include "util/promise.h"
 
@@ -54,12 +55,14 @@ class RuntimeState;
 /// for this fragment instance and closes all data streams.
 ///
 /// The FIS makes an aggregated profile for the entire fragment available, which
-/// includes profile information for the plan itself as well as the output sink.
+/// includes profile information for the plan itself as well as the output sink. It also
+/// contains a timeline of events of the fragment instance.
 /// The FIS periodically makes a ReportExecStatus RPC to the coordinator to report the
-/// execution status and profile. The frequency of those reports is controlled by the flag
-/// status_report_interval; setting that flag to 0 disables periodic reporting altogether
-/// Regardless of the value of that flag, a report is sent at least once at the end of
-/// execution with an overall status and profile (and 'done' indicator).
+/// execution status, the current state of the execution, and the instance profile. The
+/// frequency of those reports is controlled by the flag status_report_interval; setting
+/// that flag to 0 disables periodic reporting altogether. Regardless of the value of that
+/// flag, a report is sent at least once at the end of execution with an overall status
+/// and profile (and 'done' indicator).
 /// The FIS will send at least one final status report. If execution ended with an error,
 /// that error status will be part of the final report (it will not be overridden by
 /// the resulting cancellation).
@@ -105,6 +108,9 @@ class FragmentInstanceState {
   /// the Prepare phase. May be nullptr.
   PlanRootSink* root_sink() { return root_sink_; }
 
+  /// Returns a string description of 'current_state_'.
+  static string ExecStateToString(const TFInstanceExecState::type state);
+
   /// Name of the counter that is tracking per query, per host peak mem usage.
   /// TODO: this doesn't look like it belongs here
   static const std::string PER_HOST_PEAK_MEM_COUNTER;
@@ -117,6 +123,7 @@ class FragmentInstanceState {
   const TPlanFragmentInstanceCtx& instance_ctx() const { return instance_ctx_; }
   const TUniqueId& query_id() const { return query_ctx().query_id; }
   const TUniqueId& instance_id() const { return instance_ctx_.fragment_instance_id; }
+  TFInstanceExecState::type current_state() const { return current_state_.Load(); }
   const TNetworkAddress& coord_address() const { return query_ctx().coord_address; }
   ObjectPool* obj_pool();
 
@@ -155,6 +162,37 @@ class FragmentInstanceState {
   /// Lives in obj_pool().
   RuntimeProfile* timings_profile_ = nullptr;
 
+  /// Event sequence tracking the completion of various stages of this fragment instance.
+  /// Updated in UpdateState().
+  RuntimeProfile::EventSequence* event_sequence_ = nullptr;
+
+  /// Events that change the current state of this instance's execution, which is kept in
+  /// 'current_state_'. Events are issued throughout the execution by calling
+  /// UpdateState(), which implements a state machine. See the implementation of
+  /// UpdateState() for valid state transitions.
+  enum class StateEvent {
+    /// Indicates the start of execution.
+    PREPARE_START,
+    /// Indicates that codegen will get called. Omitted if not doing codegen.
+    CODEGEN_START,
+    /// Indicates the call to Open().
+    OPEN_START,
+    /// Indicates waiting for the first batch to arrive.
+    WAITING_FOR_FIRST_BATCH,
+    /// Indicates that a new batch was produced by this instance.
+    BATCH_PRODUCED,
+    /// Indicates that a batch has been sent.
+    BATCH_SENT,
+    /// Indicates that no new batches will be received.
+    LAST_BATCH_SENT,
+    /// Indicates the end of this instance's execution.
+    EXEC_END
+  };
+
+  /// The current state of this fragment instance's execution. Only updated by the
+  /// fragment instance thread in UpdateState() and read by the profile reporting threads.
+  AtomicEnum<TFInstanceExecState::type> current_state_;
+
   /// Output sink for rows sent to this fragment. Created in Prepare(), lives in
   /// obj_pool().
   DataSink* sink_ = nullptr;
@@ -232,6 +270,12 @@ class FragmentInstanceState {
   /// ReportProfileThread() thread will do periodically.
   void SendReport(bool done, const Status& status);
 
+  /// Handle the execution event 'event'. This implements a state machine and will update
+  /// the current execution state of this fragment instance. Also marks an event in
+  /// 'event_sequence_' for some states. Must not be called by multiple threads
+  /// concurrently.
+  void UpdateState(const StateEvent event);
+
   /// Called when execution is complete to finalize counters and send the final status
   /// report.  Must be called only once. Can handle partially-finished Prepare().
   void Finalize(const Status& status);

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/query-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-state.cc b/be/src/runtime/query-state.cc
index 259cd34..5a11caf 100644
--- a/be/src/runtime/query-state.cc
+++ b/be/src/runtime/query-state.cc
@@ -227,6 +227,7 @@ void QueryState::ReportExecStatusAux(bool done, const Status& status,
     instance_status.__set_fragment_instance_id(fis->instance_id());
     status.SetTStatus(&instance_status);
     instance_status.__set_done(done);
+    instance_status.__set_current_state(fis->current_state());
 
     DCHECK(fis->profile() != nullptr);
     fis->profile()->ToThrift(&instance_status.profile);
@@ -304,6 +305,7 @@ void QueryState::StartFInstances() {
   DCHECK_GT(rpc_params_.fragment_ctxs.size(), 0);
   TPlanFragmentCtx* fragment_ctx = &rpc_params_.fragment_ctxs[0];
   int fragment_ctx_idx = 0;
+  fragment_events_start_time_ = MonotonicStopWatch::Now();
   for (const TPlanFragmentInstanceCtx& instance_ctx: rpc_params_.fragment_instance_ctxs) {
     // determine corresponding TPlanFragmentCtx
     if (fragment_ctx->fragment.idx != instance_ctx.fragment_idx) {

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/runtime/query-state.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-state.h b/be/src/runtime/query-state.h
index f7b83a7..ae3bdd5 100644
--- a/be/src/runtime/query-state.h
+++ b/be/src/runtime/query-state.h
@@ -109,7 +109,7 @@ class QueryState {
   }
   MemTracker* query_mem_tracker() const { return query_mem_tracker_; }
 
-  // the following getters are only valid after Prepare()
+  // the following getters are only valid after Init()
   ReservationTracker* buffer_reservation() const { return buffer_reservation_; }
   InitialReservations* initial_reservations() const { return initial_reservations_; }
   TmpFileMgr::FileGroup* file_group() const { return file_group_; }
@@ -117,6 +117,7 @@ class QueryState {
 
   // the following getters are only valid after StartFInstances()
   const DescriptorTbl& desc_tbl() const { return *desc_tbl_; }
+  int64_t fragment_events_start_time() const { return fragment_events_start_time_; }
 
   /// Sets up state required for fragment execution: memory reservations, etc. Fails
   /// if resources could not be acquired. Acquires a resource refcount and returns it
@@ -243,6 +244,10 @@ class QueryState {
   /// "num-queries-spilled" metric.
   AtomicInt32 query_spilled_;
 
+  /// Records the point in time when fragment instances are started up. Set in
+  /// StartFInstances().
+  int64_t fragment_events_start_time_ = 0;
+
   /// Create QueryState w/ refcnt of 0.
   /// The query is associated with the resource pool query_ctx.request_pool or
   /// 'request_pool', if the former is not set (needed for tests).

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/service/impala-http-handler.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc
index 33c5e73..b633f2a 100644
--- a/be/src/service/impala-http-handler.cc
+++ b/be/src/service/impala-http-handler.cc
@@ -104,6 +104,9 @@ void ImpalaHttpHandler::RegisterHandlers(Webserver* webserver) {
   webserver->RegisterUrlCallback("/query_backends", "query_backends.tmpl",
       MakeCallback(this, &ImpalaHttpHandler::QueryBackendsHandler), false);
 
+  webserver->RegisterUrlCallback("/query_finstances", "query_finstances.tmpl",
+      MakeCallback(this, &ImpalaHttpHandler::QueryFInstancesHandler), false);
+
   webserver->RegisterUrlCallback("/cancel_query", "common-pre.tmpl",
       MakeCallback(this, &ImpalaHttpHandler::CancelQueryHandler), false);
 
@@ -708,6 +711,25 @@ void ImpalaHttpHandler::QueryBackendsHandler(
   request_state->coord()->BackendsToJson(document);
 }
 
+void ImpalaHttpHandler::QueryFInstancesHandler(
+    const Webserver::ArgumentMap& args, Document* document) {
+  TUniqueId query_id;
+  Status status = ParseIdFromArguments(args, &query_id, "query_id");
+  Value query_id_val(PrintId(query_id).c_str(), document->GetAllocator());
+  document->AddMember("query_id", query_id_val, document->GetAllocator());
+  if (!status.ok()) {
+    // Redact the error message, it may contain part or all of the query.
+    Value json_error(RedactCopy(status.GetDetail()).c_str(), document->GetAllocator());
+    document->AddMember("error", json_error, document->GetAllocator());
+    return;
+  }
+
+  shared_ptr<ClientRequestState> request_state = server_->GetClientRequestState(query_id);
+  if (request_state.get() == nullptr || request_state->coord() == nullptr) return;
+
+  request_state->coord()->FInstanceStatsToJson(document);
+}
+
 void ImpalaHttpHandler::QuerySummaryHandler(bool include_json_plan, bool include_summary,
     const Webserver::ArgumentMap& args, Document* document) {
   TUniqueId query_id;

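For reference, the page registered above is fetched the same way as the existing per-query debug pages; the web test further down (tests/webserver/test_web_pages.py) requests it roughly as

    http://<impalad-host>:25000/query_finstances?query_id=<query_id>&json

where 25000 is the debug webserver port the tests assume, and omitting '&json' should return the rendered www/query_finstances.tmpl page instead of the raw JSON document.
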
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/service/impala-http-handler.h
----------------------------------------------------------------------
diff --git a/be/src/service/impala-http-handler.h b/be/src/service/impala-http-handler.h
index 8ad84bd..f2492ff 100644
--- a/be/src/service/impala-http-handler.h
+++ b/be/src/service/impala-http-handler.h
@@ -96,6 +96,11 @@ class ImpalaHttpHandler {
   void QueryBackendsHandler(
       const Webserver::ArgumentMap& args, rapidjson::Document* document);
 
+  /// If 'args' contains a query id, serializes all fragment instance states for all
+  /// backends for that query to 'document'.
+  void QueryFInstancesHandler(
+      const Webserver::ArgumentMap& args, rapidjson::Document* document);
+
   /// Cancels an in-flight query and writes the result to 'contents'.
   void CancelQueryHandler(const Webserver::ArgumentMap& args,
       rapidjson::Document* document);

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/util/runtime-profile-counters.h
----------------------------------------------------------------------
diff --git a/be/src/util/runtime-profile-counters.h b/be/src/util/runtime-profile-counters.h
index 9227e66..62281f1 100644
--- a/be/src/util/runtime-profile-counters.h
+++ b/be/src/util/runtime-profile-counters.h
@@ -19,10 +19,11 @@
 #ifndef IMPALA_UTIL_RUNTIME_PROFILE_COUNTERS_H
 #define IMPALA_UTIL_RUNTIME_PROFILE_COUNTERS_H
 
+#include <algorithm>
 #include <boost/scoped_ptr.hpp>
 #include <boost/unordered_map.hpp>
-#include <sys/time.h>
 #include <sys/resource.h>
+#include <sys/time.h>
 
 #include "common/atomic.h"
 #include "common/logging.h"
@@ -277,11 +278,10 @@ class RuntimeProfile::ThreadCounters {
   Counter* involuntary_context_switches_;
 };
 
-/// An EventSequence captures a sequence of events (each added by
-/// calling MarkEvent). Each event has a text label, and a time
-/// (measured relative to the moment Start() was called as t=0). It is
-/// useful for tracking the evolution of some serial process, such as
-/// the query lifecycle.
+/// An EventSequence captures a sequence of events (each added by calling MarkEvent()).
+/// Each event has a text label and a time (measured relative to the moment Start() was
+/// called as t=0, or to the parameter 'when' passed to Start(int64_t when)). It is useful
+/// for tracking the evolution of some serial process, such as the query lifecycle.
 class RuntimeProfile::EventSequence {
  public:
   EventSequence() { }
@@ -298,12 +298,20 @@ class RuntimeProfile::EventSequence {
   /// Starts the timer without resetting it.
   void Start() { sw_.Start(); }
 
+  /// Starts the timer. All events will be recorded as if the timer had been started at
+  /// 'start_time_ns', which must have been obtained by calling MonotonicStopWatch::Now().
+  void Start(int64_t start_time_ns) {
+    offset_ = MonotonicStopWatch::Now() - start_time_ns;
+    DCHECK_GE(offset_, 0);
+    sw_.Start();
+  }
+
   /// Stores an event in sequence with the given label and the current time
   /// (relative to the first time Start() was called) as the timestamp.
-  void MarkEvent(const std::string& label) {
-    Event event = make_pair(label, sw_.ElapsedTime());
+  void MarkEvent(std::string label) {
+    Event event = make_pair(move(label), sw_.ElapsedTime());
     boost::lock_guard<SpinLock> event_lock(lock_);
-    events_.push_back(event);
+    events_.emplace_back(move(event));
   }
 
   int64_t ElapsedTime() { return sw_.ElapsedTime(); }
@@ -311,34 +319,57 @@ class RuntimeProfile::EventSequence {
   /// An Event is a <label, timestamp> pair.
   typedef std::pair<std::string, int64_t> Event;
 
-  /// An EventList is a sequence of Events, in increasing timestamp order.
+  /// An EventList is a sequence of Events.
   typedef std::vector<Event> EventList;
 
-  /// Copies the member events_ into the supplied vector 'events'.
-  /// The supplied vector 'events' is cleared before this.
+  /// Returns a copy of 'events_' in the supplied vector 'events', sorted by their
+  /// timestamps. The supplied vector 'events' is cleared before this.
   void GetEvents(std::vector<Event>* events) {
     events->clear();
     boost::lock_guard<SpinLock> event_lock(lock_);
-    /// It's possible that concurrent events can be logged out of sequence.
-    /// So sort the events each time we are here.
+    /// It's possible that MarkEvent() logs concurrent events out of sequence so we sort
+    /// the events each time we are here.
+    SortEvents();
+    events->insert(events->end(), events_.begin(), events_.end());
+  }
+
+  /// Adds all events from the input parameters that are newer than the last member of
+  /// 'events_'. The caller must make sure that 'timestamps' is sorted.
+  void AddNewerEvents(
+      const std::vector<int64_t>& timestamps, const std::vector<std::string>& labels) {
+    DCHECK_EQ(timestamps.size(), labels.size());
+    DCHECK(std::is_sorted(timestamps.begin(), timestamps.end()));
+    boost::lock_guard<SpinLock> event_lock(lock_);
+    int64_t last_timestamp = events_.back().second;
+    for (int64_t i = 0; i < timestamps.size(); ++i) {
+      if (timestamps[i] <= last_timestamp) continue;
+      events_.push_back(make_pair(labels[i], timestamps[i]));
+    }
+  }
+
+  void ToThrift(TEventSequence* seq);
+
+ private:
+  /// Sorts events by their timestamp. Caller must hold lock_.
+  void SortEvents() {
     std::sort(events_.begin(), events_.end(),
         [](Event const &event1, Event const &event2) {
         return event1.second < event2.second;
       });
-    events->insert(events->end(), events_.begin(), events_.end());
   }
 
-  void ToThrift(TEventSequence* seq) const;
-
- private:
   /// Protect access to events_.
   SpinLock lock_;
 
-  /// Stored in increasing time order.
+  /// Sequence of events. Due to a race in MarkEvent() these are not necessarily ordered.
   EventList events_;
 
   /// Timer which allows events to be timestamped when they are recorded.
   MonotonicStopWatch sw_;
+
+  /// Constant offset that gets added to each event's timestamp. This allows events
+  /// captured in multiple threads to be synchronized to a common starting point.
+  int64_t offset_ = 0;
 };
 
 typedef StreamingSampler<int64_t, 64> StreamingCounterSampler;

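A minimal sketch of how the new Start(int64_t) overload is meant to be used, mirroring FragmentInstanceState::Prepare() earlier in this diff (abbreviated; 'profile' stands in for the instance's RuntimeProfile* and error handling is omitted):

    // Query-wide reference point, captured once in QueryState::StartFInstances().
    int64_t query_start_ns = MonotonicStopWatch::Now();

    // Per fragment instance: events are recorded as if the timer had started at
    // query_start_ns, so timelines from different instances share a common t=0.
    RuntimeProfile::EventSequence* seq =
        profile->AddEventSequence("Fragment Instance Lifecycle Event Timeline");
    seq->Start(query_start_ns);
    seq->MarkEvent("Prepare Finished");
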
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/util/runtime-profile.cc
----------------------------------------------------------------------
diff --git a/be/src/util/runtime-profile.cc b/be/src/util/runtime-profile.cc
index a05e55c..c057cac 100644
--- a/be/src/util/runtime-profile.cc
+++ b/be/src/util/runtime-profile.cc
@@ -291,7 +291,6 @@ void RuntimeProfile::Update(const vector<TRuntimeProfileNode>& nodes, int* idx)
       if (it == time_series_counter_map_.end()) {
         time_series_counter_map_[c.name] =
             pool_->Add(new TimeSeriesCounter(c.name, c.unit, c.period_ms, c.values));
-        it = time_series_counter_map_.find(c.name);
       } else {
         it->second->samples_.SetSamples(c.period_ms, c.values);
       }
@@ -299,6 +298,20 @@ void RuntimeProfile::Update(const vector<TRuntimeProfileNode>& nodes, int* idx)
   }
 
   {
+    lock_guard<SpinLock> l(event_sequence_lock_);
+    for (int i = 0; i < node.event_sequences.size(); ++i) {
+      const TEventSequence& seq = node.event_sequences[i];
+      EventSequenceMap::iterator it = event_sequence_map_.find(seq.name);
+      if (it == event_sequence_map_.end()) {
+        event_sequence_map_[seq.name] =
+            pool_->Add(new EventSequence(seq.timestamps, seq.labels));
+      } else {
+        it->second->AddNewerEvents(seq.timestamps, seq.labels);
+      }
+    }
+  }
+
+  {
     lock_guard<SpinLock> l(summary_stats_map_lock_);
     for (int i = 0; i < node.summary_stats_counters.size(); ++i) {
       const TSummaryStatsCounter& c = node.summary_stats_counters[i];
@@ -1058,7 +1071,11 @@ string RuntimeProfile::TimeSeriesCounter::DebugString() const {
   return ss.str();
 }
 
-void RuntimeProfile::EventSequence::ToThrift(TEventSequence* seq) const {
+void RuntimeProfile::EventSequence::ToThrift(TEventSequence* seq) {
+  lock_guard<SpinLock> l(lock_);
+  /// It's possible that concurrent events can be logged out of sequence so we sort the
+  /// events before serializing them.
+  SortEvents();
   for (const EventSequence::Event& ev: events_) {
     seq->labels.push_back(ev.first);
     seq->timestamps.push_back(ev.second);

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/be/src/util/stopwatch.h
----------------------------------------------------------------------
diff --git a/be/src/util/stopwatch.h b/be/src/util/stopwatch.h
index c1f85aa..443b655 100644
--- a/be/src/util/stopwatch.h
+++ b/be/src/util/stopwatch.h
@@ -145,20 +145,8 @@ class MonotonicStopWatch {
     return total_time_;
   }
 
- private:
-  /// Start epoch value.
-  uint64_t start_;
-
-  /// Total elapsed time in nanoseconds.
-  uint64_t total_time_;
-
-  /// Upper bound of the running time as a epoch value. If the value is larger than 0,
-  /// the stopwatch interprets this as a time ceiling is set.
-  uint64_t time_ceiling_;
-
-  /// True if stopwatch is running.
-  bool running_;
-
+  /// Returns a representation of the current time in nanoseconds. It can be used to
+  /// measure time durations by repeatedly calling this function and comparing the result.
   /// While this function returns nanoseconds, its resolution may be as large as
   /// milliseconds, depending on OsInfo::fast_clock().
   static inline int64_t Now() {
@@ -174,6 +162,20 @@ class MonotonicStopWatch {
 #endif
   }
 
+ private:
+  /// Start epoch value.
+  uint64_t start_;
+
+  /// Total elapsed time in nanoseconds.
+  uint64_t total_time_;
+
+  /// Upper bound of the running time as a epoch value. If the value is larger than 0,
+  /// the stopwatch interprets this as a time ceiling is set.
+  uint64_t time_ceiling_;
+
+  /// True if stopwatch is running.
+  bool running_;
+
   /// Returns the time since start.
   /// If time_ceiling_ is set, the stop watch won't run pass the ceiling.
   uint64_t RunningTime() const {

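As a small illustration of the Now() contract documented above (a sketch, not taken from the commit; DoSomeWork() is a hypothetical placeholder):

    int64_t start_ns = MonotonicStopWatch::Now();
    DoSomeWork();
    // Duration in nanoseconds; actual resolution may be coarser, per the comment above.
    int64_t elapsed_ns = MonotonicStopWatch::Now() - start_ns;
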
http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/common/thrift/ImpalaInternalService.thrift
----------------------------------------------------------------------
diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift
index a584ce1..121c551 100644
--- a/common/thrift/ImpalaInternalService.thrift
+++ b/common/thrift/ImpalaInternalService.thrift
@@ -50,7 +50,7 @@ enum TParquetFallbackSchemaResolution {
   NAME
 }
 
-// The order of the enum values needs to be kepy in sync with
+// The order of the enum values needs to be kept in sync with
 // ParquetMetadataUtils::ORDERED_ARRAY_ENCODINGS in parquet-metadata-utils.cc.
 enum TParquetArrayResolution {
   THREE_LEVEL,
@@ -606,6 +606,21 @@ struct TErrorLogEntry {
   2: list<string> messages
 }
 
+// Represents the states that a fragment instance goes through during its execution. The
+// current state gets sent back to the coordinator and will be presented to users through
+// the debug webpages.
+enum TFInstanceExecState {
+  WAITING_FOR_EXEC,
+  WAITING_FOR_CODEGEN,
+  WAITING_FOR_PREPARE,
+  WAITING_FOR_OPEN,
+  WAITING_FOR_FIRST_BATCH,
+  FIRST_BATCH_PRODUCED,
+  PRODUCING_DATA,
+  LAST_BATCH_SENT,
+  FINISHED
+}
+
 struct TFragmentInstanceExecStatus {
   // required in V1
   1: optional Types.TUniqueId fragment_instance_id
@@ -621,6 +636,10 @@ struct TFragmentInstanceExecStatus {
   // cumulative profile
   // required in V1
   4: optional RuntimeProfile.TRuntimeProfileTree profile
+
+  // The current state of this fragment instance's execution.
+  // required in V1
+  5: optional TFInstanceExecState current_state
 }
 
 struct TReportExecStatusParams {

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/tests/query_test/test_observability.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_observability.py b/tests/query_test/test_observability.py
index e8599b5..85fc4f1 100644
--- a/tests/query_test/test_observability.py
+++ b/tests/query_test/test_observability.py
@@ -181,3 +181,16 @@ class TestObservability(ImpalaTestSuite):
     dbg_str = "Debug thrift profile for query {0} not available in {1} seconds".format(
       query_id, MAX_WAIT)
     assert False, dbg_str
+
+  def test_query_profile_contains_instance_events(self, unique_database):
+    """Test that /query_profile_encoded contains an event timeline for fragment
+    instances, even when there are errors."""
+    events = ["Fragment Instance Lifecycle Event Timeline",
+              "Prepare Finished",
+              "First Batch Produced",
+              "First Batch Sent",
+              "ExecInternal Finished"]
+    query = "select count(*) from functional.alltypes"
+    runtime_profile = self.execute_query(query).runtime_profile
+    for event in events:
+      assert event in runtime_profile

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/tests/webserver/test_web_pages.py
----------------------------------------------------------------------
diff --git a/tests/webserver/test_web_pages.py b/tests/webserver/test_web_pages.py
index 54163dc..8dd17a4 100644
--- a/tests/webserver/test_web_pages.py
+++ b/tests/webserver/test_web_pages.py
@@ -31,6 +31,7 @@ class TestWebPage(ImpalaTestSuite):
   CATALOG_OBJECT_URL = "http://localhost:{0}/catalog_object"
   TABLE_METRICS_URL = "http://localhost:{0}/table_metrics"
   QUERY_BACKENDS_URL = "http://localhost:{0}/query_backends"
+  QUERY_FINSTANCES_URL = "http://localhost:{0}/query_finstances"
   THREAD_GROUP_URL = "http://localhost:{0}/thread-group"
   # log4j changes do not apply to the statestore since it doesn't
   # have an embedded JVM. So we make two sets of ports to test the
@@ -173,7 +174,21 @@ class TestWebPage(ImpalaTestSuite):
     self.get_and_check_status(self.TABLE_METRICS_URL +
       "?name=%s.%s" % (db_name, tbl_name), metric, ports_to_test=self.CATALOG_TEST_PORT)
 
-  def test_query_details(self, unique_database):
+  def __run_query_and_get_debug_page(self, query, page_url):
+    """Runs a query to obtain the content of the debug page pointed to by page_url, then
+    cancels the query."""
+    query_handle =  self.client.execute_async(query)
+    response_json = ""
+    try:
+      response = self.get_and_check_status(
+        page_url + "?query_id=%s&json" % query_handle.get_handle().id,
+        ports_to_test=[25000])
+      response_json = json.loads(response)
+    finally:
+      self.client.cancel(query_handle)
+    return response_json
+
+  def test_backend_states(self, unique_database):
     """Test that /query_backends returns the list of backend states for DML or queries;
     nothing for DDL statements"""
     CROSS_JOIN = ("select count(*) from functional.alltypes a "
@@ -181,20 +196,27 @@ class TestWebPage(ImpalaTestSuite):
     for q in [CROSS_JOIN,
               "CREATE TABLE {0}.foo AS {1}".format(unique_database, CROSS_JOIN),
               "DESCRIBE functional.alltypes"]:
-      query_handle =  self.client.execute_async(q)
-      try:
-        response = self.get_and_check_status(
-          self.QUERY_BACKENDS_URL + "?query_id=%s&json" % query_handle.get_handle().id,
-          ports_to_test=[25000])
+      response_json = self.__run_query_and_get_debug_page(q, self.QUERY_BACKENDS_URL)
 
-        response_json = json.loads(response)
+      if "DESCRIBE" not in q:
+        assert len(response_json['backend_states']) > 0
+      else:
+        assert 'backend_states' not in response_json
 
-        if "DESCRIBE" not in q:
-          assert len(response_json['backend_states']) > 0
-        else:
-          assert 'backend_states' not in response_json
-      finally:
-        self.client.cancel(query_handle)
+  def test_backend_instances(self, unique_database):
+    """Test that /query_finstances returns the list of fragment instances for DML or
+    queries; nothing for DDL statements"""
+    CROSS_JOIN = ("select count(*) from functional.alltypes a "
+                  "CROSS JOIN functional.alltypes b CROSS JOIN functional.alltypes c")
+    for q in [CROSS_JOIN,
+              "CREATE TABLE {0}.foo AS {1}".format(unique_database, CROSS_JOIN),
+              "DESCRIBE functional.alltypes"]:
+      response_json = self.__run_query_and_get_debug_page(q, self.QUERY_FINSTANCES_URL)
+
+      if "DESCRIBE" not in q:
+        assert len(response_json['backend_instances']) > 0
+      else:
+        assert 'backend_instances' not in response_json
 
   def test_io_mgr_threads(self):
     """Test that IoMgr threads have readable names. This test assumed that all systems we
@@ -207,4 +229,3 @@ class TestWebPage(ImpalaTestSuite):
     for pattern in expected_name_patterns:
       assert any(pattern in t for t in thread_names), \
            "Could not find thread matching '%s'" % pattern
-

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/www/query_detail_tabs.tmpl
----------------------------------------------------------------------
diff --git a/www/query_detail_tabs.tmpl b/www/query_detail_tabs.tmpl
index 0318761..8e413a1 100644
--- a/www/query_detail_tabs.tmpl
+++ b/www/query_detail_tabs.tmpl
@@ -28,4 +28,5 @@ under the License.
   <li id="profile-tab" role="presentation"><a href="/query_profile?query_id={{query_id}}">Profile</a></li>
   <li id="memory-tab" role="presentation"><a href="/query_memory?query_id={{query_id}}">Memory</a></li>
   <li id="backends-tab" role="presentation"><a href="/query_backends?query_id={{query_id}}">Backends</a></li>
+  <li id="finstances-tab" role="presentation"><a href="/query_finstances?query_id={{query_id}}">Fragment Instances</a></li>
 </ul>

http://git-wip-us.apache.org/repos/asf/impala/blob/057cc51b/www/query_finstances.tmpl
----------------------------------------------------------------------
diff --git a/www/query_finstances.tmpl b/www/query_finstances.tmpl
new file mode 100644
index 0000000..5d8ba33
--- /dev/null
+++ b/www/query_finstances.tmpl
@@ -0,0 +1,129 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+{{> www/common-header.tmpl }}
+{{> www/query_detail_tabs.tmpl }}
+<br/>
+{{?backend_instances}}
+<div>
+  <label>
+    <input type="checkbox" checked="true" id="toggle" onClick="toggleRefresh()"/>
+    <span id="refresh_on">Auto-refresh on</span>
+  </label>  Last updated: <span id="last-updated"></span>
+</div>
+
+<br/>
+<table id="finstances" class='table table-hover table-bordered'>
+  <thead>
+    <tr>
+      <th>Host</th>
+      <th>Fragment<br/>Name</th>
+      <th>Instance ID</th>
+      <th>Current state</th>
+      <th>Done</th>
+      <th>Time since last report (ms)</th>
+    </tr>
+  </thead>
+  <tbody>
+
+  </tbody>
+</table>
+
+<script>
+document.getElementById("finstances-tab").className = "active";
+
+var intervalId = 0;
+var table = null;
+var refresh = function () {
+    table.ajax.reload();
+    document.getElementById("last-updated").textContent = new Date();
+};
+
+// Unpack the JSON backend_instances by merging the backend host name into every instance
+// stats row. Also clears the last report timestamp field for instances that have not
+// started or have already finished execution.
+function unpackJson(json) {
+    var result = new Array();
+    if (typeof json.backend_instances === "undefined") {
+        // Table will be empty, remove it.
+        table.table().destroy(true);
+        $("#finstances").remove();
+        // Display completion message.
+        $("#query_finished_alert").css("visibility", "visible");
+        // Stop auto refresh
+        $("#toggle").prop("checked", false);
+        toggleRefresh();
+        return json;
+    }
+    for (var i = 0; i < json.backend_instances.length; ++i) {
+        var backend_state = json.backend_instances[i];
+        var instance_stats = backend_state.instance_stats;
+        for (var j = 0; j < instance_stats.length; ++j) {
+            var instance = instance_stats[j];
+            instance.host = backend_state.host;
+            if (instance.done) instance.time_since_last_heard_from = "";
+            if (!instance.first_status_update_received) {
+              instance.time_since_last_heard_from = "";
+            }
+            delete instance.first_status_update_received;
+            result.push(instance);
+        }
+    }
+    return result;
+}
+
+$(document).ready(function() {
+    table = $('#finstances').DataTable({
+        ajax: { url: "/query_finstances?query_id={{query_id}}&json",
+                dataSrc: unpackJson,
+              },
+        "columns": [ {data: 'host'},
+                     {data: 'fragment_name'},
+                     {data: 'instance_id'},
+                     {data: 'current_state'},
+                     {data: 'done'},
+                     {data: 'time_since_last_heard_from'}],
+        "order": [[ 0, "desc" ]],
+        "pageLength": 100
+    });
+    intervalId = setInterval(refresh, 1000);
+});
+
+function toggleRefresh() {
+    if (document.getElementById("toggle").checked == true) {
+        intervalId = setInterval(refresh, 1000);
+        document.getElementById("refresh_on").textContent = "Auto-refresh on";
+    } else {
+        clearInterval(intervalId);
+        document.getElementById("refresh_on").textContent = "Auto-refresh off";
+    }
+}
+
+</script>
+{{/backend_instances}}
+
+<div class="alert alert-info" role="alert" id="query_finished_alert"
+     style="visibility:hidden">
Query <strong>{{query_id}}</strong> has completed, or has not started any backends yet.
+</div>
+{{^backend_instances}}
+<script>$("#query_finished_alert").css("visibility", "visible");</script>
+{{/backend_instances}}
+
+{{> www/common-footer.tmpl }}
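
(For orientation only: a sketch of the JSON shape this page consumes,
inferred from the fields read by unpackJson() and the table columns
above. The field names come from the template; every value below is
made up and is not real server output.)

{
  "backend_instances": [
    {
      "host": "impala-worker-1:22000",
      "instance_stats": [
        {
          "fragment_name": "F00",
          "instance_id": "ab4d1e2f00000000:1",
          "current_state": "Executing",
          "done": false,
          "first_status_update_received": true,
          "time_since_last_heard_from": 120
        }
      ]
    }
  ]
}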


[08/11] impala git commit: IMPALA-6410: Tool to cherrypick changes across branches.

Posted by ta...@apache.org.
IMPALA-6410: Tool to cherrypick changes across branches.

This script compares two branches and optionally cherry-picks changes
across. It uses the Gerrit Change-Id as the key, and it supports a
configuration file and a string to ignore commits.
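
For example, a hypothetical invocation to sync 2.x from master (using
only flags defined by the script; it must be run from a checkout whose
HEAD matches the target branch, and it cherry-picks locally without
pushing):

  bin/compare_branches.py --source_branch master --target_branch 2.x \
      --cherry_pick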

Change-Id: I6120ec2d6e914a1e5fda568178b32aafda8722a9
Reviewed-on: http://gerrit.cloudera.org:8080/9045
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/14e170ad
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/14e170ad
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/14e170ad

Branch: refs/heads/2.x
Commit: 14e170adf7ecbbad5725996a8232e2f57c1f13b3
Parents: 057cc51
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Wed Jan 17 09:42:54 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:57 2018 -0800

----------------------------------------------------------------------
 bin/compare_branches.py  | 273 ++++++++++++++++++++++++++++++++++++++++++
 bin/ignored_commits.json |   1 +
 2 files changed, 274 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/14e170ad/bin/compare_branches.py
----------------------------------------------------------------------
diff --git a/bin/compare_branches.py b/bin/compare_branches.py
new file mode 100755
index 0000000..6a81901
--- /dev/null
+++ b/bin/compare_branches.py
@@ -0,0 +1,273 @@
+#!/usr/bin/env python
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+HELP = '''
+Compares two specified branches, using the Gerrit Change-Id as the
+primary identifier. Ignored commits can be added via a JSON
+configuration file or with a special string in the commit message.
+Changes can be cherrypicked with the --cherry_pick argument.
+
+This script can be used to keep two development branches
+(by default, "master" and "2.x") in sync. It is equivalent
+to cherry-picking commits one by one, but automates identifying
+the commits to cherry-pick. Unlike "git cherry", it uses
+the Gerrit Change-Id identifier in the commit message
+as a key.
+
+The ignored_commits.json configuration file is of the following
+form. Note that commits are the full 20-byte git hashes.
+
+[
+  {
+    "source": "master",
+    "target": "2.x",
+    "commits": [
+      { "hash": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "comment": "...",
+      { "hash": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "comment": "..."}
+    ]
+  }
+]
+
+The --target_remote_name is optional. If not specified, the target remote is set to
+the value of the --source_remote_name.  Debug logging to stderr can be enabled with
+--verbose.
+
+Example:
+
+$bin/compare_branches.py --source_branch master --target_branch 2.x
+--------------------------------------------------------------------------------
+Commits in asf-gerrit/master but not in asf-gerrit/2.x:
+--------------------------------------------------------------------------------
+35a3e186d61b8f365b0f7d1127be311758437e16 IMPALA-5478: Run TPCDS queries with decimal_v2 enabled (Thu Jan 18 03:28:51 2018 +0000) - Taras Bobrovytsky
+d9b6fd073055b436c7404d49454dc215b2c7a369 IMPALA-6386: Invalidate metadata at table level for dataload (Wed Jan 17 22:52:58 2018 +0000) - Joe McDonnell
+dcc7be0ed483b332dac22d6596f56ff2a6cfdaa3 IMPALA-4315: Allow USE and SHOW TABLES if the user has only column privileges (Wed Jan 17 22:40:13 2018 +0000) - Csaba Ringhofer
+b6e43133e671773d2757612f72cfcdb0ff303226 IMPALA-6399: Increase timeout in test_observability to reduce flakiness (Wed Jan 17 22:31:33 2018 +0000) - Lars Volker
+--------------------------------------------------------------------------------
+Jira keys referenced (Note: not all commit messages will reference a jira key):
+IMPALA-5478,IMPALA-6386,IMPALA-4315,IMPALA-6399
+--------------------------------------------------------------------------------
+'''
+
+import argparse
+import json
+import logging
+import os
+import re
+import sh
+import sys
+
+from collections import defaultdict
+try:
+  from collections import OrderedDict
+except ImportError:
+  from ordereddict import OrderedDict
+from pprint import pformat
+
+def create_parser():
+  class CustomFormatter(argparse.ArgumentDefaultsHelpFormatter,
+      argparse.RawDescriptionHelpFormatter):
+    """
+    Mix-in to leave the description alone, but show
+    defaults.
+    """
+    pass
+
+  parser = argparse.ArgumentParser(
+      formatter_class=CustomFormatter,
+      description=HELP)
+
+  parser.add_argument('--cherry_pick', action='store_true', default=False,
+      help='Cherry-pick mismatched commits to current branch. This ' +
+        'must match (in the hash sense) the target branch.')
+  parser.add_argument('--source_branch', default='master')
+  parser.add_argument('--target_branch', default='2.x')
+  parser.add_argument('--source_remote_name', default='asf-gerrit',
+      help='Name of the source git remote. If set to empty string, ' +
+           'this remote is not fetched and branch names are used ' +
+           'as is; otherwise, the source ref is remote/branch.')
+  parser.add_argument('--target_remote_name', default=None,
+      help='Name of the target git remote; defaults to source remote. ' +
+           'Empty strings are handled the same way as --source_remote_name.')
+  default_ignored_commits_path = os.path.join(
+      os.path.dirname(os.path.abspath(__file__)), 'ignored_commits.json')
+  parser.add_argument('--ignored_commits_file', default=default_ignored_commits_path,
+      help='JSON File that contains ignored commits as specified in the help')
+  parser.add_argument('--skip_commits_matching', default="Cherry-picks: not for {branch}",
+      help='String in commit messages that causes the commit to be ignored. ' +
+           '{branch} is replaced with the target branch; the search is case-insensitive')
+  parser.add_argument('--verbose', '-v', action='store_true', default=False,
+      help='Turn on DEBUG and INFO logging')
+  return parser
+
+def read_ignored_commits(ignored_commits_file):
+  '''Returns a dictionary containing commits that should be ignored.
+
+  ignored_commits_file is a path to a JSON file with schema
+  specified at the top of this file.
+
+  The returned structure is a dictionary whose keys are tuples of
+  (source_branch, target_branch) and whose values are sets of git hashes.
+  '''
+  ignored_commits = defaultdict(set)
+  with open(ignored_commits_file) as f:
+    json_data = json.load(f)
+    for result_dict in json_data:
+      logging.debug("Parsing result_dict: {0}".format(result_dict))
+      ignored_commits[(result_dict['source'], result_dict['target'])] =\
+          set([ commit["hash"] for commit in result_dict['commits'] ])
+  return ignored_commits
+
+def build_commit_map(branch, merge_base):
+  '''Creates a map from change id to (hash, subject, author, date, body).'''
+  # Disable git pager in order for the sh.git.log command to work
+  os.environ['GIT_PAGER'] = ''
+
+  fields = ['%H', '%s', '%an', '%cd', '%b']
+  pretty_format = '\x1f'.join(fields) + '\x1e'
+  result = OrderedDict()
+  for line in sh.git.log(
+      branch, "^" + merge_base, pretty=pretty_format, color='never').split('\x1e'):
+    if line == "":
+      # if no changes are identified by the git log, we get an empty string
+      continue
+    if line == "\n":
+      # git log adds a newline to the end; we can skip it
+      continue
+    commit_hash, subject, author, date, body = [t.strip() for t in line.split('\x1f')]
+    change_id_matches = re.findall('Change-Id: (.*)', body)
+    if change_id_matches:
+      if len(change_id_matches) > 1:
+        logging.warning("Commit %s contains multiple change ids; using first one.",
+            commit_hash)
+      change_id = change_id_matches[0]
+      result[change_id] = (commit_hash, subject, author, date, body)
+    else:
+      logging.warning('Commit {0} ({1}...) has no Change-Id.'.format(
+        commit_hash, subject[:40]))
+  logging.debug("Commit map for branch %s has size %d.", branch, len(result))
+  return result
+
+def cherrypick(cherry_pick_hashes, full_target_branch_name):
+  """Cherrypicks the given commits.
+
+  Also, asserts that full_target_branch_name matches the current HEAD.
+
+  cherry_pick_hashes is a list of git hashes, in the order to
+  be cherry-picked.
+
+  Note that this function does not push to the remote.
+  """
+  print "Cherrypicking %d changes." % (len(cherry_pick_hashes),)
+
+  if len(cherry_pick_hashes) == 0:
+    return
+
+  # Cherrypicking only makes sense if we're on the equivalent of the target branch.
+  head_sha = sh.git('rev-parse', 'HEAD').strip()
+  target_branch_sha = sh.git('rev-parse', full_target_branch_name).strip()
+  if head_sha != target_branch_sha:
+    print "Cannot cherrypick because %s (%s) and HEAD (%s) are divergent." % (
+        full_target_branch_name, target_branch_sha, head_sha)
+    sys.exit(1)
+
+  cherry_pick_hashes.reverse()
+  for cherry_pick_hash in cherry_pick_hashes:
+    sh.git('cherry-pick', '--keep-redundant-commits', cherry_pick_hash)
+
+
+def main():
+  parser = create_parser()
+  options = parser.parse_args()
+
+  log_level = logging.WARNING
+  if options.verbose:
+    log_level = logging.DEBUG
+  logging.basicConfig(level=log_level,
+      format='%(asctime)s %(threadName)s %(levelname)s: %(message)s')
+
+  if options.target_remote_name is None:
+    options.target_remote_name = options.source_remote_name
+
+  # Ensure all branches are up to date, unless remotes are disabled
+  # by specifying them with an empty string.
+  if options.source_remote_name != "":
+    sh.git.fetch(options.source_remote_name)
+    full_source_branch_name = options.source_remote_name + '/' + options.source_branch
+  else:
+    full_source_branch_name = options.source_branch
+  if options.target_remote_name != "":
+    if options.source_remote_name != options.target_remote_name:
+      sh.git.fetch(options.target_remote_name)
+    full_target_branch_name = options.target_remote_name + '/' + options.target_branch
+  else:
+    full_target_branch_name = options.target_branch
+
+  merge_base = sh.git("merge-base",
+      full_source_branch_name, full_target_branch_name).strip()
+  source_commits = build_commit_map(full_source_branch_name, merge_base)
+  target_commits = build_commit_map(full_target_branch_name, merge_base)
+
+  ignored_commits = read_ignored_commits(options.ignored_commits_file)
+  logging.debug("ignored commits from {0}:\n{1}"
+               .format(options.ignored_commits_file, pformat(ignored_commits)))
+  commits_ignored = []  # Track commits actually ignored for debug logging
+
+  cherry_pick_hashes = []
+  print '-' * 80
+  print 'Commits in {0} but not in {1}:'.format(
+      full_source_branch_name, full_target_branch_name)
+  print '-' * 80
+  jira_keys = []
+  jira_key_pat = re.compile(r'(IMPALA-\d+)')
+  skip_commits_matching = options.skip_commits_matching.replace(
+      "{branch}", options.target_branch)
+  for change_id, (commit_hash, msg, author, date, body) in source_commits.iteritems():
+    change_in_target = change_id in target_commits
+    ignore_by_config = commit_hash in ignored_commits[
+        (options.source_branch, options.target_branch)]
+    ignore_by_commit_message = skip_commits_matching.lower() in msg.lower() \
+        or skip_commits_matching.lower() in body.lower()
+    # This conditional block is just for debug logging of ignored commits
+    if ignore_by_config or ignore_by_commit_message:
+      if change_in_target:
+        logging.debug("Not ignoring commit because change is already in target: {0}"
+                     .format(commit_hash))
+      else:
+        if ignore_by_commit_message:
+          logging.debug("Ignoring commit {0} by commit message.".format(commit_hash))
+        else:
+          logging.debug("Ignoring commit {0} by config file.".format(commit_hash))
+        commits_ignored.append(commit_hash)
+    else:
+      logging.debug("NOT ignoring commit {0} since not in ignored commits ({1},{2})"
+                   .format(commit_hash, options.source_branch, options.target_branch))
+    if not change_in_target and not ignore_by_config and not ignore_by_commit_message:
+      print u'{0} {1} ({2}) - {3}'.format(commit_hash, msg, date, author)
+      cherry_pick_hashes.append(commit_hash)
+      jira_keys += jira_key_pat.findall(msg)
+
+  print '-' * 80
+
+  print "Jira keys referenced (Note: not all commit messages will reference a jira key):"
+  print ','.join(jira_keys)
+  print '-' * 80
+
+  logging.debug("Commits actually ignored (change was not in target): {0}"
+               .format(pformat(commits_ignored)))
+
+  if options.cherry_pick:
+    cherrypick(cherry_pick_hashes, full_target_branch_name)
+
+if __name__ == '__main__':
+  main()

http://git-wip-us.apache.org/repos/asf/impala/blob/14e170ad/bin/ignored_commits.json
----------------------------------------------------------------------
diff --git a/bin/ignored_commits.json b/bin/ignored_commits.json
new file mode 100644
index 0000000..fe51488
--- /dev/null
+++ b/bin/ignored_commits.json
@@ -0,0 +1 @@
+[]


[05/11] impala git commit: IMPALA-6418: Find a reliable way to detect supported TLS versions

Posted by ta...@apache.org.
IMPALA-6418: Find a reliable way to detect supported TLS versions

Due to a RHEL bug, determining the version of the dynamically linked
OpenSSL is unreliable on certain RHEL platforms. As a result, if
Impala is built against OpenSSL 1.0.0 but run against OpenSSL 1.0.1,
it cannot access the capabilities of OpenSSL 1.0.1. The most
significant drawback is that, in this scenario, Impala cannot use
TLSv1.2 even though it is available.

The RHEL bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1497859

This patch imitates Kudu's way of determining what TLS versions are
supported via this patch:
https://github.com/apache/kudu/commit/b88117415a02699c12a6eacbf065c4140ee0963c

This is tested and works on all supported platforms of Impala.
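
As a rough standalone illustration (not part of this patch), the idea
is to ask the linked libssl for the newest protocol it can negotiate
instead of trusting compile-time version checks. The sketch below
assumes OpenSSL 1.0.x headers, where SSL_METHOD is not an opaque type;
everything apart from SSLv23_method() and TLS1_2_VERSION is local to
the example.

  #include <cstdio>
  #include <openssl/ssl.h>

  // Hard-code the TLS 1.2 code point if the headers predate it, mirroring
  // the definitions added to be/src/util/openssl-util.h below.
  #ifndef TLS1_2_VERSION
  #define TLS1_2_VERSION 0x0303
  #endif

  int main() {
    SSL_library_init();
    // SSLv23_method() negotiates the highest version both peers support,
    // so its 'version' field reflects the runtime library's maximum.
    int max_version = SSLv23_method()->version;
    std::printf("max TLS/SSL version code: 0x%04x\n", max_version);
    std::printf("TLSv1.2 usable: %s\n",
                max_version >= TLS1_2_VERSION ? "yes" : "no");
    return 0;
  }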

Change-Id: Idd40219b7be5889b3c24457acdb79a28bdcd9bfb
Reviewed-on: http://gerrit.cloudera.org:8080/9060
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/e32b2305
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/e32b2305
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/e32b2305

Branch: refs/heads/2.x
Commit: e32b230545800b7cc6c1672224969a38ae017838
Parents: 2a27b4f
Author: Sailesh Mukil <sa...@cloudera.com>
Authored: Thu Jan 18 11:00:32 2018 -0800
Committer: Philip Zeyliger <ph...@cloudera.com>
Committed: Wed Jan 24 10:17:56 2018 -0800

----------------------------------------------------------------------
 be/src/rpc/thrift-client.cc |  5 +++--
 be/src/rpc/thrift-server.cc | 25 ++++++++++++++++---------
 be/src/util/openssl-util.cc |  4 ++++
 be/src/util/openssl-util.h  | 25 +++++++++++++++++++++++--
 4 files changed, 46 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/e32b2305/be/src/rpc/thrift-client.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-client.cc b/be/src/rpc/thrift-client.cc
index 085e92e..1f8d99e 100644
--- a/be/src/rpc/thrift-client.cc
+++ b/be/src/rpc/thrift-client.cc
@@ -24,6 +24,7 @@
 #include <gutil/strings/substitute.h>
 
 #include "util/network-util.h"
+#include "util/openssl-util.h"
 #include "util/time.h"
 
 #include "common/names.h"
@@ -46,8 +47,8 @@ ThriftClientImpl::ThriftClientImpl(const std::string& ipaddress, int port, bool
         SSLProtoVersions::StringToProtocol(FLAGS_ssl_minimum_version, &version);
     if (init_status_.ok() && !SSLProtoVersions::IsSupported(version)) {
       string err =
-          Substitute("TLS ($0) version not supported (linked OpenSSL version is $1)",
-              version, SSLeay());
+          Substitute("TLS ($0) version not supported (maximum supported version is $1)",
+              version, MaxSupportedTlsVersion());
       init_status_ = Status(err);
     }
     if (!init_status_.ok()) return;

http://git-wip-us.apache.org/repos/asf/impala/blob/e32b2305/be/src/rpc/thrift-server.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-server.cc b/be/src/rpc/thrift-server.cc
index eaca699..75ad424 100644
--- a/be/src/rpc/thrift-server.cc
+++ b/be/src/rpc/thrift-server.cc
@@ -40,6 +40,7 @@
 #include "util/debug-util.h"
 #include "util/metrics.h"
 #include "util/network-util.h"
+#include "util/openssl-util.h"
 #include "util/os-util.h"
 #include "util/uid-util.h"
 
@@ -86,14 +87,20 @@ Status SSLProtoVersions::StringToProtocol(const string& in, SSLProtocol* protoco
   return Status(Substitute("Unknown TLS version: '$0'", in));
 }
 
-#define OPENSSL_MIN_VERSION_WITH_TLS_1_1 0x10001000L
-
 bool SSLProtoVersions::IsSupported(const SSLProtocol& protocol) {
-  bool is_openssl_1_0_0_or_lower = (SSLeay() < OPENSSL_MIN_VERSION_WITH_TLS_1_1);
-  if (is_openssl_1_0_0_or_lower) return (protocol == TLSv1_0_plus);
-
-  // All other versions supported by OpenSSL 1.0.1 and later.
-  return true;
+  DCHECK_LE(protocol, TLSv1_2_plus);
+  int max_supported_tls_version = MaxSupportedTlsVersion();
+  DCHECK_GE(max_supported_tls_version, TLS1_VERSION);
+
+  switch (max_supported_tls_version) {
+    case TLS1_VERSION:
+      return protocol == TLSv1_0_plus || protocol == TLSv1_0;
+    case TLS1_1_VERSION:
+      return protocol != TLSv1_2_plus && protocol != TLSv1_2;
+    default:
+      DCHECK_GE(max_supported_tls_version, TLS1_2_VERSION);
+      return true;
+  }
 }
 
 bool EnableInternalSslConnections() {
@@ -376,8 +383,8 @@ Status ThriftServer::CreateSocket(boost::shared_ptr<TServerTransport>* socket) {
   if (ssl_enabled()) {
     if (!SSLProtoVersions::IsSupported(version_)) {
       return Status(TErrorCode::SSL_SOCKET_CREATION_FAILED,
-          Substitute("TLS ($0) version not supported (linked OpenSSL version is $1)",
-                        version_, SSLeay()));
+          Substitute("TLS ($0) version not supported (maximum supported version is $1)",
+                        version_, MaxSupportedTlsVersion()));
     }
     try {
       // This 'factory' is only called once, since CreateSocket() is only called from

http://git-wip-us.apache.org/repos/asf/impala/blob/e32b2305/be/src/util/openssl-util.cc
----------------------------------------------------------------------
diff --git a/be/src/util/openssl-util.cc b/be/src/util/openssl-util.cc
index a8ec976..264a49a 100644
--- a/be/src/util/openssl-util.cc
+++ b/be/src/util/openssl-util.cc
@@ -43,6 +43,10 @@ static const int RNG_RESEED_INTERVAL = 128;
 // Number of bytes of entropy to add at RNG_RESEED_INTERVAL.
 static const int RNG_RESEED_BYTES = 512;
 
+int MaxSupportedTlsVersion() {
+  return SSLv23_method()->version;
+}
+
 // Callback used by OpenSSLErr() - write the error given to us through buf to the
 // stringstream that's passed in through ctx.
 static int OpenSSLErrCallback(const char* buf, size_t len, void* ctx) {

http://git-wip-us.apache.org/repos/asf/impala/blob/e32b2305/be/src/util/openssl-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/openssl-util.h b/be/src/util/openssl-util.h
index 22f8235..67d014d 100644
--- a/be/src/util/openssl-util.h
+++ b/be/src/util/openssl-util.h
@@ -21,12 +21,31 @@
 #include <openssl/aes.h>
 #include <openssl/evp.h>
 #include <openssl/sha.h>
+#include <openssl/ssl.h>
 
 #include "common/status.h"
 
 namespace impala {
 
-#define OPENSSL_VERSION_1_0_1 0x1000100L
+// From https://github.com/apache/kudu/commit/b88117415a02699c12a6eacbf065c4140ee0963c
+//
+// Hard code OpenSSL flag values from OpenSSL 1.0.1e[1][2] when compiling
+// against OpenSSL 1.0.0 and below. We detect when running against a too-old
+// version of OpenSSL using these definitions at runtime so that Impala has full
+// functionality when run against a new OpenSSL version, even if it's compiled
+// against an older version.
+//
+// [1]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_1e/ssl/ssl.h#L605-L609
+// [2]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_1e/ssl/tls1.h#L166-L172
+#ifndef TLS1_1_VERSION
+#define TLS1_1_VERSION 0x0302
+#endif
+#ifndef TLS1_2_VERSION
+#define TLS1_2_VERSION 0x0303
+#endif
+
+/// Returns the maximum supported TLS version available in the linked OpenSSL library.
+int MaxSupportedTlsVersion();
 
 /// Add entropy from the system RNG to OpenSSL's global RNG. Called at system startup
 /// and again periodically to add new entropy.
@@ -67,7 +86,9 @@ class IntegrityHash {
 class EncryptionKey {
  public:
   EncryptionKey() : initialized_(false) {
-    mode_ = SSLeay() < OPENSSL_VERSION_1_0_1 ? AES_256_CFB : AES_256_CTR;
+    // If TLS1.2 is supported, then we're on a version of OpenSSL that supports
+    // AES-256-CTR.
+    mode_ = MaxSupportedTlsVersion() < TLS1_2_VERSION ? AES_256_CFB : AES_256_CTR;
   }
 
   /// Initialize a key for temporary use with randomly generated data. Reinitializes with