You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@impala.apache.org by ta...@apache.org on 2017/12/22 15:47:04 UTC

[1/5] impala git commit: Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.

Repository: impala
Updated Branches:
  refs/heads/master 2fb11fb73 -> a16fe803c


Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.

As a follow-on to centralizing into one parent pom, we can now manage
thirdparty dependency versions in Java a little bit more clearly.

Upgrades SLF4J, commons.io:
  slf4j: 1.7.5 -> 1.7.25
  commons.io: 2.4 -> 2.6

  The SLF4J upgrade is nice to be able to run under Java9. The release
  notes at https://www.slf4j.org/news.html are uneventful.

  Commons IO 2.6 supports Java 9 and is source and binary compatible,
  per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
  https://commons.apache.org/proper/commons-io/upgradeto2_5.html.

Removes the following dependencies:
  htrace-core
  hadoop-mapreduce-client-core
  hive-shims
  com.stumbleupon:async
  commons-dbcp
  jdo-api

  I ran "mvn dependency:analyze" and these were some (but not all)
  of the "Unused declared dependencies found." Spelunking in git logs,
  these dependencies are from 2013 and possibly from an effort
  to run with dependencies from the filesystem. They don't seem
  to be required anymore.

Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
Doing so was unnecessarily causing us to download extra, old Hadoop
jars, and the new Hadoop jars seem to work just as well. This is the
kind of divergence that centralizing the versions into variables will
help with.

Creates variables for:
  junit.version
  slf4j.version
  hadoop.version
  commons-io.version
  httpcomponents.core.version
  thrift.version
  kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)

Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so this variable wasn't doing anything.

I ran the core tests with this change.

Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/f755910e
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/f755910e
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/f755910e

Branch: refs/heads/master
Commit: f755910e9773eabcf6c65caf992ac4153fbdf167
Parents: 2fb11fb
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Wed Dec 13 16:33:11 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Wed Dec 20 22:04:18 2017 +0000

----------------------------------------------------------------------
 bin/impala-config.sh            |  2 +-
 fe/pom.xml                      | 45 ++++++------------------------------
 impala-parent/pom.xml           | 10 ++++++--
 testdata/TableFlattener/pom.xml |  4 ++--
 testdata/pom.xml                | 10 ++++----
 5 files changed, 23 insertions(+), 48 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/f755910e/bin/impala-config.sh
----------------------------------------------------------------------
diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 958e62e..058dc01 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -176,9 +176,9 @@ unset IMPALA_HIVE_URL
 export IMPALA_SENTRY_VERSION=1.5.1-cdh5.15.0-SNAPSHOT
 unset IMPALA_SENTRY_URL
 export IMPALA_PARQUET_VERSION=1.5.0-cdh5.15.0-SNAPSHOT
-unset IMPALA_PARQUET_URL
 export IMPALA_LLAMA_MINIKDC_VERSION=1.0.0
 unset IMPALA_LLAMA_MINIKDC_URL
+export IMPALA_KITE_VERSION=1.0.0-cdh5.15.0-SNAPSHOT
 
 # Source the branch and local config override files here to override any
 # variables above or any variables below that allow overriding via environment

http://git-wip-us.apache.org/repos/asf/impala/blob/f755910e/fe/pom.xml
----------------------------------------------------------------------
diff --git a/fe/pom.xml b/fe/pom.xml
index e069256..646820f 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -36,11 +36,6 @@ under the License.
 
   <dependencies>
     <dependency>
-      <groupId>org.htrace</groupId>
-      <artifactId>htrace-core</artifactId>
-      <version>3.0.4</version>
-    </dependency>
-    <dependency>
       <groupId>org.apache.impala</groupId>
       <artifactId>impala-data-source-api</artifactId>
       <version>${impala.extdatasrc.api.version}</version>
@@ -76,12 +71,6 @@ under the License.
     </dependency>
 
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-mapreduce-client-core</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-
-    <dependency>
       <groupId>org.apache.sentry</groupId>
       <artifactId>sentry-core-common</artifactId>
       <version>${sentry.version}</version>
@@ -190,7 +179,7 @@ under the License.
     <dependency>
       <groupId>org.apache.thrift</groupId>
       <artifactId>libthrift</artifactId>
-      <version>${env.IMPALA_THRIFT_JAVA_VERSION}</version>
+      <version>${thrift.version}</version>
       <!-- libthrift depends httpcore 4.1.3 which does not work with KMS. To workaround
            this problem the dependency is excluded here and we explicitly add a newer
            httpcore dependency version. See IMPALA-4210. TODO: Find a better fix. -->
@@ -205,13 +194,13 @@ under the License.
     <dependency>
       <groupId>org.apache.httpcomponents</groupId>
       <artifactId>httpcore</artifactId>
-      <version>4.2.5</version>
+      <version>${httpcomponents.core.version}</version>
     </dependency>
 
     <dependency>
       <groupId>org.apache.thrift</groupId>
       <artifactId>libfb303</artifactId>
-      <version>${env.IMPALA_THRIFT_JAVA_VERSION}</version>
+      <version>${thrift.version}</version>
     </dependency>
 
     <dependency>
@@ -248,11 +237,6 @@ under the License.
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
-      <artifactId>hive-shims</artifactId>
-      <version>${hive.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hive</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
     </dependency>
@@ -262,11 +246,6 @@ under the License.
       <artifactId>kudu-client</artifactId>
       <version>${kudu.version}</version>
     </dependency>
-    <dependency>
-      <groupId>com.stumbleupon</groupId>
-      <artifactId>async</artifactId>
-      <version>1.3.1</version>
-    </dependency>
 
     <!-- This driver supports PostgreSQL 7.2 and newer -->
     <dependency>
@@ -275,16 +254,6 @@ under the License.
       <version>9.0-801.jdbc4</version>
     </dependency>
     <dependency>
-      <groupId>commons-dbcp</groupId>
-      <artifactId>commons-dbcp</artifactId>
-      <version>1.4</version>
-    </dependency>
-    <dependency>
-      <groupId>javax.jdo</groupId>
-      <artifactId>jdo-api</artifactId>
-      <version>3.0.1</version>
-    </dependency>
-    <dependency>
       <groupId>org.antlr</groupId>
       <artifactId>antlr-runtime</artifactId>
       <version>3.3</version>
@@ -305,24 +274,24 @@ under the License.
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-api</artifactId>
-      <version>1.7.5</version>
+      <version>${slf4j.version}</version>
     </dependency>
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-log4j12</artifactId>
-      <version>1.7.5</version>
+      <version>${slf4j.version}</version>
     </dependency>
 
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
-      <version>11.0.2</version>
+      <version>${guava.version}</version>
     </dependency>
 
     <dependency>
       <groupId>junit</groupId>
       <artifactId>junit</artifactId>
-      <version>4.12</version>
+      <version>${junit.version}</version>
       <scope>test</scope>
     </dependency>
 

http://git-wip-us.apache.org/repos/asf/impala/blob/f755910e/impala-parent/pom.xml
----------------------------------------------------------------------
diff --git a/impala-parent/pom.xml b/impala-parent/pom.xml
index 196272a..6dc2064 100644
--- a/impala-parent/pom.xml
+++ b/impala-parent/pom.xml
@@ -41,13 +41,19 @@ under the License.
     <sentry.version>${env.IMPALA_SENTRY_VERSION}</sentry.version>
     <hbase.version>${env.IMPALA_HBASE_VERSION}</hbase.version>
     <parquet.version>${env.IMPALA_PARQUET_VERSION}</parquet.version>
+    <kite.version>${env.IMPALA_KITE_VERSION}</kite.version>
+    <thrift.version>${env.IMPALA_THRIFT_JAVA_VERSION}</thrift.version>
     <impala.extdatasrc.api.version>1.0-SNAPSHOT</impala.extdatasrc.api.version>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <kudu.version>${env.KUDU_JAVA_VERSION}</kudu.version>
+    <commons-io.version>2.6</commons-io.version>
+    <slf4j.version>1.7.25</slf4j.version>
+    <junit.version>4.12</junit.version>
+    <!-- Beware compatibility requirements with Thrift and
+         KMS; see IMPALA-4210. -->
+    <httpcomponents.core.version>4.2.5</httpcomponents.core.version>
     <yarn-extras.version>${project.version}</yarn-extras.version>
     <eclipse.output.directory>eclipse-classes</eclipse.output.directory>
-
-    <thrift.version>0.9.0</thrift.version>
     <guava.version>11.0.2</guava.version>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
   </properties>

http://git-wip-us.apache.org/repos/asf/impala/blob/f755910e/testdata/TableFlattener/pom.xml
----------------------------------------------------------------------
diff --git a/testdata/TableFlattener/pom.xml b/testdata/TableFlattener/pom.xml
index 12e0e90..527175b 100644
--- a/testdata/TableFlattener/pom.xml
+++ b/testdata/TableFlattener/pom.xml
@@ -50,12 +50,12 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
-      <version>2.6.0</version>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.kitesdk</groupId>
       <artifactId>kite-data-core</artifactId>
-      <version>1.0.0-cdh5.4.1</version>
+      <version>${kite.version}</version>
     </dependency>
   </dependencies>
 </project>

http://git-wip-us.apache.org/repos/asf/impala/blob/f755910e/testdata/pom.xml
----------------------------------------------------------------------
diff --git a/testdata/pom.xml b/testdata/pom.xml
index e117e4b..ae584d9 100644
--- a/testdata/pom.xml
+++ b/testdata/pom.xml
@@ -39,13 +39,13 @@ under the License.
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
-      <version>11.0.2</version>
+      <version>${guava.version}</version>
     </dependency>
 
     <dependency>
       <groupId>junit</groupId>
       <artifactId>junit</artifactId>
-      <version>4.12</version>
+      <version>${junit.version}</version>
       <scope>test</scope>
     </dependency>
 
@@ -88,7 +88,7 @@ under the License.
     <dependency>
       <groupId>commons-io</groupId>
       <artifactId>commons-io</artifactId>
-      <version>2.4</version>
+      <version>${commons-io.version}</version>
     </dependency>
 
     <dependency>
@@ -100,7 +100,7 @@ under the License.
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-api</artifactId>
-      <version>1.7.5</version>
+      <version>${slf4j.version}</version>
     </dependency>
 
     <dependency>
@@ -112,7 +112,7 @@ under the License.
     <dependency>
       <groupId>org.kitesdk</groupId>
       <artifactId>kite-data-core</artifactId>
-      <version>1.0.0-cdh5.4.4</version>
+      <version>${kite.version}</version>
     </dependency>
   </dependencies>

[2/5] impala git commit: KUDU-2198. Allow disregarding system-wide auth-to-local mapping

Posted by ta...@apache.org.

KUDU-2198. Allow disregarding system-wide auth-to-local mapping

This adds a workaround for an issue reported on the user mailing list.
Some systems are configured such that the auth_to_local mapping provided
by the krb5 library doesn't work properly for service accounts.

This patch adds a new configuration which allows Kudu to disregard the
system auth_to_local rules and instead just map kerberos principals to
their first component, which is typically the username.

Change-Id: I2e893493f52965ea54d2ceaac83d375285b49486
Reviewed-on: http://gerrit.cloudera.org:8080/8373
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Dan Burkert <da...@apache.org>
Tested-by: Kudu Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/8875
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/ac9ca569
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/ac9ca569
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/ac9ca569

Branch: refs/heads/master
Commit: ac9ca569db3fa2dde5045fa19849d4086db148bf
Parents: f755910
Author: Todd Lipcon <to...@apache.org>
Authored: Tue Oct 24 12:44:23 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Dec 21 00:13:42 2017 +0000

----------------------------------------------------------------------
 be/src/kudu/security/init.cc | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/ac9ca569/be/src/kudu/security/init.cc
----------------------------------------------------------------------
diff --git a/be/src/kudu/security/init.cc b/be/src/kudu/security/init.cc
index cbba8c8..8c448c2 100644
--- a/be/src/kudu/security/init.cc
+++ b/be/src/kudu/security/init.cc
@@ -43,6 +43,22 @@
 DECLARE_string(keytab_file);
 TAG_FLAG(keytab_file, stable);
 
+#ifndef __APPLE__
+static constexpr bool kDefaultSystemAuthToLocal = true;
+#else
+// macOS's Heimdal library has a no-op implementation of
+// krb5_aname_to_localname, so instead we just use the simple
+// implementation.
+static constexpr bool kDefaultSystemAuthToLocal = false;
+#endif
+DEFINE_bool_hidden(use_system_auth_to_local, kDefaultSystemAuthToLocal,
+            "When enabled, use the system krb5 library to map Kerberos principal "
+            "names to local (short) usernames. If not enabled, the first component "
+            "of the principal will be used as the short name. For example, "
+            "'kudu/foo.example.com@EXAMPLE' will map to 'kudu'.");
+TAG_FLAG(use_system_auth_to_local, advanced);
+
+
 using std::mt19937;
 using std::random_device;
 using std::string;
@@ -400,18 +416,15 @@ Status MapPrincipalToLocalName(const std::string& principal, std::string* local_
       krb5_free_principal(g_krb5_ctx, princ);
     });
   char buf[1024];
-  krb5_error_code rc;
-#ifndef __APPLE__
-  rc = krb5_aname_to_localname(g_krb5_ctx, princ, arraysize(buf), buf);
-#else
-  // macOS's Heimdal library has a no-op implementation of
-  // krb5_aname_to_localname, so instead we fall down to below and grab the
-  // first component of the principal.
-  rc = KRB5_LNAME_NOTRANS;
-#endif
+  krb5_error_code rc = KRB5_LNAME_NOTRANS;
+  if (FLAGS_use_system_auth_to_local) {
+    rc = krb5_aname_to_localname(g_krb5_ctx, princ, arraysize(buf), buf);
+  }
   if (rc == KRB5_LNAME_NOTRANS || rc == KRB5_PLUGIN_NO_HANDLE) {
-    // No name mapping specified. We fall back to simply taking the first component
-    // of the principal, for compatibility with the default behavior of Hadoop.
+    // No name mapping specified, or krb5-based name mapping is disabled.
+    //
+    // We fall back to simply taking the first component of the principal, for
+    // compatibility with the default behavior of Hadoop.
     //
     // NOTE: KRB5_PLUGIN_NO_HANDLE isn't typically expected here, but works around
     // a bug in SSSD's auth_to_local implementation: https://pagure.io/SSSD/sssd/issue/3459

[3/5] impala git commit: IMPALA-6225: Part 2: Query profile date-time strings should have ns precision.

Posted by ta...@apache.org.

IMPALA-6225: Part 2: Query profile date-time strings should have ns
precision.

This commit follows 16d8dd58.

This patch adds a test case that inspects the thrift profile of a
completed query, and verifies that the "Start Time" and
"End Time" of the query have nanosecond precision. We chose to
work with the thrift profile directly, rather than parse the debug
web page, as it is the thrift profile which is consumed by
management API clients of Impala.

Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109
Reviewed-on: http://gerrit.cloudera.org:8080/8784
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/b581a9d1
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/b581a9d1
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/b581a9d1

Branch: refs/heads/master
Commit: b581a9d1ee277674671431dbd6880806d6120567
Parents: ac9ca56
Author: Zoram Thanga <zo...@cloudera.com>
Authored: Tue Dec 5 15:30:49 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Dec 21 04:26:33 2017 +0000

----------------------------------------------------------------------
 tests/common/impala_service.py         | 32 +++++++++++++++++++-
 tests/query_test/test_observability.py | 47 +++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/b581a9d1/tests/common/impala_service.py
----------------------------------------------------------------------
diff --git a/tests/common/impala_service.py b/tests/common/impala_service.py
index be33328..dfd4a6e 100644
--- a/tests/common/impala_service.py
+++ b/tests/common/impala_service.py
@@ -29,7 +29,11 @@ from tests.common.impala_connection import create_connection, create_ldap_connec
 from TCLIService import TCLIService
 from thrift.transport.TSocket import TSocket
 from thrift.transport.TTransport import TBufferedTransport
-from thrift.protocol import TBinaryProtocol
+from thrift.protocol import TBinaryProtocol, TCompactProtocol, TProtocol
+from thrift.TSerialization import deserialize
+from RuntimeProfile.ttypes import TRuntimeProfileTree
+import base64
+import zlib
 
 logging.basicConfig(level=logging.ERROR, format='%(threadName)s: %(message)s')
 LOG = logging.getLogger('impala_service')
@@ -57,6 +61,32 @@ class BaseImpalaService(object):
   def read_debug_webpage(self, page_name, timeout=10, interval=1):
     return self.open_debug_webpage(page_name, timeout=timeout, interval=interval).read()
 
+  def get_thrift_profile(self, query_id, timeout=10, interval=1):
+    """Returns thrift profile of the specified query ID, if available"""
+    page_name = "query_profile_encoded?query_id=%s" % (query_id)
+    try:
+      response = self.open_debug_webpage(page_name, timeout=timeout, interval=interval)
+      tbuf = response.read()
+    except Exception as e:
+      LOG.info("Thrift profile for query %s not yet available: %s", query_id, str(e))
+      return None
+    else:
+      tree = TRuntimeProfileTree()
+      try:
+        deserialize(tree, zlib.decompress(base64.b64decode(tbuf)),
+                  protocol_factory=TCompactProtocol.TCompactProtocolFactory())
+        tree.validate()
+        return tree
+      except Exception as e:
+        LOG.info("Exception while deserializing query profile of %s: %s", query_id,
+                str(e));
+        # We should assert that the response code is not 200 once
+        # IMPALA-6332: Impala webserver should return HTTP error code for missing query
+        # profiles, is fixed.
+        if str(e) == 'Incorrect padding':
+          assert "Could not obtain runtime profile" in tbuf, tbuf
+        return None
+
   def get_debug_webpage_json(self, page_name):
     """Returns the json for the given Impala debug webpage, eg. '/queries'"""
     return json.loads(self.read_debug_webpage(page_name + "?json"))

http://git-wip-us.apache.org/repos/asf/impala/blob/b581a9d1/tests/query_test/test_observability.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_observability.py b/tests/query_test/test_observability.py
index a7c3d29..1396ac5 100644
--- a/tests/query_test/test_observability.py
+++ b/tests/query_test/test_observability.py
@@ -17,6 +17,9 @@
 
 from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfLocal
+from tests.common.impala_cluster import ImpalaCluster
+import logging
+import time
 
 class TestObservability(ImpalaTestSuite):
   @classmethod
@@ -128,3 +131,47 @@ class TestObservability(ImpalaTestSuite):
     assert results.runtime_profile.count("HASH_JOIN_NODE") == 2
     assert results.runtime_profile.count("AGGREGATION_NODE") == 2
     assert results.runtime_profile.count("PLAN_ROOT_SINK") == 2
+
+  def test_query_profile_thrift_timestamps(self):
+    """Test that the query profile start and end time date-time strings have
+    nanosecond precision. Nanosecond precision is expected by management API clients
+    that consume Impala debug webpages."""
+    query = "select sleep(1000)"
+    handle = self.client.execute_async(query)
+    query_id = handle.get_handle().id
+    results = self.client.fetch(query, handle)
+    self.client.close()
+
+    start_time_sub_sec_str = ""
+    end_time_sub_sec_str = ""
+    start_time = ""
+    end_time = ""
+
+    MAX_RETRIES = 60
+    for retries in xrange(MAX_RETRIES):
+      tree = self.impalad_test_service.get_thrift_profile(query_id)
+
+      if tree is None:
+        continue
+      # tree.nodes[1] corresponds to ClientRequestState::summary_profile_
+      # See be/src/service/client-request-state.[h|cc].
+      start_time = tree.nodes[1].info_strings["Start Time"]
+      end_time = tree.nodes[1].info_strings["End Time"]
+      # Start and End Times are of the form "2017-12-07 22:26:52.167711000"
+      start_time_sub_sec_str = start_time.split('.')[-1]
+      end_time_sub_sec_str = end_time.split('.')[-1]
+      if len(end_time_sub_sec_str) == 0:
+        logging.info('end_time_sub_sec_str hasn\'t shown up yet, retries=%d', retries)
+        time.sleep(1)
+        continue
+      assert len(end_time_sub_sec_str) == 9, end_time
+      assert len(start_time_sub_sec_str) == 9, start_time
+      return True
+
+    # If we're here, we didn't get the final thrift profile from the debug web page.
+    # This could happen due to heavy system load. The test is then inconclusive.
+    # Log a message and fail this run.
+    dbg_str = 'Debug thrift profile for query ' + str(query_id) + ' not available in '
+    dbg_str += str(MAX_RETRIES) + ' seconds, '
+    dbg_str += '(' + start_time + ', ' + end_time + ').'
+    assert False, dgb_str

[5/5] impala git commit: IMPALA-5014: Part 1: Round when casting string to decimal

Posted by ta...@apache.org.

IMPALA-5014: Part 1: Round when casting string to decimal

In this patch we implement rounding when casting string to decimal if
DECIMAL_V2 is enabled. The backend method that parses strings and
converts them to decimals is refactored to make it easier to understand.

Testing:
- Added some BE tests.

Change-Id: Icd8b92727fb384e6ff2d145e4aab7ae5d27db26d
Reviewed-on: http://gerrit.cloudera.org:8080/8774
Reviewed-by: Taras Bobrovytsky <tb...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a16fe803
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a16fe803
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a16fe803

Branch: refs/heads/master
Commit: a16fe803caac16d4ba64aeaa4fc750388aeca203
Parents: 4ce23f7
Author: Taras Bobrovytsky <tb...@cloudera.com>
Authored: Tue Nov 7 20:03:11 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Dec 22 11:39:08 2017 +0000

----------------------------------------------------------------------
 be/src/benchmarks/atod-benchmark.cc      |   2 +-
 be/src/benchmarks/overflow-benchmark.cc  |   2 +-
 be/src/exec/hdfs-scanner-ir.cc           |  12 +--
 be/src/exec/text-converter.inline.h      |   6 +-
 be/src/exprs/decimal-operators-ir.cc     |  10 +-
 be/src/exprs/expr-test.cc                |   6 +-
 be/src/runtime/decimal-test.cc           | 115 +++++++++++++++++++++--
 be/src/runtime/decimal-value.inline.h    |  31 +------
 be/src/util/decimal-util.h               |  25 +++++
 be/src/util/string-parser.cc             |  12 +--
 be/src/util/string-parser.h              | 126 +++++++++++++++++---------
 tests/query_test/test_decimal_casting.py |   2 +-
 12 files changed, 245 insertions(+), 104 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/benchmarks/atod-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/atod-benchmark.cc b/be/src/benchmarks/atod-benchmark.cc
index 9beb942..1e86844 100644
--- a/be/src/benchmarks/atod-benchmark.cc
+++ b/be/src/benchmarks/atod-benchmark.cc
@@ -83,7 +83,7 @@ void TestImpala(int batch_size, void* d) {
       const StringValue& str = data->data[j];
       StringParser::ParseResult dummy;
       val = StringParser::StringToDecimal<Storage>(
-          str.ptr, str.len, column_type, &dummy);
+          str.ptr, str.len, column_type, false, &dummy);
       data->result[j] = val;
     }
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/benchmarks/overflow-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/overflow-benchmark.cc b/be/src/benchmarks/overflow-benchmark.cc
index 1d4caf6..d59ab9d 100644
--- a/be/src/benchmarks/overflow-benchmark.cc
+++ b/be/src/benchmarks/overflow-benchmark.cc
@@ -84,7 +84,7 @@ void AddTestData(TestData* data, int n) {
     string str = ss.str();
     StringParser::ParseResult dummy;
     val = StringParser::StringToDecimal<int128_t>(
-        str.c_str(), str.length(), column_type, &dummy);
+        str.c_str(), str.length(), column_type, false, &dummy);
     data->values.push_back(val);
     data->int128_values.push_back(numeric_limits<int128_t>::max() * Rand());
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/exec/hdfs-scanner-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scanner-ir.cc b/be/src/exec/hdfs-scanner-ir.cc
index d7c2d81..ec1d2a3 100644
--- a/be/src/exec/hdfs-scanner-ir.cc
+++ b/be/src/exec/hdfs-scanner-ir.cc
@@ -90,9 +90,9 @@ ScalarExprEvaluator* HdfsScanner::GetConjunctEval(int idx) const {
 
 void StringToDecimalSymbolDummy() {
   // Force linker to to link the object file containing these functions.
-  StringToDecimal4(nullptr, 0, 0, 0, nullptr);
-  StringToDecimal8(nullptr, 0, 0, 0, nullptr);
-  StringToDecimal16(nullptr, 0, 0, 0, nullptr);
+  StringToDecimal4(nullptr, 0, 0, 0, false, nullptr);
+  StringToDecimal8(nullptr, 0, 0, 0, false, nullptr);
+  StringToDecimal16(nullptr, 0, 0, 0, false, nullptr);
 }
 
 // Define the string parsing functions for llvm.  Stamp out the templated functions
@@ -142,7 +142,7 @@ void IrStringToTimestamp(TimestampValue* out, const char* s, int len,
 extern "C"
 Decimal4Value IrStringToDecimal4(const char* s, int len, int type_precision,
     int type_scale, ParseResult* result)  {
-  auto ret = StringToDecimal4(s, len, type_precision, type_scale, result);
+  auto ret = StringToDecimal4(s, len, type_precision, type_scale, false, result);
   if (*result != ParseResult::PARSE_SUCCESS) *result = ParseResult::PARSE_FAILURE;
   return ret;
 }
@@ -150,7 +150,7 @@ Decimal4Value IrStringToDecimal4(const char* s, int len, int type_precision,
 extern "C"
 Decimal8Value IrStringToDecimal8(const char* s, int len, int type_precision,
     int type_scale, ParseResult* result)  {
-  auto ret = StringToDecimal8(s, len, type_precision, type_scale, result);
+  auto ret = StringToDecimal8(s, len, type_precision, type_scale, false, result);
   if (*result != ParseResult::PARSE_SUCCESS) *result = ParseResult::PARSE_FAILURE;
   return ret;
 }
@@ -158,7 +158,7 @@ Decimal8Value IrStringToDecimal8(const char* s, int len, int type_precision,
 extern "C"
 Decimal16Value IrStringToDecimal16(const char* s, int len, int type_precision,
     int type_scale, ParseResult* result)  {
-  auto ret = StringToDecimal16(s, len, type_precision, type_scale, result);
+  auto ret = StringToDecimal16(s, len, type_precision, type_scale, false, result);
   if (*result != ParseResult::PARSE_SUCCESS) *result = ParseResult::PARSE_FAILURE;
   return ret;
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/exec/text-converter.inline.h
----------------------------------------------------------------------
diff --git a/be/src/exec/text-converter.inline.h b/be/src/exec/text-converter.inline.h
index 49f4e8a..c56bcce 100644
--- a/be/src/exec/text-converter.inline.h
+++ b/be/src/exec/text-converter.inline.h
@@ -141,12 +141,12 @@ inline bool TextConverter::WriteSlot(const SlotDescriptor* slot_desc, Tuple* tup
         case 4:
           *reinterpret_cast<Decimal4Value*>(slot) =
               StringParser::StringToDecimal<int32_t>(
-                  data, len, slot_desc->type(), &parse_result);
+                  data, len, slot_desc->type(), false, &parse_result);
           break;
         case 8:
           *reinterpret_cast<Decimal8Value*>(slot) =
               StringParser::StringToDecimal<int64_t>(
-                  data, len, slot_desc->type(), &parse_result);
+                  data, len, slot_desc->type(), false, &parse_result);
           break;
         case 12:
           DCHECK(false) << "Planner should not generate this.";
@@ -154,7 +154,7 @@ inline bool TextConverter::WriteSlot(const SlotDescriptor* slot_desc, Tuple* tup
         case 16:
           *reinterpret_cast<Decimal16Value*>(slot) =
               StringParser::StringToDecimal<int128_t>(
-                  data, len, slot_desc->type(), &parse_result);
+                  data, len, slot_desc->type(), false, &parse_result);
           break;
         default:
           DCHECK(false) << "Decimal slots can't be this size.";

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/exprs/decimal-operators-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/decimal-operators-ir.cc b/be/src/exprs/decimal-operators-ir.cc
index bb54a0f..8612561 100644
--- a/be/src/exprs/decimal-operators-ir.cc
+++ b/be/src/exprs/decimal-operators-ir.cc
@@ -534,22 +534,26 @@ IR_ALWAYS_INLINE DecimalVal DecimalOperators::CastToDecimalVal(
   DecimalVal dv;
   int precision = ctx->impl()->GetConstFnAttr(FunctionContextImpl::RETURN_TYPE_PRECISION);
   int scale = ctx->impl()->GetConstFnAttr(FunctionContextImpl::RETURN_TYPE_SCALE);
+  bool is_decimal_v2 = ctx->impl()->GetConstFnAttr(FunctionContextImpl::DECIMAL_V2);
   switch (ColumnType::GetDecimalByteSize(precision)) {
     case 4: {
       Decimal4Value dv4 = StringParser::StringToDecimal<int32_t>(
-          reinterpret_cast<char*>(val.ptr), val.len, precision, scale, &result);
+          reinterpret_cast<char*>(val.ptr), val.len, precision, scale,
+          is_decimal_v2, &result);
       dv = DecimalVal(dv4.value());
       break;
     }
     case 8: {
       Decimal8Value dv8 = StringParser::StringToDecimal<int64_t>(
-          reinterpret_cast<char*>(val.ptr), val.len, precision, scale, &result);
+          reinterpret_cast<char*>(val.ptr), val.len, precision, scale,
+          is_decimal_v2, &result);
       dv = DecimalVal(dv8.value());
       break;
     }
     case 16: {
       Decimal16Value dv16 = StringParser::StringToDecimal<int128_t>(
-          reinterpret_cast<char*>(val.ptr), val.len, precision, scale, &result);
+          reinterpret_cast<char*>(val.ptr), val.len, precision, scale,
+          is_decimal_v2, &result);
       dv = DecimalVal(dv16.value());
       break;
     }

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/exprs/expr-test.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index 35599c0..88d957d 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -494,15 +494,15 @@ class ExprTest : public testing::Test {
     switch (expected_type.GetByteSize()) {
       case 4:
         EXPECT_EQ(expected_result.value(), StringParser::StringToDecimal<int32_t>(
-            value.data(), value.size(), expected_type, &result).value()) << query;
+            value.data(), value.size(), expected_type, false, &result).value()) << query;
         break;
       case 8:
         EXPECT_EQ(expected_result.value(), StringParser::StringToDecimal<int64_t>(
-            value.data(), value.size(), expected_type, &result).value()) << query;
+            value.data(), value.size(), expected_type, false, &result).value()) << query;
         break;
       case 16:
         EXPECT_EQ(expected_result.value(), StringParser::StringToDecimal<int128_t>(
-            value.data(), value.size(), expected_type, &result).value()) << query;
+            value.data(), value.size(), expected_type, false, &result).value()) << query;
         break;
       default:
         EXPECT_TRUE(false) << expected_type << " " << expected_type.GetByteSize();

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/runtime/decimal-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/decimal-test.cc b/be/src/runtime/decimal-test.cc
index f84fdaf..25d2f65 100644
--- a/be/src/runtime/decimal-test.cc
+++ b/be/src/runtime/decimal-test.cc
@@ -45,11 +45,11 @@ void VerifyEquals(const DecimalValue<T>& t1, const DecimalValue<T>& t2) {
 }
 
 template <typename T>
-void VerifyParse(const string& s, int precision, int scale,
+void VerifyParse(const string& s, int precision, int scale, bool round,
     const DecimalValue<T>& expected_val, StringParser::ParseResult expected_result) {
   StringParser::ParseResult parse_result;
   DecimalValue<T> val = StringParser::StringToDecimal<T>(
-      s.c_str(), s.size(), precision, scale, &parse_result);
+      s.c_str(), s.size(), precision, scale, round, &parse_result);
   EXPECT_EQ(expected_result, parse_result) << "Failed test string: " << s;
   if (expected_result == StringParser::PARSE_SUCCESS ||
       expected_result == StringParser::PARSE_UNDERFLOW) {
@@ -57,6 +57,22 @@ void VerifyParse(const string& s, int precision, int scale,
   }
 }
 
+template <typename T>
+void VerifyParse(const string& s, int precision, int scale,
+    const DecimalValue<T>& expected_val, StringParser::ParseResult expected_result) {
+  VerifyParse(s, precision, scale, false, expected_val, expected_result);
+  VerifyParse(s, precision, scale, true, expected_val, expected_result);
+}
+
+template <typename T>
+void VerifyParse(const string& s, int precision, int scale,
+    const DecimalValue<T>& expected_val_v1, StringParser::ParseResult expected_result_v1,
+    const DecimalValue<T>& expected_val_v2, StringParser::ParseResult expected_result_v2)
+{
+  VerifyParse(s, precision, scale, false, expected_val_v1, expected_result_v1);
+  VerifyParse(s, precision, scale, true, expected_val_v2, expected_result_v2);
+}
+
 template<typename T>
 void VerifyToString(const T& decimal, int precision, int scale, const string& expected) {
   EXPECT_EQ(decimal.ToString(precision, scale), expected);
@@ -69,6 +85,17 @@ void StringToAllDecimals(const string& s, int precision, int scale, int32_t val,
   VerifyParse(s, precision, scale, Decimal16Value(val), result);
 }
 
+void StringToAllDecimals(const string& s, int precision, int scale,
+    int32_t val_v1, StringParser::ParseResult result_v1,
+    int32_t val_v2, StringParser::ParseResult result_v2) {
+  VerifyParse(s, precision, scale,
+      Decimal4Value(val_v1), result_v1, Decimal4Value(val_v2), result_v2);
+  VerifyParse(s, precision, scale,
+      Decimal8Value(val_v1), result_v1, Decimal8Value(val_v2), result_v2);
+  VerifyParse(s, precision, scale,
+      Decimal16Value(val_v1), result_v1, Decimal16Value(val_v2), result_v2);
+}
+
 TEST(IntToDecimal, Basic) {
   Decimal16Value d16;
   bool overflow = false;
@@ -271,13 +298,27 @@ TEST(StringToDecimal, Basic) {
   StringToAllDecimals(".1", 10, 2, 10, StringParser::PARSE_SUCCESS);
   StringToAllDecimals("00012.3", 10, 2, 1230, StringParser::PARSE_SUCCESS);
   StringToAllDecimals("-00012.3", 10, 2, -1230, StringParser::PARSE_SUCCESS);
+  StringToAllDecimals("0.00000", 6, 5, 0, StringParser::PARSE_SUCCESS);
+  StringToAllDecimals("1.00000", 6, 5, 100000, StringParser::PARSE_SUCCESS);
+  StringToAllDecimals("0.000000", 6, 5, 0, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("1.000000", 6, 5, 100000, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("1.000004", 6, 5, 100000, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("1.000005", 6, 5,
+      100000, StringParser::PARSE_UNDERFLOW,
+      100001, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("0.4", 5, 0, 0, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("0.5", 5, 0,
+      0, StringParser::PARSE_UNDERFLOW,
+      1, StringParser::PARSE_UNDERFLOW);
 
   StringToAllDecimals("123.45", 10, 2, 12345, StringParser::PARSE_SUCCESS);
   StringToAllDecimals(".45", 10, 2, 45, StringParser::PARSE_SUCCESS);
   StringToAllDecimals("-.45", 10, 2, -45, StringParser::PARSE_SUCCESS);
   StringToAllDecimals(" 123.4 ", 10, 5, 12340000, StringParser::PARSE_SUCCESS);
   StringToAllDecimals("-123.45", 10, 5, -12345000, StringParser::PARSE_SUCCESS);
-  StringToAllDecimals("-123.456", 10, 2, -12345, StringParser::PARSE_UNDERFLOW);
+  StringToAllDecimals("-123.456", 10, 2,
+      -12345, StringParser::PARSE_UNDERFLOW,
+      -12346, StringParser::PARSE_UNDERFLOW);
 
   StringToAllDecimals("e", 2, 0, 0, StringParser::PARSE_FAILURE);
   StringToAllDecimals("E", 2, 0, 0, StringParser::PARSE_FAILURE);
@@ -347,7 +388,8 @@ TEST(StringToDecimal, LargeDecimals) {
   VerifyParse("123456.78", 8, 3,
       Decimal8Value(12345678L), StringParser::PARSE_OVERFLOW);
   VerifyParse("1234.5678", 8, 3,
-      Decimal8Value(1234567L), StringParser::PARSE_UNDERFLOW);
+      Decimal8Value(1234567L), StringParser::PARSE_UNDERFLOW,
+      Decimal8Value(1234568L), StringParser::PARSE_UNDERFLOW);
   VerifyParse("12345.678", 8, 3,
       Decimal16Value(12345678L), StringParser::PARSE_SUCCESS);
   VerifyParse("-12345.678", 8, 3,
@@ -355,7 +397,8 @@ TEST(StringToDecimal, LargeDecimals) {
   VerifyParse("123456.78", 8, 3,
       Decimal16Value(12345678L), StringParser::PARSE_OVERFLOW);
   VerifyParse("1234.5678", 8, 3,
-      Decimal16Value(1234567L), StringParser::PARSE_UNDERFLOW);
+      Decimal16Value(1234567L), StringParser::PARSE_UNDERFLOW,
+      Decimal16Value(1234568L), StringParser::PARSE_UNDERFLOW);
 
   // Test max unscaled value for each of the decimal types.
   VerifyParse("999999999", 9, 0,
@@ -411,16 +454,68 @@ TEST(StringToDecimal, LargeDecimals) {
       38, 38, Decimal16Value(-result), StringParser::PARSE_SUCCESS);
   VerifyParse("-.99999999999999999999999999999999999999e1",
       38, 38, Decimal16Value(-result), StringParser::PARSE_OVERFLOW);
-  VerifyParse("-.999999999999999999999999999999999999990e-1",
-      38, 38, Decimal16Value(-result / 10), StringParser::PARSE_UNDERFLOW);
-  VerifyParse("-.999999999999999999999999999999999999990000000000000000e-20",
-      38, 38,
+  VerifyParse("-.999999999999999999999999999999999999990e-1", 38, 38,
+      Decimal16Value(-result / 10), StringParser::PARSE_UNDERFLOW,
+      Decimal16Value(-result / 10 - 1), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("-.999999999999999999999999999999999999990000000000000000e-20", 38, 38,
       Decimal16Value(-result / DecimalUtil::GetScaleMultiplier<int128_t>(20)),
+      StringParser::PARSE_UNDERFLOW,
+      Decimal16Value(-result / DecimalUtil::GetScaleMultiplier<int128_t>(20) - 1),
       StringParser::PARSE_UNDERFLOW);
   VerifyParse("100000000000000000000000000000000000000",
       38, 0, Decimal16Value(0), StringParser::PARSE_OVERFLOW);
   VerifyParse("-100000000000000000000000000000000000000",
       38, 0, Decimal16Value(0), StringParser::PARSE_OVERFLOW);
+
+  // Rounding tests.
+  VerifyParse("555.554", 5, 2, Decimal4Value(55555), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("555.555", 5, 2,
+      Decimal4Value(55555), StringParser::PARSE_UNDERFLOW,
+      Decimal4Value(55556), StringParser::PARSE_UNDERFLOW);
+  // Too many digits to the left of the dot so we overflow.
+  VerifyParse("555.555e1", 5, 2, Decimal4Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("-555.555e1", 5, 2, Decimal4Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("5555.555", 5, 2, Decimal16Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("-555.555", 5, 2,
+        Decimal4Value(-55555), StringParser::PARSE_UNDERFLOW,
+        Decimal4Value(-55556), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("-5555.555", 5, 2, Decimal16Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("5555.555e-1", 5, 2,
+          Decimal4Value(55555), StringParser::PARSE_UNDERFLOW,
+          Decimal4Value(55556), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("55555.555e-1", 5, 2, Decimal16Value(0), StringParser::PARSE_OVERFLOW);
+  // Too many digits to the right of the dot and not enough to the left. Rounding via
+  // ScaleDownAndRound().
+  VerifyParse("5.55444", 5, 2, Decimal4Value(555), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("5.55555", 5, 2,
+        Decimal4Value(555), StringParser::PARSE_UNDERFLOW,
+        Decimal4Value(556), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("5.555e-9", 5, 2, Decimal4Value(0), StringParser::PARSE_UNDERFLOW);
+  // The number of digits to the left of the dot equals to precision - scale. Rounding
+  // by adding 1 if the first truncated digit is greater or equal to 5.
+  VerifyParse("555.554", 5, 2, Decimal4Value(55555), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("555.555", 5, 2,
+      Decimal4Value(55555), StringParser::PARSE_UNDERFLOW,
+      Decimal4Value(55556), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("5.55554e2", 5, 2, Decimal4Value(55555), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("5.55555e2", 5, 2,
+      Decimal4Value(55555), StringParser::PARSE_UNDERFLOW,
+      Decimal4Value(55556), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("55555.4e-2", 5, 2, Decimal4Value(55555), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("55555.5e-2", 5, 2,
+          Decimal4Value(55555), StringParser::PARSE_UNDERFLOW,
+          Decimal4Value(55556), StringParser::PARSE_UNDERFLOW);
+  // Rounding causes overflow.
+  VerifyParse("999.994", 5, 2, Decimal4Value(99999), StringParser::PARSE_UNDERFLOW);
+  VerifyParse("999.995", 5, 2,
+        Decimal4Value(99999), StringParser::PARSE_UNDERFLOW,
+        Decimal4Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("9.99995e2", 5, 2,
+          Decimal4Value(99999), StringParser::PARSE_UNDERFLOW,
+          Decimal4Value(0), StringParser::PARSE_OVERFLOW);
+  VerifyParse("99999.5e-2", 5, 2,
+            Decimal4Value(99999), StringParser::PARSE_UNDERFLOW,
+            Decimal4Value(0), StringParser::PARSE_OVERFLOW);
 }
 
 TEST(DecimalTest, Overflow) {
@@ -780,7 +875,7 @@ TEST(DecimalArithmetic, DivideLargeScales) {
   StringParser::ParseResult result;
   const char* data = "319391280635.61476055";
   Decimal16Value x =
-      StringParser::StringToDecimal<int128_t>(data, strlen(data), t1, &result);
+      StringParser::StringToDecimal<int128_t>(data, strlen(data), t1, false, &result);
   Decimal16Value y(10000);
   bool is_nan = false;
   bool is_overflow = false;

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/runtime/decimal-value.inline.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/decimal-value.inline.h b/be/src/runtime/decimal-value.inline.h
index 8f76335..2097ac8 100644
--- a/be/src/runtime/decimal-value.inline.h
+++ b/be/src/runtime/decimal-value.inline.h
@@ -140,27 +140,6 @@ inline DecimalValue<T> DecimalValue<T>::ScaleTo(int src_scale, int dst_scale,
 
 namespace detail {
 
-// Helper function to scale down over multiplied values back into result type,
-// truncating if round is false or rounding otherwise.
-template<typename T, typename RESULT_T>
-inline RESULT_T ScaleDownAndRound(RESULT_T value, int delta_scale, bool round) {
-  DCHECK_GT(delta_scale, 0);
-  // Multiplier can always be computed in potentially smaller type T
-  T multiplier = DecimalUtil::GetScaleMultiplier<T>(delta_scale);
-  DCHECK(multiplier > 1 && multiplier % 2 == 0);
-  RESULT_T result = value / multiplier;
-  if (round) {
-    RESULT_T remainder = value % multiplier;
-    // In general, shifting down the multiplier is not safe, but we know
-    // here that it is a multiple of two.
-    if (abs(remainder) >= (multiplier >> 1)) {
-      // Bias at zero must be corrected by sign of dividend.
-      result += BitUtil::Sign(value);
-    }
-  }
-  return result;
-}
-
 // If we have a number with 'num_lz' leading zeros, and we scale it up by 10^scale_diff,
 // this function returns the minimum number of leading zeros the result would have.
 inline int MinLeadingZerosAfterScaling(int num_lz, int scale_diff) {
@@ -232,7 +211,7 @@ inline int128_t AddLarge(int128_t x, int x_scale, int128_t y, int y_scale,
     right = x_right + y_right;
   }
   if (result_scale_decrease > 0) {
-    right = detail::ScaleDownAndRound<int128_t, int128_t>(
+    right = DecimalUtil::ScaleDownAndRound<int128_t>(
         right, result_scale_decrease, round);
   }
   DCHECK(right >= 0);
@@ -288,7 +267,7 @@ inline int128_t SubtractLarge(int128_t x, int x_scale, int128_t y, int y_scale,
   if (result_scale_decrease > 0) {
     // At this point, the scale of the fractional part is either x_scale or y_scale,
     // whichever is greater. We scale down the fractional part to result_scale here.
-    right = detail::ScaleDownAndRound<int128_t, int128_t>(
+    right = DecimalUtil::ScaleDownAndRound<int128_t>(
         right, result_scale_decrease, round);
   }
 
@@ -351,7 +330,7 @@ inline DecimalValue<RESULT_T> DecimalValue<T>::Add(int this_scale,
     if (result_scale_decrease > 0) {
       // After first adjusting x and y to the same scale and adding them together, we now
       // need scale down the result to result_scale.
-      x = detail::ScaleDownAndRound<T, RESULT_T>(x, result_scale_decrease, round);
+      x = DecimalUtil::ScaleDownAndRound<RESULT_T>(x, result_scale_decrease, round);
     }
     return DecimalValue<RESULT_T>(x);
   }
@@ -418,7 +397,7 @@ DecimalValue<RESULT_T> DecimalValue<T>::Multiply(int this_scale,
       DCHECK(*overflow);
     } else {
       int256_t intermediate_result = ConvertToInt256(x) * ConvertToInt256(y);
-      intermediate_result = detail::ScaleDownAndRound<int256_t, int256_t>(
+      intermediate_result = DecimalUtil::ScaleDownAndRound<int256_t>(
           intermediate_result, delta_scale, round);
       result = ConvertToInt128(
           intermediate_result, DecimalUtil::MAX_UNSCALED_DECIMAL16, overflow);
@@ -436,7 +415,7 @@ DecimalValue<RESULT_T> DecimalValue<T>::Multiply(int this_scale,
       result = x * y;
       // The largest value that result can have here is (2^64 - 1) * (2^63 - 1), which is
       // greater than MAX_UNSCALED_DECIMAL16.
-      result = detail::ScaleDownAndRound<T, RESULT_T>(result, delta_scale, round);
+      result = DecimalUtil::ScaleDownAndRound<RESULT_T>(result, delta_scale, round);
       // Since delta_scale is greater than zero, result can now be at most
       // ((2^64 - 1) * (2^63 - 1)) / 10, which is less than MAX_UNSCALED_DECIMAL16, so
       // there is no need to check for overflow.

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/util/decimal-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/decimal-util.h b/be/src/util/decimal-util.h
index 288caf3..c87b2c5 100644
--- a/be/src/util/decimal-util.h
+++ b/be/src/util/decimal-util.h
@@ -69,6 +69,31 @@ class DecimalUtil {
     return result;
   }
 
+  /// Helper function to scale down values by 10^delta_scale, truncating if
+  /// round is false or rounding otherwise.
+  template<typename T>
+  static inline T ScaleDownAndRound(T value, int delta_scale, bool round) {
+    DCHECK_GT(delta_scale, 0);
+    T divisor = DecimalUtil::GetScaleMultiplier<T>(delta_scale);
+    if (divisor > 0) {
+      DCHECK(divisor % 2 == 0);
+      T result = value / divisor;
+      if (round) {
+        T remainder = value % divisor;
+        // In general, shifting down the multiplier is not safe, but we know
+        // here that it is a multiple of two.
+        if (abs(remainder) >= (divisor >> 1)) {
+          // Bias at zero must be corrected by sign of dividend.
+          result += BitUtil::Sign(value);
+        }
+      }
+      return result;
+    } else {
+      DCHECK(divisor == -1);
+      return 0;
+    }
+  }
+
   /// Write decimals as big endian (byte comparable) in fixed_len_size bytes.
   template<typename T>
   static inline void EncodeToFixedLenByteArray(

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/util/string-parser.cc
----------------------------------------------------------------------
diff --git a/be/src/util/string-parser.cc b/be/src/util/string-parser.cc
index f59c6b88..1f03de1 100644
--- a/be/src/util/string-parser.cc
+++ b/be/src/util/string-parser.cc
@@ -21,20 +21,20 @@ namespace impala {
 
 using ParseResult = StringParser::ParseResult;
 Decimal4Value StringToDecimal4(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result) {
+    int type_scale, bool round, StringParser::ParseResult* result) {
   return StringParser::StringToDecimal<int32_t>(s, len, type_precision,
-      type_scale, result);
+      type_scale, round, result);
 }
 
 Decimal8Value StringToDecimal8(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result) {
+    int type_scale, bool round, StringParser::ParseResult* result) {
   return StringParser::StringToDecimal<int64_t>(s, len, type_precision,
-      type_scale, result);
+      type_scale, round, result);
 }
 
 Decimal16Value StringToDecimal16(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result) {
+    int type_scale, bool round, StringParser::ParseResult* result) {
   return StringParser::StringToDecimal<int128_t>(s, len, type_precision,
-      type_scale, result);
+      type_scale, round, result);
 }
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/be/src/util/string-parser.h
----------------------------------------------------------------------
diff --git a/be/src/util/string-parser.h b/be/src/util/string-parser.h
index 10c3fc7..606e5b8 100644
--- a/be/src/util/string-parser.h
+++ b/be/src/util/string-parser.h
@@ -112,13 +112,13 @@ class StringParser {
   /// On underflow, the truncated value is returned.
   template <typename T>
   static inline DecimalValue<T> StringToDecimal(const char* s, int len,
-      const ColumnType& type, StringParser::ParseResult* result) {
-    return StringToDecimal<T>(s, len, type.precision, type.scale, result);
+      const ColumnType& type, bool round, StringParser::ParseResult* result) {
+    return StringToDecimal<T>(s, len, type.precision, type.scale, round, result);
   }
 
   template <typename T>
   static inline DecimalValue<T> StringToDecimal(const char* s, int len,
-      int type_precision, int type_scale, StringParser::ParseResult* result) {
+      int type_precision, int type_scale, bool round, StringParser::ParseResult* result) {
     // Special cases:
     //   1) '' == Fail, an empty string fails to parse.
     //   2) '   #   ' == #, leading and trailing white space is ignored.
@@ -156,7 +156,7 @@ class StringParser {
     // Ignore leading zeros even after a dot. This allows for differentiating between
     // cases like 0.01e2, which would fit in a DECIMAL(1, 0), and 0.10e2, which would
     // overflow.
-    int scale = 0;
+    int digits_after_dot_count = 0;
     int found_dot = 0;
     if (len > 0 && *s == '.') {
       found_dot = 1;
@@ -164,30 +164,36 @@ class StringParser {
       --len;
       while (len > 0 && UNLIKELY(*s == '0')) {
         found_value = true;
-        ++scale;
+        ++digits_after_dot_count;
         ++s;
         --len;
       }
     }
 
-    int precision = 0;
+    int total_digits_count = 0;
     bool found_exponent = false;
     int8_t exponent = 0;
+    int first_truncated_digit = 0;
     T value = 0;
     for (int i = 0; i < len; ++i) {
-      const char& c = s[i];
+      const char c = s[i];
       if (LIKELY('0' <= c && c <= '9')) {
         found_value = true;
         // Ignore digits once the type's precision limit is reached. This avoids
         // overflowing the underlying storage while handling a string like
         // 10000000000e-10 into a DECIMAL(1, 0). Adjustments for ignored digits and
         // an exponent will be made later.
-        if (LIKELY(type_precision > precision)) {
-          value = (value * 10) + (c - '0');  // Benchmarks are faster with parenthesis...
+        if (LIKELY(total_digits_count < type_precision)) {
+          // Benchmarks are faster with parenthesis.
+          T new_value = (value * 10) + (c - '0');
+          DCHECK(new_value >= value);
+          value = new_value;
+        } else if (UNLIKELY(round && total_digits_count == type_precision)) {
+          first_truncated_digit = c - '0';
         }
-        DCHECK(value >= 0);  // For some reason DCHECK_GE doesn't work with int128_t.
-        ++precision;
-        scale += found_dot;
+        DCHECK(value >= 0); // DCHECK_GE does not work with int128_t
+        ++total_digits_count;
+        digits_after_dot_count += found_dot;
       } else if (c == '.' && LIKELY(!found_dot)) {
         found_dot = 1;
       } else if ((c == 'e' || c == 'E') && LIKELY(!found_exponent)) {
@@ -207,48 +213,56 @@ class StringParser {
     }
 
     // Find the number of truncated digits before adjusting the precision for an exponent.
-    int truncated_digit_count = precision - type_precision;
-    if (exponent > scale) {
-      // Ex: 0.1e3 (which at this point would have precision == 1 and scale == 1), the
-      //     scale must be set to 0 and the value set to 100 which means a precision of 3.
-      precision += exponent - scale;
-      value *= DecimalUtil::GetScaleMultiplier<T>(exponent - scale);
-      scale = 0;
-    } else {
-      // Ex: 100e-4, the scale must be set to 4 but no adjustment to the value is needed,
-      //     the precision must also be set to 4 but that will be done below for the
-      //     non-exponent case anyways.
-      scale -= exponent;
-    }
-    // Ex: 0.001, at this point would have precision 1 and scale 3 since leading zeros
-    //     were ignored during previous parsing.
-    if (scale > precision) precision = scale;
+    int truncated_digit_count = std::max(total_digits_count - type_precision, 0);
+    // 'scale' and 'precision' refer to the scale and precision of the number that
+    // is contained the string that we are parsing. The scale of 'value' may be
+    // different because some digits may have been truncated.
+    int scale, precision;
+    ApplyExponent(total_digits_count, digits_after_dot_count,
+        exponent, &value, &precision, &scale);
 
     // Microbenchmarks show that beyond this point, returning on parse failure is slower
     // than just letting the function run out.
     *result = StringParser::PARSE_SUCCESS;
     if (UNLIKELY(precision - scale > type_precision - type_scale)) {
+      // The number in the string has too many digits to the left of the dot,
+      // so we overflow.
       *result = StringParser::PARSE_OVERFLOW;
     } else if (UNLIKELY(scale > type_scale)) {
+      // There are too many digits to the right of the dot in the string we are parsing.
       *result = StringParser::PARSE_UNDERFLOW;
-      int shift = scale - type_scale;
-      if (UNLIKELY(truncated_digit_count > 0)) shift -= truncated_digit_count;
+      // The scale of 'value'.
+      int value_scale = scale - truncated_digit_count;
+      int shift = value_scale - type_scale;
       if (shift > 0) {
-        T divisor = DecimalUtil::GetScaleMultiplier<T>(shift);
-        if (LIKELY(divisor >= 0)) {
-          value /= divisor;
-        } else {
-          DCHECK(divisor == -1); // DCHECK_EQ doesn't work with int128_t.
-          value = 0;
+        // There are less than maximum number of digits to the left of the dot.
+        value = DecimalUtil::ScaleDownAndRound<T>(value, shift, round);
+        DCHECK(value >= 0);
+        DCHECK(value < DecimalUtil::GetScaleMultiplier<int128_t>(type_precision));
+      } else {
+        // There are a maximum number of digits to the left of the dot. We round by
+        // looking at the first truncated digit.
+        DCHECK_EQ(shift, 0);
+        DCHECK(0 <= first_truncated_digit && first_truncated_digit <= 9);
+        DCHECK(first_truncated_digit == 0 || truncated_digit_count != 0);
+        DCHECK(first_truncated_digit == 0 || round);
+        // Apply the rounding.
+        value += (first_truncated_digit >= 5);
+        DCHECK(value >= 0);
+        DCHECK(value <= DecimalUtil::GetScaleMultiplier<int128_t>(type_precision));
+        if (UNLIKELY(value == DecimalUtil::GetScaleMultiplier<T>(type_precision))) {
+          // Overflow due to rounding.
+          *result = StringParser::PARSE_OVERFLOW;
         }
       }
-      DCHECK(value >= 0); // DCHECK_GE doesn't work with int128_t.
     } else if (UNLIKELY(!found_value && !found_dot)) {
       *result = StringParser::PARSE_FAILURE;
-    }
-
-    if (type_scale > scale) {
+    } else if (type_scale > scale) {
+      // There were not enough digits after the dot, so we have scale up the value.
+      DCHECK_EQ(truncated_digit_count, 0);
       value *= DecimalUtil::GetScaleMultiplier<T>(type_scale - scale);
+      // Overflow should be impossible.
+      DCHECK(value < DecimalUtil::GetScaleMultiplier<int128_t>(type_precision));
     }
 
     return DecimalValue<T>(is_negative ? -value : value);
@@ -372,6 +386,30 @@ class StringParser {
     return static_cast<T>(negative ? -val : val);
   }
 
+  /// Helper function that applies the exponent to the value. Also computes and
+  /// sets the precision and scale.
+  template <typename T>
+  static inline void ApplyExponent(int total_digits_count,
+      int digits_after_dot_count,int8_t exponent, T* value, int* precision, int* scale) {
+    if (exponent > digits_after_dot_count) {
+      // Ex: 0.1e3 (which at this point would have precision == 1 and scale == 1), the
+      //     scale must be set to 0 and the value set to 100 which means a precision of 3.
+      *value *= DecimalUtil::GetScaleMultiplier<T>(exponent - digits_after_dot_count);
+      *precision = total_digits_count + (exponent - digits_after_dot_count);
+      *scale = 0;
+    } else {
+      // Ex: 100e-4, the scale must be set to 4 but no adjustment to the value is needed,
+      //     the precision must also be set to 4 but that will be done below for the
+      //     non-exponent case anyways.
+      *precision = total_digits_count;
+      *scale = digits_after_dot_count - exponent;
+      // Ex: 0.001, at this point would have precision 1 and scale 3 since leading zeros
+      //     were ignored during previous parsing.
+      if (*precision < *scale) *precision = *scale;
+    }
+    DCHECK_GE(*precision, *scale);
+  }
+
   /// Checks if "inf" or "infinity" matches 's' in a case-insensitive manner. The match
   /// has to start at the beginning of 's', leading whitespace is considered invalid.
   /// Trailing whitespace characters are allowed.
@@ -580,7 +618,7 @@ class StringParser {
     return val;
   }
 
-  static inline bool IsWhitespace(const char& c) {
+  static inline bool IsWhitespace(const char c) {
     return c == ' ' || UNLIKELY(c == '\t' || c == '\n' || c == '\v' || c == '\f'
         || c == '\r');
   }
@@ -600,13 +638,13 @@ inline int StringParser::StringParseTraits<int64_t>::max_ascii_len() { return 19
 
 // These functions are too large to benefit from inlining.
 Decimal4Value StringToDecimal4(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result);
+    int type_scale, bool round, StringParser::ParseResult* result);
 
 Decimal8Value StringToDecimal8(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result);
+    int type_scale, bool round, StringParser::ParseResult* result);
 
 Decimal16Value StringToDecimal16(const char* s, int len, int type_precision,
-    int type_scale, StringParser::ParseResult* result);
+    int type_scale, bool round, StringParser::ParseResult* result);
 
 }
 #endif

http://git-wip-us.apache.org/repos/asf/impala/blob/a16fe803/tests/query_test/test_decimal_casting.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_decimal_casting.py b/tests/query_test/test_decimal_casting.py
index ce8037d..445fe6b 100644
--- a/tests/query_test/test_decimal_casting.py
+++ b/tests/query_test/test_decimal_casting.py
@@ -159,7 +159,7 @@ class TestDecimalCasting(ImpalaTestSuite):
           .format(val, precision, scale)
       res = Decimal(self.execute_scalar(cast, vector.get_value('exec_option')))
       # TODO: Remove check for cast_from once string to decimal is supported in decimal_v2.
-      if is_decimal_v2 and cast_from != 'string':
+      if is_decimal_v2:
         expected_val = val.quantize(Decimal('0e-%s' % scale), rounding=ROUND_HALF_UP)
       else:
         expected_val = val.quantize(Decimal('0e-%s' % scale), rounding=ROUND_DOWN)

[4/5] impala git commit: Move symlinked auxiliary tests/* to tests/functional/*

Posted by ta...@apache.org.

Move symlinked auxiliary tests/* to tests/functional/*

The layout of the Impala-auxiliary-tests/tests directory is
changing to allow for different kinds of tests to be saved
there. But just in case the new functional sub-directory
does not exist, preserve backwards compatibility with the
older layout.

Change-Id: Ifb2bbbebc38bbaf3d6a4ad01fa8dd918b7d99b3b
Reviewed-on: http://gerrit.cloudera.org:8080/8896
Reviewed-by: David Knupp <dk...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4ce23f72
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4ce23f72
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4ce23f72

Branch: refs/heads/master
Commit: 4ce23f72bae9ca1ccfea5a8b9498aa115c65eecd
Parents: b581a9d
Author: David Knupp <dk...@cloudera.com>
Authored: Tue Dec 19 12:52:21 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Dec 22 04:48:53 2017 +0000

----------------------------------------------------------------------
 bin/create-test-configuration.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/4ce23f72/bin/create-test-configuration.sh
----------------------------------------------------------------------
diff --git a/bin/create-test-configuration.sh b/bin/create-test-configuration.sh
index 2f981aa..97e05a3 100755
--- a/bin/create-test-configuration.sh
+++ b/bin/create-test-configuration.sh
@@ -191,4 +191,11 @@ function symlink_subdirs {
 echo "Searching for auxiliary tests, workloads, and datasets (if any exist)."
 symlink_subdirs ${IMPALA_AUX_WORKLOAD_DIR} ${IMPALA_WORKLOAD_DIR}
 symlink_subdirs ${IMPALA_AUX_DATASET_DIR} ${IMPALA_DATASET_DIR}
-symlink_subdirs ${IMPALA_AUX_TEST_HOME}/tests ${IMPALA_HOME}/tests
+
+if [ -d ${IMPALA_AUX_TEST_HOME}/tests/functional ]; then
+  symlink_subdirs ${IMPALA_AUX_TEST_HOME}/tests/functional ${IMPALA_HOME}/tests
+else
+  # For compatibility with older auxiliary tests, which aren't in the
+  # functional subdirectory.
+  symlink_subdirs ${IMPALA_AUX_TEST_HOME}/tests ${IMPALA_HOME}/tests
+fi