Posted to commits@impala.apache.org by ar...@apache.org on 2018/11/27 00:57:02 UTC

[1/3] impala git commit: IMPALA-7233: [DOCS] Support for IANA timezone database

Repository: impala
Updated Branches:
  refs/heads/master 622e19c5f -> 12dc29e5e


IMPALA-7233: [DOCS] Support for IANA timezone database

- Updated the timezone section
- Added the sections on customizing timezone db and aliases

Change-Id: Id400cda5a1be321063d17e0ee6337e92a5da732a
Reviewed-on: http://gerrit.cloudera.org:8080/11946
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/e421223c
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/e421223c
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/e421223c

Branch: refs/heads/master
Commit: e421223c5d4ef4a7536a223779834064d74df75d
Parents: 622e19c
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri Nov 16 13:21:22 2018 -0800
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Sun Nov 25 04:28:38 2018 +0000

----------------------------------------------------------------------
 docs/impala.ditamap                     |   4 +-
 docs/topics/impala_custom_timezones.xml | 181 +++++++++++
 docs/topics/impala_timestamp.xml        | 452 +++++++++------------------
 3 files changed, 340 insertions(+), 297 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/e421223c/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index e4c35a7..9b58786 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -110,7 +110,9 @@ under the License.
       <topicref href="topics/impala_smallint.xml"/>
       <topicref href="topics/impala_string.xml"/>
       <topicref href="topics/impala_struct.xml"/>
-      <topicref href="topics/impala_timestamp.xml"/>
+      <topicref href="topics/impala_timestamp.xml">
+        <topicref href="topics/impala_custom_timezones.xml"/>
+      </topicref>
       <topicref href="topics/impala_tinyint.xml"/>
       <topicref href="topics/impala_varchar.xml"/>
       <topicref href="topics/impala_complex_types.xml"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/e421223c/docs/topics/impala_custom_timezones.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_custom_timezones.xml b/docs/topics/impala_custom_timezones.xml
new file mode 100644
index 0000000..be651e9
--- /dev/null
+++ b/docs/topics/impala_custom_timezones.xml
@@ -0,0 +1,181 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="custom_timezone">
+
+  <title>Customizing Time Zones</title>
+
+  <titlealts audience="PDF">
+
+    <navtitle>Customizing Time Zones</navtitle>
+
+  </titlealts>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Dates and Times"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Starting in <keyword keyref="impala31">Impala 3.1</keyword>, you can customize the time
+      zone definitions used in Impala.
+      <ul>
+        <li>
+          <p>
+            By default, Impala uses the OS’s time zone database located in
+            <codeph>/usr/share/zoneinfo</codeph>. This directory contains the IANA timezone
+            database in a compiled binary format. The contents of the <codeph>zoneinfo</codeph>
+            directory are controlled by the OS’s package manager.
+          </p>
+        </li>
+
+        <li>
+          <p >
+            New startup flags have been introduced:
+          </p>
+          <ul>
+            <li >
+              <codeph>--hdfs_zone_info_zip</codeph>: This flag allows Impala administrators to
+              specify a custom timezone database. The flag should be set to a shared (not
+              necessarily HDFS) path that points to a zip archive of a custom IANA timezone
+              database. The timezone database is expected to be in a compiled binary format. If
+              the startup flag is set, Impala will use the specified timezone database instead
+              of the default <codeph>/usr/share/zoneinfo</codeph> database. The timezone
+              database upgrade process is described in detail below.
+            </li>
+
+            <li >
+              <p >
+                <codeph>--hdfs_zone_alias_conf</codeph>: This flag allows Impala administrators
+                to specify definitions for custom timezone aliases. The flag should be set to a
+                shared (not necessarily HDFS) path that specifies a config file containing
+                custom timezone alias definitions. This config file can be used as a workaround
+                for users who want to keep using their legacy timezone names. Configuring custom
+                aliases is described in detail below.
+              </p>
+            </li>
+          </ul>
+        </li>
+      </ul>
+    </p>
+
+    <p>
+      <b>Upgrading custom IANA time zone database:</b>
+      <ol>
+        <li >
+          Download the latest IANA time zone database distribution:
+<codeblock>git clone https://github.com/eggert/tz</codeblock>
+          <p >
+            Alternatively, download a specific tzdb version from:
+<codeblock>https://www.iana.org/time-zones/repository</codeblock>
+          </p>
+        </li>
+
+        <li >
+          Build timezone tools:
+<codeblock>cd tz
+make TOPDIR=tzdata install</codeblock>
+        </li>
+
+        <li >
+          Generate the compiled binary time zone database:
+<codeblock>./zic -d ./tzdata/etc/zoneinfo africa antarctica asia australasia backward backzone etcetera europe factory northamerica pacificnew southamerica systemv</codeblock>
+        </li>
+
+        <li >
+          Create a zip archive:
+<codeblock>pushd ./tzdata/etc
+zip -r zoneinfo.zip zoneinfo
+popd</codeblock>
+        </li>
+
+        <li >
+          Copy the time zone database to HDFS:
+<codeblock>hdfs dfs -mkdir -p /tzdb/latest
+hdfs dfs -copyFromLocal ./tzdata/etc/zoneinfo.zip /tzdb/latest</codeblock>
+        </li>
+
+        <li >
+          Set the <codeph>--hdfs_zone_info_zip</codeph> startup flag to
+          <codeph>/tzdb/latest/zoneinfo.zip</codeph> as an <codeph>impalad</codeph> safety
+          valve.
+        </li>
+
+        <li >
+          Perform a full restart of the Impala service.
+        </li>
+      </ol>
+    </p>
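+
+    <p>
+      After the restart, a quick sanity check confirms that the new time zone database loaded.
+      The query below is an illustrative sketch; any zone name present in the new database
+      works:
+    </p>
+
+<codeblock>select from_utc_timestamp(now(), 'America/New_York');</codeblock>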
+
+    <p>
+      <b>Configuring custom time zone aliases:</b>
+    </p>
+
+    <p>
+      <ol>
+        <li >
+          Create a <codeph>tzalias.conf</codeph> config file that contains time zone alias
+          definitions formatted as <codeph><i>ALIAS</i> = <i>DEFINITION</i></codeph>. For
+          example:
+<codeblock>#
+# Define aliases for existing timezone names:
+#
+Universal Coordinated Time = UTC
+Mideast/Riyadh89 = Asia/Riyadh
+PDT = America/Los_Angeles
+#
+# Define aliases as UTC offsets in seconds:
+#
+GMT-01:00 = 3600
+GMT+01:00 = -3600</codeblock>
+        </li>
+
+        <li >
+          Copy the config file to HDFS:
+<codeblock>hdfs dfs -mkdir -p /tzdb
+hdfs dfs -copyFromLocal tzalias.conf /tzdb</codeblock>
+        </li>
+
+        <li >
+          Set the <codeph>--hdfs_zone_alias_conf</codeph> startup flag to
+          <codeph>/tzdb/tzalias.conf</codeph> as an <codeph>impalad</codeph> safety valve.
+        </li>
+
+        <li >
+          Perform a full restart of the Impala service.
+        </li>
+      </ol>
+    </p>
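+
+    <p>
+      Once loaded, the aliases can be used wherever Impala accepts a time zone name. A minimal
+      sketch, assuming the <codeph>PDT</codeph> alias defined in the example config above:
+    </p>
+
+<codeblock>select from_utc_timestamp(now(), 'PDT');</codeblock>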
+
+    <p>
+      <b>Added in:</b> <keyword keyref="impala31"/>
+    </p>
+
+  </conbody>
+
+</concept>

http://git-wip-us.apache.org/repos/asf/impala/blob/e421223c/docs/topics/impala_timestamp.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_timestamp.xml b/docs/topics/impala_timestamp.xml
index d032e33..15dca34 100644
--- a/docs/topics/impala_timestamp.xml
+++ b/docs/topics/impala_timestamp.xml
@@ -42,339 +42,196 @@ under the License.
   <conbody>
 
     <p>
-      A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
-      statements, representing a point in time.
+      The <codeph>TIMESTAMP</codeph> data type holds a value that represents a point in time.
     </p>
 
-    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
-
     <p>
-      In the column definition of a <codeph>CREATE TABLE</codeph> statement:
+      Internally, the resolution of the time portion of a <codeph>TIMESTAMP</codeph> value is in
+      nanoseconds.
     </p>
 
-<codeblock><varname>column_name</varname> TIMESTAMP</codeblock>
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
 
     <p>
-      <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range is
-      different from the Hive <codeph>TIMESTAMP</codeph> type. Internally, the resolution of the
-      time portion of a <codeph>TIMESTAMP</codeph> value is in nanoseconds.
+      In the column definition of a <codeph>CREATE TABLE</codeph> statement:
     </p>
 
-    <p>
-      <b>INTERVAL expressions:</b>
-    </p>
+<codeblock><varname>column_name</varname> TIMESTAMP
 
-    <p>
-      You can perform date arithmetic by adding or subtracting a specified number of time units,
-      using the <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph> and
-      <codeph>-</codeph> operators or <codeph>date_add()</codeph> and
-      <codeph>date_sub()</codeph> functions. You can specify units as <codeph>YEAR[S]</codeph>,
-      <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>, <codeph>DAY[S]</codeph>,
-      <codeph>HOUR[S]</codeph>, <codeph>MINUTE[S]</codeph>, <codeph>SECOND[S]</codeph>,
-      <codeph>MILLISECOND[S]</codeph>, <codeph>MICROSECOND[S]</codeph>, and
-      <codeph>NANOSECOND[S]</codeph>. You can only specify one time unit in each interval
-      expression, for example <codeph>INTERVAL 3 DAYS</codeph> or <codeph>INTERVAL 25
-      HOURS</codeph>, but you can produce any granularity by adding together successive
-      <codeph>INTERVAL</codeph> values, such as <codeph><varname>timestamp_value</varname> +
-      INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>.
-    </p>
+<varname>timestamp</varname> [+ | -] INTERVAL <varname>interval</varname> <varname>time_unit</varname>
+DATE_ADD (<varname>timestamp</varname>, INTERVAL <varname>interval</varname> <varname>time_unit</varname>)</codeblock>
 
     <p>
-      For example:
+      <b>Range:</b> 1400-01-01 to 9999-12-31
     </p>
 
-<codeblock>select now() + interval 1 day;
-select date_sub(now(), interval 5 minutes);
-insert into auction_details
-  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
-  from new_auctions;</codeblock>
-
     <p>
-      <b>Time zones:</b>
+      Out-of-range <codeph>TIMESTAMP</codeph> values are converted to <codeph>NULL</codeph>.
     </p>
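+
+    <p>
+      For example, the following cast falls below the supported range and therefore returns
+      <codeph>NULL</codeph> (the literal shown is illustrative):
+    </p>
+
+<codeblock>select cast('1399-12-31 23:59:59' as timestamp);</codeblock>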
 
     <p>
-      By default, Impala does not store timestamps using the local timezone, to avoid undesired
-      results from unexpected time zone issues. Timestamps are stored and interpreted relative
-      to UTC, both when written to or read from data files, or when converted to or from Unix
-      time values through functions such as <codeph>from_unixtime()</codeph> or
-      <codeph>unix_timestamp()</codeph>. To convert such a <codeph>TIMESTAMP</codeph> value to
-      one that represents the date and time in a specific time zone, convert the original value
-      with the <codeph>from_utc_timestamp()</codeph> function.
+      The range of Impala <codeph>TIMESTAMP</codeph> is different from the Hive
+      <codeph>TIMESTAMP</codeph> type. Refer to
+      <xref
+        href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-timestamp"
+        format="html" scope="external">Hive
+      documentation</xref> for details.
     </p>
 
     <p>
-      Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any
-      particular time zone, you must be conscious of the time zone aspects of data that you
-      query, insert, or convert.
+      <b>INTERVAL expressions:</b>
     </p>
 
     <p>
-      For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by the
-      <codeph>now()</codeph> function represents the local time in the system time zone, rather
-      than in UTC. To store values relative to the current time in a portable way, convert any
-      <codeph>now()</codeph> return values using the <codeph>to_utc_timestamp()</codeph>
-      function first. For example, the following example shows that the current time in
-      California (where this Impala cluster is located) is shortly after 2 PM. If that value was
-      written to a data file, and shipped off to a distant server to be analyzed alongside other
-      data from far-flung locations, the dates and times would not match up precisely because of
-      time zone differences. Therefore, the <codeph>to_utc_timestamp()</codeph> function
-      converts it using a common reference point, the UTC time zone (descended from the old
-      Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that the
-      original value is from the Pacific time zone with Daylight Saving Time in effect. When
-      servers in all geographic locations run the same transformation on any local date and time
-      values (with the appropriate time zone argument), the stored data uses a consistent
-      representation. Impala queries can use functions such as <codeph>EXTRACT()</codeph>,
-      <codeph>MIN()</codeph>, <codeph>AVG()</codeph>, and so on to do time-series analysis on
-      those timestamps.
+      You can perform date arithmetic by adding or subtracting a specified number of time units,
+      using the <codeph>INTERVAL</codeph> keyword with the <codeph>+</codeph> or
+      <codeph>-</codeph> operator, or the <codeph>date_add()</codeph> and
+      <codeph>date_sub()</codeph> functions.
     </p>
 
-<codeblock>[localhost:21000] > select now();
-+-------------------------------+
-| now()                         |
-+-------------------------------+
-| 2015-04-09 14:07:46.580465000 |
-+-------------------------------+
-[localhost:21000] > select to_utc_timestamp(now(), 'PDT');
-+--------------------------------+
-| to_utc_timestamp(now(), 'pdt') |
-+--------------------------------+
-| 2015-04-09 21:08:07.664547000  |
-+--------------------------------+
-</codeblock>
-
     <p>
-      The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored
-      <codeph>TIMESTAMP</codeph> data or calculated results and convert back to local date and
-      time for processing on the application side. The following example shows how you might
-      represent some future date (such as the ending date and time of an auction) in UTC, and
-      then convert back to local time when convenient for reporting or other processing. The
-      final query in the example tests whether this arbitrary UTC date and time has passed yet,
-      by converting it back to the local time zone and comparing it against the current date and
-      time.
+      The following units are supported for <codeph><varname>time_unit</varname></codeph> in the
+      <codeph>INTERVAL</codeph> clause:
+      <ul>
+        <li>
+          <codeph>YEAR[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>MONTH[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>WEEK[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>DAY[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>HOUR[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>MINUTE[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>SECOND[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>MILLISECOND[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>MICROSECOND[S]</codeph>
+        </li>
+
+        <li>
+          <codeph>NANOSECOND[S]</codeph>
+        </li>
+      </ul>
     </p>
 
-<codeblock>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
-+---------------------------------------------------+
-| to_utc_timestamp(now() + interval 2 weeks, 'pdt') |
-+---------------------------------------------------+
-| 2015-04-23 21:08:34.152923000                     |
-+---------------------------------------------------+
-[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT');
-+------------------------------------------------------------+
-| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') |
-+------------------------------------------------------------+
-| 2015-04-23 14:08:34.152923000                              |
-+------------------------------------------------------------+
-[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') &lt; now();
-+--------------------------------------------------------------------+
-| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') &lt; now() |
-+--------------------------------------------------------------------+
-| false                                                              |
-+--------------------------------------------------------------------+
-</codeblock>
-
-    <p rev="2.2.0">
-      If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values represent
-      the local timezone of the host where the data was written, potentially leading to
-      inconsistent results when processed by Impala. To avoid compatibility problems or having
-      to code workarounds, you can specify one or both of these <cmdname>impalad</cmdname>
-      startup flags: <codeph>--use_local_tz_for_unix_timestamp_conversions=true</codeph>
-      <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>. Although
-      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by default to
-      avoid performance overhead, where practical turn it on when processing
-      <codeph>TIMESTAMP</codeph> columns in Parquet files written by Hive, to avoid unexpected
-      behavior.
+    <p>
+      You can only specify one time unit in each interval expression, for example
+      <codeph>INTERVAL 3 DAYS</codeph> or <codeph>INTERVAL 25 HOURS</codeph>, but you can
+      produce any granularity by adding together successive <codeph>INTERVAL</codeph> values,
+      such as <codeph><varname>timestamp_value</varname> + INTERVAL 3 WEEKS - INTERVAL 1 DAY +
+      INTERVAL 10 MICROSECONDS</codeph>.
     </p>
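+
+    <p>
+      For example (table and column names are illustrative):
+    </p>
+
+<codeblock>select now() + interval 1 day;
+select date_sub(now(), interval 5 minutes);
+insert into auction_details
+  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
+  from new_auctions;</codeblock>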
 
-    <p rev="2.2.0">
-      The <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting affects
-      conversions from <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from
-      <codeph>BIGINT</codeph> to <codeph>TIMESTAMP</codeph>. By default, Impala treats all
-      <codeph>TIMESTAMP</codeph> values as UTC, to simplify analysis of time-series data from
-      different geographic regions. When you enable the
-      <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting, these operations
-      treat the input values as if they are in the local tie zone of the host doing the
-      processing. See <xref
-        href="impala_datetime_functions.xml#datetime_functions"/>
-      for the list of functions affected by the
-      <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting.
-    </p>
+    <p conref="../shared/impala_common.xml#common/internals_16_bytes"/>
 
     <p>
-      The following sequence of examples shows how the interpretation of
-      <codeph>TIMESTAMP</codeph> values in Parquet tables is affected by the setting of the
-      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting.
+      <b>Time zones:</b>
     </p>
 
     <p>
-      Regardless of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting,
-      <codeph>TIMESTAMP</codeph> columns in text tables can be written and read interchangeably
-      by Impala and Hive:
+      By default, Impala stores and interprets <codeph>TIMESTAMP</codeph> values in the UTC time
+      zone when writing to data files, reading from data files, or converting to and from system
+      time values through functions.
     </p>
 
-<codeblock>Impala DDL and queries for text table:
-
-[localhost:21000] > create table t1 (x timestamp);
-[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day);
-[localhost:21000] > select x from t1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 15:43:02.892403000 |
-| 2015-04-08 15:43:02.892403000 |
-+-------------------------------+
-[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1;
-+-------------------------------+
-| to_utc_timestamp(x, 'pdt')    |
-+-------------------------------+
-| 2015-04-07 22:43:02.892403000 |
-| 2015-04-08 22:43:02.892403000 |
-+-------------------------------+
-
-Hive query for text table:
-
-hive> select * from t1;
-OK
-2015-04-07 15:43:02.892403
-2015-04-08 15:43:02.892403
-Time taken: 1.245 seconds, Fetched: 2 row(s)
-</codeblock>
-
     <p>
-      When the table uses Parquet format, Impala expects any time zone adjustment to be applied
-      prior to writing, while <codeph>TIMESTAMP</codeph> values written by Hive are adjusted to
-      be in the UTC time zone. When Hive queries Parquet data files that it wrote, it adjusts
-      the <codeph>TIMESTAMP</codeph> values back to the local time zone, while Impala does no
-      conversion. Hive does no time zone conversion when it queries Impala-written Parquet
-      files.
+      When you set the <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> startup
+      flag to <codeph>TRUE</codeph>, Impala treats <codeph>TIMESTAMP</codeph> values as if they
+      were in the local time zone. The local time zone is determined in the following order,
+      with the <codeph>TIMEZONE</codeph> query option taking the highest precedence:
+      <ol>
+        <li>
+          The <codeph>TIMEZONE</codeph> query option
+        </li>
+
+        <li>
+          The <codeph>$TZ</codeph> environment variable
+        </li>
+
+        <li>
+          The system time zone where the <codeph>impalad</codeph> coordinator runs
+        </li>
+      </ol>
     </p>
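+
+    <p>
+      For example, to override the time zone for a session, a sketch assuming the
+      <codeph>TIMEZONE</codeph> query option:
+    </p>
+
+<codeblock>-- Session-level override; takes precedence over $TZ and the system time zone.
+set timezone='America/Los_Angeles';
+select now();</codeblock>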
 
-<codeblock>Impala DDL and queries for Parquet table:
-
-[localhost:21000] > create table p1 stored as parquet as select x from t1;
-+-------------------+
-| summary           |
-+-------------------+
-| Inserted 2 row(s) |
-+-------------------+
-[localhost:21000] > select x from p1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 15:43:02.892403000 |
-| 2015-04-08 15:43:02.892403000 |
-+-------------------------------+
-
-Hive DDL and queries for Parquet table:
-
-hive> create table h1 (x timestamp) stored as parquet;
-OK
-hive> insert into h1 select * from p1;
-...
-OK
-Time taken: 35.573 seconds
-hive> select x from p1;
-OK
-2015-04-07 15:43:02.892403
-2015-04-08 15:43:02.892403
-Time taken: 0.324 seconds, Fetched: 2 row(s)
-hive> select x from h1;
-OK
-2015-04-07 15:43:02.892403
-2015-04-08 15:43:02.892403
-Time taken: 0.197 seconds, Fetched: 2 row(s)
-</codeblock>
+    <p> The <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph>
+      setting can also be used to fix discrepancies in <codeph>INTERVAL</codeph>
+      operations. For example, a <codeph>TIMESTAMP + INTERVAL
+          <varname>n-hours</varname></codeph> can be affected by Daylight Saving
+      Time, which Impala does not consider by default because these operations
+      are applied as if the timestamp were in UTC. </p>
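+    <p>
+      A hypothetical illustration of the default behavior, where the addition is applied as if
+      the value were in UTC and is therefore not shifted by a DST transition:
+    </p>
+<codeblock>-- 2018-03-11 is a US DST transition date; with the default semantics
+-- the result is simply the next day at the same wall-clock time.
+select cast('2018-03-10 12:00:00' as timestamp) + interval 24 hours;</codeblock>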
+    <p>See <xref href="impala_custom_timezones.xml#custom_timezone"/> for
+      information on configuring a custom time zone database and custom time
+      zone aliases.</p>
 
     <p>
-      The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying
-      values in the <codeph>TIMESTAMP</codeph> column are different from the ones written by
-      Impala, even though they were copied from one table to another by an <codeph>INSERT ...
-      SELECT</codeph> statement in Hive. Hive did an implicit conversion from the local time
-      zone to UTC as it wrote the values to Parquet.
+      See <xref href="impala_datetime_functions.xml#datetime_functions">Impala Date and Time
+      Functions</xref> for the list of functions affected by the
+      <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting.
     </p>
 
-<codeblock>Impala query for TIMESTAMP values from Impala-written and Hive-written data:
-
-[localhost:21000] > select * from p1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 15:43:02.892403000 |
-| 2015-04-08 15:43:02.892403000 |
-+-------------------------------+
-Fetched 2 row(s) in 0.29s
-[localhost:21000] > select * from h1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 22:43:02.892403000 |
-| 2015-04-08 22:43:02.892403000 |
-+-------------------------------+
-Fetched 2 row(s) in 0.41s
-
-Underlying integer values for Impala-written and Hive-written data:
-
-[localhost:21000] > select cast(x as bigint) from p1;
-+-------------------+
-| cast(x as bigint) |
-+-------------------+
-| 1428421382        |
-| 1428507782        |
-+-------------------+
-Fetched 2 row(s) in 0.38s
-[localhost:21000] > select cast(x as bigint) from h1;
-+-------------------+
-| cast(x as bigint) |
-+-------------------+
-| 1428446582        |
-| 1428532982        |
-+-------------------+
-Fetched 2 row(s) in 0.20s
-</codeblock>
-
     <p>
-      When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is enabled,
-      Impala recognizes the Parquet data files written by Hive, and applies the same
-      UTC-to-local-timezone conversion logic during the query as Hive uses, making the contents
-      of the Impala-written <codeph>P1</codeph> table and the Hive-written <codeph>H1</codeph>
-      table appear identical, whether represented as <codeph>TIMESTAMP</codeph> values or the
-      underlying <codeph>BIGINT</codeph> integers:
+      <b>Time zone handling between Impala and Hive:</b>
     </p>
-
-<codeblock>[localhost:21000] > select x from p1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 15:43:02.892403000 |
-| 2015-04-08 15:43:02.892403000 |
-+-------------------------------+
-Fetched 2 row(s) in 0.37s
-[localhost:21000] > select x from h1;
-+-------------------------------+
-| x                             |
-+-------------------------------+
-| 2015-04-07 15:43:02.892403000 |
-| 2015-04-08 15:43:02.892403000 |
-+-------------------------------+
-Fetched 2 row(s) in 0.19s
-[localhost:21000] > select cast(x as bigint) from p1;
-+-------------------+
-| cast(x as bigint) |
-+-------------------+
-| 1428446582        |
-| 1428532982        |
-+-------------------+
-Fetched 2 row(s) in 0.29s
-[localhost:21000] > select cast(x as bigint) from h1;
-+-------------------+
-| cast(x as bigint) |
-+-------------------+
-| 1428446582        |
-| 1428532982        |
-+-------------------+
-Fetched 2 row(s) in 0.22s
-</codeblock>
+    <p>Interoperability between Hive and Impala differs depending on the
+      file format.</p>
+    <ul>
+      <li><i>Text</i><p> For text tables, <codeph>TIMESTAMP</codeph> values can
+          be written and read interchangeably by Impala and Hive because Hive
+          reads and writes <codeph>TIMESTAMP</codeph> values without any time
+          zone conversion. </p></li>
+      <li><i>Parquet</i><p> When Hive writes to Parquet data files, the
+            <codeph>TIMESTAMP</codeph> values are normalized to UTC from the
+          local time zone of the host where the data was written. On the other
+          hand, Impala does not make any time zone adjustment when it writes
+          <codeph>TIMESTAMP</codeph> values to or reads them from Parquet
+          files. This difference in time zone handling can cause potentially
+          inconsistent results when Impala processes <codeph>TIMESTAMP</codeph>
+          values in Parquet files written by Hive, as illustrated in the
+          example after this list. </p><p> To avoid incompatibility problems or
+          having to code workarounds, you can specify one or both of these
+          <cmdname>impalad</cmdname> startup flags: <ul>
+            <li>
+              <codeph>--use_local_tz_for_unix_timestamp_conversions=true</codeph>
+            </li>
+            <li>
+              <codeph>--convert_legacy_hive_parquet_utc_timestamps=true</codeph>
+            </li>
+          </ul>
+        </p><p> When the
+            <codeph>--convert_legacy_hive_parquet_utc_timestamps</codeph> setting
+          is enabled, Impala recognizes Parquet data files written by Hive and
+          applies the same UTC-to-local-timezone conversion logic during the
+          query as Hive does. </p><p>In <keyword keyref="impala30"/> and lower,
+          this option had a severe impact on multi-threaded performance. The new
+          time zone implementation in <keyword keyref="impala31"/> eliminated
+          most of the performance overhead and made Impala scale well to
+          multiple threads. Although
+            <codeph>--convert_legacy_hive_parquet_utc_timestamps</codeph> is
+          turned off by default for historical performance reasons, where
+          practical, turn it on when processing <codeph>TIMESTAMP</codeph>
+          columns in Parquet files written by Hive, to avoid unexpected
+          behavior. </p></li>
+    </ul>
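+
+    <p>
+      For example, with the conversion flag off, rows copied by Hive from an Impala-written
+      Parquet table (<codeph>p1</codeph>) into a Hive-written one (<codeph>h1</codeph>) read
+      back shifted by the writer's UTC offset. The values below are illustrative, from a host
+      in the US Pacific time zone:
+    </p>
+
+<codeblock>select x from p1;   -- 2015-04-07 15:43:02.892403000
+select x from h1;   -- 2015-04-07 22:43:02.892403000</codeblock>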
 
     <p>
       <b>Conversions:</b>
@@ -422,26 +279,31 @@ ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column
 
     <p conref="../shared/impala_common.xml#common/null_bad_timestamp_cast"/>
 
-    <p conref="../shared/impala_common.xml#common/partitioning_worrisome"/>
-
     <p conref="../shared/impala_common.xml#common/hbase_ok"/>
 
+    <p>
+      <b>Parquet considerations:</b> int96-encoded Parquet timestamps are supported in Impala.
+      int64 timestamps will be supported in a future release.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/parquet_ok"/>
 
     <p conref="../shared/impala_common.xml#common/text_bulky"/>
 
 <!--    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> -->
 
-    <p conref="../shared/impala_common.xml#common/internals_16_bytes"/>
-
-    <p conref="../shared/impala_common.xml#common/added_forever"/>
-
     <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
 
     <p conref="../shared/impala_common.xml#common/sqoop_blurb"/>
 
     <p conref="../shared/impala_common.xml#common/sqoop_timestamp_caveat"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/kudu_timestamp_details"/>
+
     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
 
     <p>
@@ -453,10 +315,6 @@ ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column
 
     <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
 
-    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
-
-    <p conref="../shared/impala_common.xml#common/kudu_timestamp_details"/>
-
     <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
     <p>
@@ -524,6 +382,8 @@ select s, t, b from timestamp_t order by t;
 +-------------------------------+-------------------------------+------------+
 </codeblock>
 
+    <p conref="../shared/impala_common.xml#common/added_forever"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <ul>


[3/3] impala git commit: IMPALA-7867 (Part 1): Expose List in TreeNode, parser

Posted by ar...@apache.org.
IMPALA-7867 (Part 1): Expose List in TreeNode, parser

When using Java collections, a common Java best practice is to expose
the collection interface, but hide the implementation choice. This
pattern allows us to start with a generic implementation (an ArrayList,
say), but evolve to a more specific implementation to achieve certain
goals (a LinkedList or ImmutableList, say).

For whatever reason, the Impala FE code exposes ArrayList, HashMap and
other implementation choices as variable types and in method signatures.

Also, since Java 7, the preferred way to create an ArrayList is

new ArrayList<>()

Replaced older forms:

new ArrayList<foo>() // Pre-Java 7
Lists.newArrayList() // Guava form, pre-Java 7

This ticket cleans up two files, and their dependencies:

* TreeNode (the root of all parser nodes)
* sql-parser.cup (the code which creates the parser nodes)

Many other uses exist, and will be submitted as separate patches to keep
patches small.

In TreeNode, also cleaned up some of the generic expressions, which
required dependent code to change in order to be more type-safe.

Tests: This is purely a refactoring; no functionality changed. Ran the
FE unit tests to verify no regressions.

Change-Id: Iebab7dccdb4b2fa0b5ca812beab0e8bdba39f539
Reviewed-on: http://gerrit.cloudera.org:8080/11954
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/12dc29e5
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/12dc29e5
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/12dc29e5

Branch: refs/heads/master
Commit: 12dc29e5ee1ec03bbc1556eca4539645e2b9bbf1
Parents: fcfabe0
Author: Paul Rogers <pr...@cloudera.com>
Authored: Sun Nov 18 15:13:34 2018 -0800
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Nov 27 00:04:57 2018 +0000

----------------------------------------------------------------------
 fe/src/main/cup/sql-parser.cup                  | 81 ++++++++++----------
 .../apache/impala/analysis/AnalyticInfo.java    |  8 +-
 .../impala/analysis/DescribeTableStmt.java      |  6 +-
 .../java/org/apache/impala/analysis/Expr.java   |  2 +-
 .../apache/impala/analysis/FunctionName.java    | 10 +--
 .../org/apache/impala/analysis/QueryStmt.java   |  4 +-
 .../org/apache/impala/analysis/SelectStmt.java  | 10 +--
 .../org/apache/impala/analysis/SlotRef.java     |  3 +-
 .../org/apache/impala/analysis/UnionStmt.java   |  2 +-
 .../org/apache/impala/analysis/ValuesStmt.java  |  3 +-
 .../org/apache/impala/analysis/WithClause.java  |  5 +-
 .../org/apache/impala/catalog/StructType.java   |  7 +-
 .../java/org/apache/impala/common/TreeNode.java | 40 +++++++---
 .../apache/impala/planner/AnalyticPlanner.java  |  4 +-
 .../org/apache/impala/planner/HdfsScanNode.java |  2 +-
 .../java/org/apache/impala/planner/Planner.java |  4 +-
 .../org/apache/impala/service/Frontend.java     |  4 +-
 17 files changed, 103 insertions(+), 92 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/cup/sql-parser.cup
----------------------------------------------------------------------
diff --git a/fe/src/main/cup/sql-parser.cup b/fe/src/main/cup/sql-parser.cup
index 50418d0..12fd55d 100644
--- a/fe/src/main/cup/sql-parser.cup
+++ b/fe/src/main/cup/sql-parser.cup
@@ -20,7 +20,6 @@ package org.apache.impala.analysis;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
 import java.math.BigDecimal;
-import java.math.BigInteger;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
@@ -44,7 +43,6 @@ import org.apache.impala.catalog.StructField;
 import org.apache.impala.catalog.StructType;
 import org.apache.impala.catalog.Type;
 import org.apache.impala.catalog.View;
-import org.apache.impala.common.AnalysisException;
 import org.apache.impala.common.Pair;
 import org.apache.impala.thrift.TCatalogObjectType;
 import org.apache.impala.thrift.TDescribeOutputStyle;
@@ -67,7 +65,7 @@ parser code {:
 
   // list of expected tokens ids from current parsing state
   // for generating syntax error message
-  private final List<Integer> expectedTokenIds_ = new ArrayList<Integer>();
+  private final List<Integer> expectedTokenIds_ = new ArrayList<>();
 
   // Currently used to tell if it's decimal V1 or V2 mode.
   // TODO: remove when V1 code is dropped.
@@ -110,6 +108,7 @@ parser code {:
   }
 
   // override to save error token
+  @Override
   public void syntax_error(java_cup.runtime.Symbol token) {
     errorToken_ = token;
 
@@ -365,10 +364,10 @@ nonterminal SelectList select_list;
 nonterminal SelectListItem select_list_item;
 nonterminal SelectListItem star_expr;
 nonterminal Expr expr, non_pred_expr, arithmetic_expr, timestamp_arithmetic_expr;
-nonterminal ArrayList<Expr> expr_list;
+nonterminal List<Expr> expr_list;
 nonterminal String alias_clause;
-nonterminal ArrayList<String> ident_list, primary_keys;
-nonterminal ArrayList<String> opt_ident_list, opt_sort_cols;
+nonterminal List<String> ident_list, primary_keys;
+nonterminal List<String> opt_ident_list, opt_sort_cols;
 nonterminal TableName table_name;
 nonterminal ColumnName column_name;
 nonterminal FunctionName function_name;
@@ -376,9 +375,9 @@ nonterminal Expr where_clause;
 nonterminal Expr predicate, bool_test_expr;
 nonterminal Predicate between_predicate, comparison_predicate, compound_predicate,
   in_predicate, like_predicate, exists_predicate;
-nonterminal ArrayList<Expr> group_by_clause, opt_partition_by_clause;
+nonterminal List<Expr> group_by_clause, opt_partition_by_clause;
 nonterminal Expr having_clause;
-nonterminal ArrayList<OrderByElement> order_by_elements, opt_order_by_clause;
+nonterminal List<OrderByElement> order_by_elements, opt_order_by_clause;
 nonterminal OrderByElement order_by_element;
 nonterminal Boolean opt_order_param;
 nonterminal Boolean opt_nulls_order_param;
@@ -393,15 +392,15 @@ nonterminal AnalyticWindow.Boundary window_boundary;
 nonterminal LiteralExpr literal;
 nonterminal NumericLiteral numeric_literal;
 nonterminal CaseExpr case_expr;
-nonterminal ArrayList<CaseWhenClause> case_when_clause_list;
+nonterminal List<CaseWhenClause> case_when_clause_list;
 nonterminal FunctionParams function_params;
-nonterminal ArrayList<String> dotted_path;
+nonterminal List<String> dotted_path;
 nonterminal SlotRef slot_ref;
 nonterminal FromClause from_clause;
-nonterminal ArrayList<TableRef> table_ref_list;
+nonterminal List<TableRef> table_ref_list;
 nonterminal TableSampleClause opt_tablesample;
 nonterminal WithClause opt_with_clause;
-nonterminal ArrayList<View> with_view_def_list;
+nonterminal List<View> with_view_def_list;
 nonterminal View with_view_def;
 nonterminal TableRef table_ref;
 nonterminal Subquery subquery;
@@ -415,7 +414,7 @@ nonterminal Expr sign_chain_expr;
 nonterminal InsertStmt insert_stmt, upsert_stmt;
 nonterminal UpdateStmt update_stmt;
 nonterminal DeleteStmt delete_stmt;
-nonterminal ArrayList<Pair<SlotRef, Expr>> update_set_expr_list;
+nonterminal List<Pair<SlotRef, Expr>> update_set_expr_list;
 nonterminal StatementBase explain_stmt;
 // Optional partition spec
 nonterminal PartitionSpec opt_partition_spec;
@@ -425,9 +424,9 @@ nonterminal PartitionSpec partition_spec;
 nonterminal PartitionSet opt_partition_set;
 // Required partition set
 nonterminal PartitionSet partition_set;
-nonterminal ArrayList<PartitionKeyValue> partition_clause;
-nonterminal ArrayList<PartitionKeyValue> static_partition_key_value_list;
-nonterminal ArrayList<PartitionKeyValue> partition_key_value_list;
+nonterminal List<PartitionKeyValue> partition_clause;
+nonterminal List<PartitionKeyValue> static_partition_key_value_list;
+nonterminal List<PartitionKeyValue> partition_key_value_list;
 nonterminal PartitionKeyValue partition_key_value;
 nonterminal PartitionKeyValue static_partition_key_value;
 nonterminal Qualifier union_op;
@@ -462,13 +461,13 @@ nonterminal List<RangePartition> range_params_list;
 nonterminal RangePartition range_param;
 nonterminal Pair<List<Expr>, Boolean> opt_lower_range_val,
    opt_upper_range_val;
-nonterminal ArrayList<KuduPartitionParam> hash_partition_param_list;
-nonterminal ArrayList<KuduPartitionParam> partition_param_list;
+nonterminal List<KuduPartitionParam> hash_partition_param_list;
+nonterminal List<KuduPartitionParam> partition_param_list;
 nonterminal KuduPartitionParam range_partition_param;
 nonterminal ColumnDef column_def, view_column_def;
-nonterminal ArrayList<ColumnDef> column_def_list, partition_column_defs,
+nonterminal List<ColumnDef> column_def_list, partition_column_defs,
   view_column_def_list, view_column_defs;
-nonterminal ArrayList<StructField> struct_field_def_list;
+nonterminal List<StructField> struct_field_def_list;
 // Options for DDL commands - CREATE/DROP/ALTER
 nonterminal HdfsCachingOp cache_op_val, opt_cache_op_val;
 nonterminal BigDecimal opt_cache_op_replication;
@@ -839,7 +838,7 @@ update_stmt ::=
 update_set_expr_list ::=
   slot_ref:slot EQUAL expr:e
   {:
-    ArrayList<Pair<SlotRef, Expr>> tmp =
+    List<Pair<SlotRef, Expr>> tmp =
         Lists.newArrayList(new Pair<SlotRef, Expr>(slot, e));
     RESULT = tmp;
   :}
@@ -906,7 +905,7 @@ opt_ident_list ::=
   ident_list:ident
   {: RESULT = ident; :}
   | /* empty */
-  {: RESULT = Lists.newArrayList(); :}
+  {: RESULT = new ArrayList<>(); :}
   ;
 
 opt_kw_table ::=
@@ -1442,7 +1441,7 @@ hash_partition_param ::=
   {: RESULT = KuduPartitionParam.createHashParam(cols, numPartitions.intValue()); :}
   | KW_HASH KW_PARTITIONS INTEGER_LITERAL:numPartitions
   {:
-    RESULT = KuduPartitionParam.createHashParam(Lists.<String>newArrayList(),
+    RESULT = KuduPartitionParam.createHashParam(new ArrayList<>(),
         numPartitions.intValue());
   :}
   ;
@@ -1719,7 +1718,7 @@ properties_map ::=
 column_def_list ::=
   column_def:col_def
   {:
-    ArrayList<ColumnDef> list = Lists.newArrayList();
+    List<ColumnDef> list = new ArrayList<>();
     list.add(col_def);
     RESULT = list;
   :}
@@ -1901,7 +1900,7 @@ view_column_defs ::=
 view_column_def_list ::=
   view_column_def:col_def
   {:
-    ArrayList<ColumnDef> list = new ArrayList<ColumnDef>();
+    List<ColumnDef> list = new ArrayList<>();
     list.add(col_def);
     RESULT = list;
   :}
@@ -2025,7 +2024,7 @@ partition_clause ::=
 partition_key_value_list ::=
   partition_key_value:item
   {:
-    ArrayList<PartitionKeyValue> list = new ArrayList<PartitionKeyValue>();
+    List<PartitionKeyValue> list = new ArrayList<>();
     list.add(item);
     RESULT = list;
   :}
@@ -2071,7 +2070,7 @@ opt_partition_spec ::=
 static_partition_key_value_list ::=
   static_partition_key_value:item
   {:
-    ArrayList<PartitionKeyValue> list = new ArrayList<PartitionKeyValue>();
+    List<PartitionKeyValue> list = new ArrayList<>();
     list.add(item);
     RESULT = list;
   :}
@@ -2241,7 +2240,7 @@ with_view_def ::=
 with_view_def_list ::=
   with_view_def:v
   {:
-    ArrayList<View> list = new ArrayList<View>();
+    List<View> list = new ArrayList<>();
     list.add(v);
     RESULT = list;
   :}
@@ -2307,7 +2306,7 @@ union_operand ::=
 union_operand_list ::=
   union_operand:operand
   {:
-    List<UnionOperand> operands = new ArrayList<UnionOperand>();
+    List<UnionOperand> operands = new ArrayList<>();
     operands.add(new UnionOperand(operand, null));
     RESULT = operands;
   :}
@@ -2345,7 +2344,7 @@ values_stmt ::=
 values_operand_list ::=
   LPAREN select_list:selectList RPAREN
   {:
-    List<UnionOperand> operands = new ArrayList<UnionOperand>();
+    List<UnionOperand> operands = new ArrayList<>();
     operands.add(new UnionOperand(
         new SelectStmt(selectList, null, null, null, null, null, null), null));
     RESULT = operands;
@@ -2608,7 +2607,7 @@ from_clause ::=
 table_ref_list ::=
   table_ref:table opt_plan_hints:hints
   {:
-    ArrayList<TableRef> list = new ArrayList<TableRef>();
+    List<TableRef> list = new ArrayList<>();
     table.setTableHints(hints);
     list.add(table);
     RESULT = list;
@@ -2703,7 +2702,7 @@ opt_plan_hints ::=
   plan_hints:hints
   {: RESULT = hints; :}
   | /* empty */
-  {: RESULT = Lists.newArrayList(); :}
+  {: RESULT = new ArrayList<>(); :}
   ;
 
 plan_hints ::=
@@ -2731,7 +2730,7 @@ plan_hint ::=
 plan_hint_list ::=
   plan_hint:hint
   {:
-    ArrayList<PlanHint> hints = Lists.newArrayList(hint);
+    List<PlanHint> hints = Lists.newArrayList(hint);
     RESULT = hints;
   :}
   | plan_hint_list:hints COMMA plan_hint:hint
@@ -2754,7 +2753,7 @@ opt_tablesample ::=
 ident_list ::=
   ident_or_default:ident
   {:
-    ArrayList<String> list = new ArrayList<String>();
+    List<String> list = new ArrayList<>();
     list.add(ident);
     RESULT = list;
   :}
@@ -2768,7 +2767,7 @@ ident_list ::=
 expr_list ::=
   expr:e
   {:
-    ArrayList<Expr> list = new ArrayList<Expr>();
+    List<Expr> list = new ArrayList<>();
     list.add(e);
     RESULT = list;
   :}
@@ -2810,7 +2809,7 @@ opt_order_by_clause ::=
 order_by_elements ::=
   order_by_element:e
   {:
-    ArrayList<OrderByElement> list = new ArrayList<OrderByElement>();
+    List<OrderByElement> list = new ArrayList<>();
     list.add(e);
     RESULT = list;
   :}
@@ -2891,7 +2890,7 @@ case_expr ::=
 case_when_clause_list ::=
   KW_WHEN expr:whenExpr KW_THEN expr:thenExpr
   {:
-    ArrayList<CaseWhenClause> list = new ArrayList<CaseWhenClause>();
+    List<CaseWhenClause> list = new ArrayList<>();
     list.add(new CaseWhenClause(whenExpr, thenExpr));
     RESULT = list;
   :}
@@ -2985,7 +2984,7 @@ function_call_expr ::=
   function_name:fn_name LPAREN RPAREN
   {:
     RESULT = FunctionCallExpr.createExpr(
-        fn_name, new FunctionParams(new ArrayList<Expr>()), parser.getQueryOptions());
+        fn_name, new FunctionParams(new ArrayList<>()), parser.getQueryOptions());
   :}
   | function_name:fn_name LPAREN function_params:params RPAREN
   {: RESULT = FunctionCallExpr.createExpr(fn_name, params, parser.getQueryOptions()); :}
@@ -3112,7 +3111,7 @@ timestamp_arithmetic_expr ::=
       // Report parsing failure on keyword interval.
       parser.parseError("interval", SqlParserSymbols.KW_INTERVAL);
     }
-    ArrayList<String> fnNamePath = functionName.getFnNamePath();
+    List<String> fnNamePath = functionName.getFnNamePath();
     if (fnNamePath.size() > 1) {
       // This production should not accept fully qualified function names
       throw new Exception("interval should not be qualified by database name");
@@ -3318,7 +3317,7 @@ slot_ref ::=
 dotted_path ::=
   ident_or_default:ident
   {:
-    ArrayList<String> list = new ArrayList<String>();
+    List<String> list = new ArrayList<>();
     list.add(ident);
     RESULT = list;
   :}
@@ -3390,7 +3389,7 @@ struct_field_def ::=
 struct_field_def_list ::=
   struct_field_def:field_def
   {:
-    ArrayList<StructField> list = new ArrayList<StructField>();
+    List<StructField> list = new ArrayList<>();
     list.add(field_def);
     RESULT = list;
   :}

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java b/fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
index 26ce3f8..09d72d9 100644
--- a/fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
+++ b/fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
@@ -38,7 +38,7 @@ public class AnalyticInfo extends AggregateInfoBase {
   // All unique analytic exprs of a select block. Used to populate
   // super.aggregateExprs_ based on AnalyticExpr.getFnCall() for each analytic expr
   // in this list.
-  private final ArrayList<Expr> analyticExprs_;
+  private final ArrayList<AnalyticExpr> analyticExprs_;
 
   // Intersection of the partition exps of all the analytic functions.
   private final List<Expr> commonPartitionExprs_;
@@ -46,7 +46,7 @@ public class AnalyticInfo extends AggregateInfoBase {
   // map from analyticExprs_ to their corresponding analytic tuple slotrefs
   private final ExprSubstitutionMap analyticTupleSmap_;
 
-  private AnalyticInfo(ArrayList<Expr> analyticExprs) {
+  private AnalyticInfo(ArrayList<AnalyticExpr> analyticExprs) {
     super(new ArrayList<Expr>(), new ArrayList<FunctionCallExpr>());
     analyticExprs_ = Expr.cloneList(analyticExprs);
     // Extract the analytic function calls for each analytic expr.
@@ -68,7 +68,7 @@ public class AnalyticInfo extends AggregateInfoBase {
     commonPartitionExprs_ = Expr.cloneList(other.commonPartitionExprs_);
   }
 
-  public ArrayList<Expr> getAnalyticExprs() { return analyticExprs_; }
+  public ArrayList<AnalyticExpr> getAnalyticExprs() { return analyticExprs_; }
   public ExprSubstitutionMap getSmap() { return analyticTupleSmap_; }
   public List<Expr> getCommonPartitionExprs() { return commonPartitionExprs_; }
 
@@ -77,7 +77,7 @@ public class AnalyticInfo extends AggregateInfoBase {
    * smaps.
    */
   static public AnalyticInfo create(
-      ArrayList<Expr> analyticExprs, Analyzer analyzer) {
+      ArrayList<AnalyticExpr> analyticExprs, Analyzer analyzer) {
     Preconditions.checkState(analyticExprs != null && !analyticExprs.isEmpty());
     Expr.removeDuplicates(analyticExprs);
     AnalyticInfo result = new AnalyticInfo(analyticExprs);

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java b/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
index 65d9c75..1e7c645 100644
--- a/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
@@ -17,13 +17,11 @@
 
 package org.apache.impala.analysis;
 
-import java.util.ArrayList;
 import java.util.List;
 
 import org.apache.commons.lang3.StringUtils;
 import org.apache.impala.analysis.Path.PathType;
 import org.apache.impala.authorization.Privilege;
-import org.apache.impala.authorization.PrivilegeRequest;
 import org.apache.impala.authorization.PrivilegeRequestBuilder;
 import org.apache.impala.catalog.FeTable;
 import org.apache.impala.catalog.StructType;
@@ -54,7 +52,7 @@ public class DescribeTableStmt extends StatementBase {
   private final TDescribeOutputStyle outputStyle_;
 
   /// "."-separated path from the describe statement.
-  private final ArrayList<String> rawPath_;
+  private final List<String> rawPath_;
 
   /// The resolved path to describe, set after analysis.
   private Path path_;
@@ -66,7 +64,7 @@ public class DescribeTableStmt extends StatementBase {
   /// Only set when describing a path to a nested collection.
   private StructType resultStruct_;
 
-  public DescribeTableStmt(ArrayList<String> rawPath, TDescribeOutputStyle outputStyle) {
+  public DescribeTableStmt(List<String> rawPath, TDescribeOutputStyle outputStyle) {
     Preconditions.checkNotNull(rawPath);
     Preconditions.checkArgument(!rawPath.isEmpty());
     rawPath_ = rawPath;

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/Expr.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/Expr.java b/fe/src/main/java/org/apache/impala/analysis/Expr.java
index a17877e..6141a11 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Expr.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Expr.java
@@ -1031,7 +1031,7 @@ abstract public class Expr extends TreeNode<Expr> implements ParseNode, Cloneabl
     return this;
   }
 
-  public static ArrayList<Expr> resetList(ArrayList<Expr> l) {
+  public static List<Expr> resetList(List<Expr> l) {
     for (int i = 0; i < l.size(); ++i) {
       l.set(i, l.get(i).reset());
     }

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/FunctionName.java b/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
index aeaa6e1..f3b3d5d 100644
--- a/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
+++ b/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
@@ -17,13 +17,13 @@
 
 package org.apache.impala.analysis;
 
-import java.util.ArrayList;
+import java.util.List;
 
 import org.apache.impala.catalog.BuiltinsDb;
-import org.apache.impala.catalog.Catalog;
 import org.apache.impala.catalog.Db;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.thrift.TFunctionName;
+
 import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
 
@@ -33,7 +33,7 @@ import com.google.common.base.Preconditions;
  */
 public class FunctionName {
   // Only set for parsed function names.
-  private final ArrayList<String> fnNamePath_;
+  private final List<String> fnNamePath_;
 
   // Set/validated during analysis.
   private String db_;
@@ -45,7 +45,7 @@ public class FunctionName {
    * C'tor for parsed function names. The function names could be invalid. The validity
    * is checked during analysis.
    */
-  public FunctionName(ArrayList<String> fnNamePath) {
+  public FunctionName(List<String> fnNamePath) {
     fnNamePath_ = fnNamePath;
   }
 
@@ -75,7 +75,7 @@ public class FunctionName {
   public String getFunction() { return fn_; }
   public boolean isFullyQualified() { return db_ != null; }
   public boolean isBuiltin() { return isBuiltin_; }
-  public ArrayList<String> getFnNamePath() { return fnNamePath_; }
+  public List<String> getFnNamePath() { return fnNamePath_; }
 
   @Override
   public String toString() {

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java b/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
index a561cbd..50d32c1 100644
--- a/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
@@ -49,7 +49,7 @@ public abstract class QueryStmt extends StatementBase {
 
   protected WithClause withClause_;
 
-  protected ArrayList<OrderByElement> orderByElements_;
+  protected List<OrderByElement> orderByElements_;
   protected LimitElement limitElement_;
 
   // For a select statment:
@@ -98,7 +98,7 @@ public abstract class QueryStmt extends StatementBase {
   // returns a single row.
   protected boolean isRuntimeScalar_ = false;
 
-  QueryStmt(ArrayList<OrderByElement> orderByElements, LimitElement limitElement) {
+  QueryStmt(List<OrderByElement> orderByElements, LimitElement limitElement) {
     orderByElements_ = orderByElements;
     sortInfo_ = null;
     limitElement_ = limitElement == null ? new LimitElement(null, null) : limitElement;

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java b/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
index 449a16e..f60396a 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
@@ -53,10 +53,10 @@ public class SelectStmt extends QueryStmt {
   // BEGIN: Members that need to be reset()
 
   protected SelectList selectList_;
-  protected final ArrayList<String> colLabels_; // lower case column labels
+  protected final List<String> colLabels_; // lower case column labels
   protected final FromClause fromClause_;
   protected Expr whereClause_;
-  protected ArrayList<Expr> groupingExprs_;
+  protected List<Expr> groupingExprs_;
   protected Expr havingClause_;  // original having clause
 
   // havingClause with aliases and agg output resolved
@@ -81,8 +81,8 @@ public class SelectStmt extends QueryStmt {
 
   SelectStmt(SelectList selectList,
              FromClause fromClause,
-             Expr wherePredicate, ArrayList<Expr> groupingExprs,
-             Expr havingPredicate, ArrayList<OrderByElement> orderByElements,
+             Expr wherePredicate, List<Expr> groupingExprs,
+             Expr havingPredicate, List<OrderByElement> orderByElements,
              LimitElement limitElement) {
     super(orderByElements, limitElement);
     selectList_ = selectList;
@@ -840,7 +840,7 @@ public class SelectStmt extends QueryStmt {
     private void createAnalyticInfo()
         throws AnalysisException {
       // collect AnalyticExprs from the SELECT and ORDER BY clauses
-      ArrayList<Expr> analyticExprs = Lists.newArrayList();
+      ArrayList<AnalyticExpr> analyticExprs = Lists.newArrayList();
       TreeNode.collect(resultExprs_, AnalyticExpr.class, analyticExprs);
       if (sortInfo_ != null) {
         TreeNode.collect(sortInfo_.getSortExprs(), AnalyticExpr.class,
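
This hunk also fixes a latent type mismatch: analyticExprs was previously a
list of Expr filled via TreeNode.collect(..., AnalyticExpr.class, ...),
which compiled only because collect() accepted a raw Class. With the
Class<D>/Collection<D> signature introduced in the TreeNode.java hunk
below, the class token and the output collection must agree. A simplified,
self-contained sketch of that constraint (not the Impala implementation;
the matching rule here is isInstance() rather than an exact class check):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.List;

    class CollectDemo {
      // Same shape as the tightened TreeNode.collect: the type variable D
      // ties the class token to the element type of 'matches'.
      static <D> void collect(Collection<?> nodes, Class<D> cl,
          Collection<D> matches) {
        for (Object n : nodes) {
          if (cl.isInstance(n)) matches.add(cl.cast(n));
        }
      }

      public static void main(String[] args) {
        List<Object> nodes = Arrays.asList("a", 1, "b");
        List<String> strings = new ArrayList<>();
        collect(nodes, String.class, strings);   // OK: D = String
        // List<Integer> wrong = new ArrayList<>();
        // collect(nodes, String.class, wrong);  // would not compile
        System.out.println(strings);             // prints [a, b]
      }
    }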

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/SlotRef.java b/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
index 8860871..bd3d543 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
@@ -17,7 +17,6 @@
 
 package org.apache.impala.analysis;
 
-import java.util.ArrayList;
 import java.util.List;
 import java.util.Set;
 
@@ -41,7 +40,7 @@ public class SlotRef extends Expr {
   // Results of analysis.
   private SlotDescriptor desc_;
 
-  public SlotRef(ArrayList<String> rawPath) {
+  public SlotRef(List<String> rawPath) {
     super();
     rawPath_ = rawPath;
     label_ = ToSqlUtils.getPathSql(rawPath_);

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java b/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
index 8e73178..8aafc67 100644
--- a/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
@@ -162,7 +162,7 @@ public class UnionStmt extends QueryStmt {
   /////////////////////////////////////////
 
   public UnionStmt(List<UnionOperand> operands,
-      ArrayList<OrderByElement> orderByElements, LimitElement limitElement) {
+      List<OrderByElement> orderByElements, LimitElement limitElement) {
     super(orderByElements, limitElement);
     Preconditions.checkNotNull(operands);
     Preconditions.checkState(operands.size() > 0);

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java b/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
index 8bce93a..06a1c67 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
@@ -17,7 +17,6 @@
 
 package org.apache.impala.analysis;
 
-import java.util.ArrayList;
 import java.util.List;
 
 import com.google.common.base.Preconditions;
@@ -34,7 +33,7 @@ import static org.apache.impala.analysis.ToSqlOptions.DEFAULT;
 public class ValuesStmt extends UnionStmt {
 
   public ValuesStmt(List<UnionOperand> operands,
-      ArrayList<OrderByElement> orderByElements, LimitElement limitElement) {
+      List<OrderByElement> orderByElements, LimitElement limitElement) {
     super(operands, orderByElements, limitElement);
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/analysis/WithClause.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/WithClause.java b/fe/src/main/java/org/apache/impala/analysis/WithClause.java
index f0548c4..2e4d590 100644
--- a/fe/src/main/java/org/apache/impala/analysis/WithClause.java
+++ b/fe/src/main/java/org/apache/impala/analysis/WithClause.java
@@ -17,7 +17,6 @@
 
 package org.apache.impala.analysis;
 
-import java.util.ArrayList;
 import java.util.List;
 
 import org.apache.impala.authorization.PrivilegeRequest;
@@ -53,12 +52,12 @@ public class WithClause implements ParseNode {
   /////////////////////////////////////////
   // BEGIN: Members that need to be reset()
 
-  private final ArrayList<View> views_;
+  private final List<View> views_;
 
   // END: Members that need to be reset()
   /////////////////////////////////////////
 
-  public WithClause(ArrayList<View> views) {
+  public WithClause(List<View> views) {
     Preconditions.checkNotNull(views);
     Preconditions.checkState(!views.isEmpty());
     views_ = views;

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/catalog/StructType.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/StructType.java b/fe/src/main/java/org/apache/impala/catalog/StructType.java
index 54507ec..77d4648 100644
--- a/fe/src/main/java/org/apache/impala/catalog/StructType.java
+++ b/fe/src/main/java/org/apache/impala/catalog/StructType.java
@@ -19,6 +19,7 @@ package org.apache.impala.catalog;
 
 import java.util.ArrayList;
 import java.util.HashMap;
+import java.util.List;
 
 import org.apache.commons.lang3.StringUtils;
 
@@ -36,9 +37,9 @@ import com.google.common.collect.Maps;
  */
 public class StructType extends Type {
   private final HashMap<String, StructField> fieldMap_ = Maps.newHashMap();
-  private final ArrayList<StructField> fields_;
+  private final List<StructField> fields_;
 
-  public StructType(ArrayList<StructField> fields) {
+  public StructType(List<StructField> fields) {
     Preconditions.checkNotNull(fields);
     fields_ = fields;
     for (int i = 0; i < fields_.size(); ++i) {
@@ -74,7 +75,7 @@ public class StructType extends Type {
     fieldMap_.put(field.getName().toLowerCase(), field);
   }
 
-  public ArrayList<StructField> getFields() { return fields_; }
+  public List<StructField> getFields() { return fields_; }
 
   public StructField getField(String fieldName) {
     return fieldMap_.get(fieldName.toLowerCase());

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/common/TreeNode.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/common/TreeNode.java b/fe/src/main/java/org/apache/impala/common/TreeNode.java
index b89f133..c5e7a87 100644
--- a/fe/src/main/java/org/apache/impala/common/TreeNode.java
+++ b/fe/src/main/java/org/apache/impala/common/TreeNode.java
@@ -28,7 +28,7 @@ import com.google.common.base.Predicate;
  * Generic tree structure. Only concrete subclasses of this can be instantiated.
  */
 public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
-  protected ArrayList<NodeType> children_ = new ArrayList<NodeType>();
+  protected List<NodeType> children_ = new ArrayList<>();
 
   public NodeType getChild(int i) {
     return hasChild(i) ? children_.get(i) : null;
@@ -48,20 +48,25 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
 
   public boolean hasChild(int i) { return children_.size() > i; }
   public void setChild(int index, NodeType n) { children_.set(index, n); }
-  public ArrayList<NodeType> getChildren() { return children_; }
+  public List<NodeType> getChildren() { return children_; }
 
   /**
    * Return list of all nodes of the tree rooted at 'this', obtained
    * through pre-order traversal.
+   *
+   * Warning: this method is type-unsafe: it returns a list of nodes
+   * of the requested type, but does not verify that the actual
+   * nodes are indeed of that type.
    */
-  public <C extends TreeNode<NodeType>> ArrayList<C> getNodesPreOrder() {
-    ArrayList<C> result = new ArrayList<C>();
+  public <C extends TreeNode<NodeType>> List<C> getNodesPreOrder() {
+    List<C> result = new ArrayList<>();
     getNodesPreOrderAux(result);
     return result;
   }
 
+  @SuppressWarnings("unchecked")
   protected <C extends TreeNode<NodeType>> void getNodesPreOrderAux(
-      ArrayList<C> result) {
+      List<C> result) {
     result.add((C) this);
     for (NodeType child: children_) child.getNodesPreOrderAux(result);
   }
@@ -69,15 +74,20 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
   /**
    * Return list of all nodes of the tree rooted at 'this', obtained
    * through post-order traversal.
+   *
+   * Warning: this method is type-unsafe: it returns a list of nodes
+   * of the requested type, but does not verify that the actual
+   * nodes are indeed of that type.
    */
-  public <C extends TreeNode<NodeType>> ArrayList<C> getNodesPostOrder() {
-    ArrayList<C> result = new ArrayList<C>();
+  public <C extends TreeNode<NodeType>> List<C> getNodesPostOrder() {
+    List<C> result = new ArrayList<>();
     getNodesPostOrderAux(result);
     return result;
   }
 
+  @SuppressWarnings("unchecked")
   protected <C extends TreeNode<NodeType>> void getNodesPostOrderAux(
-      ArrayList<C> result) {
+      List<C> result) {
     for (NodeType child: children_) child.getNodesPostOrderAux(result);
     result.add((C) this);
   }
@@ -97,6 +107,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
    * This node is checked first, followed by its children in order. If the node
    * itself matches, the children are skipped.
    */
+  @SuppressWarnings("unchecked")
   public <C extends TreeNode<NodeType>, D extends C> void collect(
       Predicate<? super C> predicate, Collection<D> matches) {
     // TODO: the semantics of this function are very strange. contains()
@@ -116,8 +127,9 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
    * This node is checked first, followed by its children in order. If the node
    * itself is of class 'cl', the children are skipped.
    */
+  @SuppressWarnings("unchecked")
   public <C extends TreeNode<NodeType>, D extends C> void collect(
-      Class cl, Collection<D> matches) {
+      Class<D> cl, Collection<D> matches) {
     if (cl.equals(getClass())) {
       matches.add((D) this);
       return;
@@ -130,6 +142,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
    * This node is checked first, followed by its children in order. All nodes
    * that match in the subtree are added.
    */
+  @SuppressWarnings("unchecked")
   public <C extends TreeNode<NodeType>, D extends C> void collectAll(
       Predicate<? super C> predicate, List<D> matches) {
     if (predicate.apply((C) this)) matches.add((D) this);
@@ -150,13 +163,14 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
    * into 'matches'
    */
   public static <C extends TreeNode<C>, D extends C> void collect(
-      Collection<C> nodeList, Class cl, Collection<D> matches) {
+      Collection<C> nodeList, Class<D> cl, Collection<D> matches) {
     for (C node: nodeList) node.collect(cl, matches);
   }
 
   /**
    * Return true if this node or any of its children satisfy 'predicate'.
    */
+  @SuppressWarnings("unchecked")
   public <C extends TreeNode<NodeType>> boolean contains(
       Predicate<? super C> predicate) {
     if (predicate.apply((C) this)) return true;
@@ -167,7 +181,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
   /**
    * Return true if this node or any of its children is an instance of class 'cl'.
    */
-  public boolean contains(Class cl) {
+  public <C extends TreeNode<NodeType>> boolean contains(Class<C> cl) {
     if (cl.equals(getClass())) return true;
     for (NodeType child: children_) if (child.contains(cl)) return true;
     return false;
@@ -196,7 +210,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
    * Return true if any node in nodeList contains children of class cl.
    */
   public static <C extends TreeNode<C>> boolean contains(
-      List<C> nodeList, Class cl) {
+      List<C> nodeList, Class<? extends C> cl) {
     for (C node: nodeList) if (node.contains(cl)) return true;
     return false;
   }
@@ -204,6 +218,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
   /**
    * Returns the first node/child of class cl (depth-first traversal).
    */
+  @SuppressWarnings("unchecked")
   public <C extends NodeType> C findFirstOf(Class<C> cl) {
     if (this.getClass().equals(cl)) return (C) this;
     for (NodeType child: children_) {
@@ -216,6 +231,7 @@ public abstract class TreeNode<NodeType extends TreeNode<NodeType>> {
   /**
    * Visitor pattern accept method
    */
+  @SuppressWarnings("unchecked")
   public <C extends TreeNode<NodeType>> void accept(Visitor<C> visitor) {
     visitor.visit((C) this);
     for (NodeType p: children_) p.accept(visitor);
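
The new Javadoc warning deserves a concrete illustration: getNodesPreOrder()
promises a List<C> for whatever C the caller names, but the implementation
only ever adds (C) this through an unchecked cast, so a wrong request fails
at the use-site rather than the call-site. A self-contained sketch of the
hazard, using only the JDK (names are illustrative):

    import java.util.ArrayList;
    import java.util.List;

    class UncheckedCastDemo {
      // Mirrors the shape of getNodesPreOrderAux(): the signature promises
      // List<C>, but nothing verifies the node actually is a C.
      static <C> List<C> nodes(Object node) {
        List<C> result = new ArrayList<>();
        @SuppressWarnings("unchecked")
        C c = (C) node;     // unchecked: erased to a no-op at runtime
        result.add(c);
        return result;
      }

      public static void main(String[] args) {
        List<Integer> ints = nodes("not an int");  // compiles and runs
        Integer i = ints.get(0);  // ClassCastException surfaces only here
      }
    }

This is why the commit adds @SuppressWarnings("unchecked") rather than
removing the casts: the API is inherently erasure-unsafe, and the Javadoc
warning shifts the burden to callers.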

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java b/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
index 3685bf4..13f8d44 100644
--- a/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
@@ -292,7 +292,7 @@ public class AnalyticPlanner {
     // we materialize those rhs TupleIsNullPredicates, which are then substituted
     // by a SlotRef into the sort's tuple in ancestor nodes (IMPALA-1519).
     if (inputSmap != null) {
-      List<Expr> tupleIsNullPreds = Lists.newArrayList();
+      List<TupleIsNullPredicate> tupleIsNullPreds = Lists.newArrayList();
       for (Expr rhsExpr: inputSmap.getRhs()) {
         // Ignore substitutions that are irrelevant at this plan node and its ancestors.
         if (!rhsExpr.isBoundByTupleIds(input.getTupleIds())) continue;
@@ -591,7 +591,7 @@ public class AnalyticPlanner {
    * Extract a minimal set of WindowGroups from analyticExprs.
    */
   private List<WindowGroup> collectWindowGroups() {
-    List<Expr> analyticExprs = analyticInfo_.getAnalyticExprs();
+    List<AnalyticExpr> analyticExprs = analyticInfo_.getAnalyticExprs();
     List<WindowGroup> groups = Lists.newArrayList();
     for (int i = 0; i < analyticExprs.size(); ++i) {
       AnalyticExpr analyticExpr = (AnalyticExpr) analyticExprs.get(i);
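
A small follow-on observation: with analyticExprs now typed as
List<AnalyticExpr>, the explicit (AnalyticExpr) cast kept in the loop above
is redundant and could be dropped in a later cleanup. A minimal
illustration with stand-in types (not the Impala classes):

    import java.util.Arrays;
    import java.util.List;

    class RedundantCastDemo {
      static class Expr {}
      static class AnalyticExpr extends Expr {}

      public static void main(String[] args) {
        // Before: a List<Expr> forces a downcast on every element access.
        List<Expr> untyped = Arrays.<Expr>asList(new AnalyticExpr());
        AnalyticExpr a = (AnalyticExpr) untyped.get(0);

        // After: the element type is carried by the list; no cast needed.
        List<AnalyticExpr> typed = Arrays.asList(new AnalyticExpr());
        AnalyticExpr b = typed.get(0);

        System.out.println(a.getClass() == b.getClass());  // true
      }
    }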

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
index 950382c..1f4c2bd 100644
--- a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
@@ -538,7 +538,7 @@ public class HdfsScanNode extends ScanNode {
     if (slotRef.getDesc().isArrayPosRef()) return;
     if (inPred.isNotIn()) return;
 
-    ArrayList<Expr> children = inPred.getChildren();
+    List<Expr> children = inPred.getChildren();
     LiteralExpr min = null;
     LiteralExpr max = null;
     for (int i = 1; i < children.size(); ++i) {

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/planner/Planner.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/Planner.java b/fe/src/main/java/org/apache/impala/planner/Planner.java
index 59d3b8c..eea26e7 100644
--- a/fe/src/main/java/org/apache/impala/planner/Planner.java
+++ b/fe/src/main/java/org/apache/impala/planner/Planner.java
@@ -246,7 +246,7 @@ public class Planner {
   * Uses a default level of EXTENDED, unless overridden by the
    * 'explain_level' query option.
    */
-  public String getExplainString(ArrayList<PlanFragment> fragments,
+  public String getExplainString(List<PlanFragment> fragments,
       TQueryExecRequest request) {
     // use EXTENDED by default for all non-explain statements
     TExplainLevel explainLevel = TExplainLevel.EXTENDED;
@@ -262,7 +262,7 @@ public class Planner {
    * explicit explain level.
    * Includes the estimated resource requirements from the request if set.
    */
-  public String getExplainString(ArrayList<PlanFragment> fragments,
+  public String getExplainString(List<PlanFragment> fragments,
       TQueryExecRequest request, TExplainLevel explainLevel) {
     StringBuilder str = new StringBuilder();
     boolean hasHeader = false;

http://git-wip-us.apache.org/repos/asf/impala/blob/12dc29e5/fe/src/main/java/org/apache/impala/service/Frontend.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/service/Frontend.java b/fe/src/main/java/org/apache/impala/service/Frontend.java
index 8aeafb1..f6aa123 100644
--- a/fe/src/main/java/org/apache/impala/service/Frontend.java
+++ b/fe/src/main/java/org/apache/impala/service/Frontend.java
@@ -1049,7 +1049,7 @@ public class Frontend {
   private TPlanExecInfo createPlanExecInfo(PlanFragment planRoot, Planner planner,
       TQueryCtx queryCtx, TQueryExecRequest queryExecRequest) {
     TPlanExecInfo result = new TPlanExecInfo();
-    ArrayList<PlanFragment> fragments = planRoot.getNodesPreOrder();
+    List<PlanFragment> fragments = planRoot.getNodesPreOrder();
 
     // collect ScanNodes
     List<ScanNode> scanNodes = Lists.newArrayList();
@@ -1147,7 +1147,7 @@ public class Frontend {
 
     // create EXPLAIN output after setting everything else
     result.setQuery_ctx(queryCtx);  // needed by getExplainString()
-    ArrayList<PlanFragment> allFragments = planRoots.get(0).getNodesPreOrder();
+    List<PlanFragment> allFragments = planRoots.get(0).getNodesPreOrder();
     explainString.append(planner.getExplainString(allFragments, result));
     result.setQuery_plan(explainString.toString());
 


[2/3] impala git commit: [DOCS] A number of typos were fixed in impala_dedicated_coordinator

Posted by ar...@apache.org.
[DOCS] A number of typos were fixed in impala_dedicated_coordinator

Change-Id: I5758a5beabdf46feaf52fa0b3ed14bdce4408754
Reviewed-on: http://gerrit.cloudera.org:8080/11986
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/fcfabe0f
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/fcfabe0f
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/fcfabe0f

Branch: refs/heads/master
Commit: fcfabe0f5c38a4e37d52d14a1010b02ae2973afb
Parents: e421223
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Mon Nov 26 11:49:28 2018 -0800
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Mon Nov 26 20:04:24 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_dedicated_coordinator.xml | 54 ++++++++++++-----------
 1 file changed, 28 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/fcfabe0f/docs/topics/impala_dedicated_coordinator.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_dedicated_coordinator.xml b/docs/topics/impala_dedicated_coordinator.xml
index 1b43772..73aa2cf 100644
--- a/docs/topics/impala_dedicated_coordinator.xml
+++ b/docs/topics/impala_dedicated_coordinator.xml
@@ -165,13 +165,13 @@ under the License.
 
       <li >
         <p>
-          Provides a better concurrency by avoiding coordinator bottleneck.
+          Provides better concurrency by avoiding the coordinator bottleneck.
         </p>
       </li>
 
       <li>
         <p>
-          Eliminates the query over admission by using one dedicated coordinator.
+          Eliminates query over-admission.
         </p>
       </li>
 
@@ -185,7 +185,7 @@ under the License.
       <li >
         <p>
           Improves reliability and performance for highly concurrent workloads by reducing
-          workload stress on coordinators. Dedicated coordinators require 50% or less
+          workload stress on coordinators. Dedicated coordinators require 50% or fewer
           connections and threads.
         </p>
       </li>
@@ -228,7 +228,7 @@ under the License.
       <p>
         To maintain a healthy state and optimal performance, it is recommended that you keep the
         peak utilization of all resources used by Impala, including CPU, the number of threads,
-        the number of connections, RPCs, under 80%.
+        the number of connections, and RPCs, under 80%.
       </p>
 
       <p >
@@ -337,9 +337,9 @@ under the License.
               On a large cluster with 50+ nodes, the number of network connections from a
               coordinator to executors can grow quickly as query complexity increases. The
               growth is much greater on coordinators than executors. Add a few more coordinators
-              if workload are complex, i.e. (an average number of fragments * number of Impalad)
-              > 500, but with the low memory/CPU usage to share the load. Watch IMPALA-4603 and
-              IMPALA-7213 to track the progress on fixing this issue.
+              if workloads are complex, i.e. (average number of fragments * number of
+              Impalad) > 500, so that the load is shared while per-coordinator
+              memory/CPU usage stays low. Watch IMPALA-4603 and IMPALA-7213 to track
+              the progress on fixing this issue.
             </li>
 
             <li >
@@ -352,7 +352,7 @@ under the License.
             <li>
               The front-end connection requirement is not a factor in determining the number of
               dedicated coordinators. Consider setting up a connection pool at the client side
-              instead of adding coordinators. For a short term solution, you could increase the
+              instead of adding coordinators. For a short-term solution, you could increase the
               value of <codeph>fe_service_threads</codeph> on coordinators to allow more client
               connections.
             </li>
@@ -591,33 +591,35 @@ under the License.
 
         <li >
           <p>
-            <b>(Dedicated) Executors: </b>They should be collocated with DataNodes as usual.
-            The number of hosts with this setting typically increases as the cluster grows
-            larger and handles more table partitions, data files, and concurrent queries.
+            <b>(Dedicated) Executors: </b>They should be collocated with DataNodes as usual. The
+            number of hosts with this setting typically increases as the cluster grows larger
+            and handles more table partitions, data files, and concurrent queries.
           </p>
         </li>
       </ul>
 
-      <p> To configuring dedicated coordinators/executors, you specify one of
-        the following startup flags for the <cmdname>impalad</cmdname> daemon on
-        each host: <ul>
+      <p>
+        To configure dedicated coordinators/executors, you specify one of the following
+        startup flags for the <cmdname>impalad</cmdname> daemon on each host:
+        <ul>
           <li>
             <p>
-              <codeph>is_executor=false</codeph> for each host that does not act
-              as an executor for Impala queries. These hosts act exclusively as
-              query coordinators. This setting typically applies to a relatively
-              small number of hosts, because the most common topology is to have
-              nearly all DataNodes doing work for query execution. </p>
+              <codeph>is_executor=false</codeph> for each host that does not act as an executor
+              for Impala queries. These hosts act exclusively as query coordinators. This
+              setting typically applies to a relatively small number of hosts, because the most
+              common topology is to have nearly all DataNodes doing work for query execution.
+            </p>
           </li>
+
           <li>
             <p>
-              <codeph>is_coordinator=false</codeph> for each host that does not
-              act as a coordinator for Impala queries. These hosts act
-              exclusively as executors. The number of hosts with this setting
-              typically increases as the cluster grows larger and handles more
-              table partitions, data files, and concurrent queries. As the
-              overhead for query coordination increases, it becomes more
-              important to centralize that work on dedicated hosts. </p>
+              <codeph>is_coordinator=false</codeph> for each host that does not act as a
+              coordinator for Impala queries. These hosts act exclusively as executors. The
+              number of hosts with this setting typically increases as the cluster grows larger
+              and handles more table partitions, data files, and concurrent queries. As the
+              overhead for query coordination increases, it becomes more important to centralize
+              that work on dedicated hosts.
+            </p>
           </li>
         </ul>
       </p>
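
The sizing heuristic quoted earlier in this diff, (average number of
fragments * number of Impalad) > 500, lends itself to a worked instance.
The inputs below are illustrative assumptions, not measured values:

    public class CoordinatorSizing {
      public static void main(String[] args) {
        int avgFragmentsPerQuery = 12;  // assumed, for illustration only
        int numImpalads = 50;           // assumed cluster size
        // Docs heuristic: consider adding dedicated coordinators once
        // the product exceeds 500.
        boolean addCoordinators = avgFragmentsPerQuery * numImpalads > 500;
        System.out.println(addCoordinators);  // 12 * 50 = 600 > 500 -> true
      }
    }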