You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by ta...@apache.org on 2019/08/15 20:48:26 UTC

[impala] branch master updated (c3a67b6 -> 60ac7c0)

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from c3a67b6  IMPALA-8849: fix IllegalStateException with VARCHAR
     new fdb728e  IMPALA-8160: [DOCS] Document CAST (...FORMAT..) function
     new 60ac7c0  IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 bin/impala-config.sh                               |  12 +-
 docs/topics/impala_conversion_functions.xml        | 781 +++++++++++++++++++--
 fe/pom.xml                                         |   4 +
 .../org/apache/impala/compat/MetastoreShim.java    |  33 +
 .../org/apache/impala/compat/MetastoreShim.java    |  44 ++
 .../main/java/org/apache/impala/catalog/Table.java |   3 +-
 .../impala/catalog/local/DirectMetaProvider.java   |   3 +-
 .../apache/impala/service/CatalogOpExecutor.java   |   7 +-
 .../org/apache/impala/catalog/CatalogTest.java     |  11 +-
 shaded-deps/pom.xml                                |   6 +
 10 files changed, 823 insertions(+), 81 deletions(-)


[impala] 01/02: IMPALA-8160: [DOCS] Document CAST (...FORMAT..) function

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fdb728e9d84f3400a9adae128c6cbc1c9b4752fe
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Fri Aug 9 16:44:35 2019 -0700

    IMPALA-8160: [DOCS] Document CAST (...FORMAT..) function
    
    - Added the Date and Timestamp patterns supported for the new CAST
      signature.
    
    Change-Id: I6235ffd03ac56e648552058ff02491a55289c092
    Reviewed-on: http://gerrit.cloudera.org:8080/14044
    Reviewed-by: Gabor Kaszab <ga...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/topics/impala_conversion_functions.xml | 781 +++++++++++++++++++++++++---
 1 file changed, 717 insertions(+), 64 deletions(-)

diff --git a/docs/topics/impala_conversion_functions.xml b/docs/topics/impala_conversion_functions.xml
index 120a5fa..2e21414 100644
--- a/docs/topics/impala_conversion_functions.xml
+++ b/docs/topics/impala_conversion_functions.xml
@@ -43,22 +43,15 @@ under the License.
   <conbody>
 
     <p>
-      Conversion functions are usually used in combination with other functions, to explicitly
+      Conversion functions are typically used in combination with other functions to explicitly
       pass the expected data types. Impala has strict rules regarding data types for function
       parameters. For example, Impala does not automatically convert a <codeph>DOUBLE</codeph>
       value to <codeph>FLOAT</codeph>, a <codeph>BIGINT</codeph> value to <codeph>INT</codeph>,
       or other conversion where precision could be lost or overflow could occur. Also, for
-      reporting or dealing with loosely defined schemas in big data contexts, you might
-      frequently need to convert values to or from the <codeph>STRING</codeph> type.
+      reporting or dealing with loosely defined schemas in big data contexts, you might need to
+      convert values to or from the <codeph>STRING</codeph> type.
     </p>
 
-    <note>
-      Although in <keyword keyref="impala23_full"/>, the <codeph>SHOW FUNCTIONS</codeph> output
-      for database <codeph>_IMPALA_BUILTINS</codeph> contains some function signatures matching
-      the pattern <codeph>castto*</codeph>, these functions are not intended for public use and
-      are expected to be hidden in future.
-    </note>
-
     <p>
       <b>Function reference:</b>
     </p>
@@ -81,51 +74,50 @@ under the License.
       <dlentry id="cast">
 
         <dt>
-          CAST(expr AS type)
+          CAST(expression AS type)
         </dt>
 
         <dd>
-          <b>Purpose:</b> Converts the value of an expression to any other type. If the
-          expression value is of a type that cannot be converted to the target type, the result
-          is <codeph>NULL</codeph>.
+          <b>Purpose:</b> Returns <varname>expression</varname> converted to the
+          <varname>type</varname> data type.
           <p>
-            <b>Usage notes:</b> Use <codeph>CAST</codeph> when passing a column value or literal
-            to a function that expects a parameter with a different type. Frequently used in SQL
-            operations such as <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>INSERT ...
-            VALUES</codeph> to ensure that values from various sources are of the appropriate
-            type for the destination columns. Where practical, do a one-time
-            <codeph>CAST()</codeph> operation during the ingestion process to make each column
-            into the appropriate type, rather than using many <codeph>CAST()</codeph> operations
-            in each query; doing type conversions for each row during each query can be
-            expensive for tables with millions or billions of rows.
+            If the <varname>expression</varname> value is of a type that cannot be converted to
+            the target <varname>type</varname>:
+            <ul>
+              <li>
+                Of <codeph>DECIMAL</codeph>, <codeph>DATE</codeph>, and
+                <codeph>BOOLEAN</codeph>, the function returns an error.
+              </li>
+
+              <li>
+                Of all other types, the function returns <codeph>NULL</codeph>.
+              </li>
+            </ul>
           </p>
 
-          <p
-            conref="../shared/impala_common.xml#common/timezone_conversion_caveat"/>
+          <p>
+            <b>Usage notes:</b>
+          </p>
 
-          <p conref="../shared/impala_common.xml#common/example_blurb"/>
-<codeblock>SELECT CONCAT('Here are the first ',10,' results.'); -- Fails
-SELECT CONCAT('Here are the first ',CAST(10 AS STRING),' results.'); -- Succeeds
-</codeblock>
           <p>
-            The following example starts with a text table where every column has a type of
-            <codeph>STRING</codeph>, which might be how you ingest data of unknown schema until
-            you can verify the cleanliness of the underlying values. Then it uses
-            <codeph>CAST()</codeph> to create a new Parquet table with the same data, but using
-            specific numeric data types for the columns with numeric data. Using numeric types
-            of appropriate sizes can result in substantial space savings on disk and in memory,
-            and performance improvements in queries, over using strings or larger-than-necessary
-            numeric types.
+            Use <codeph>CAST</codeph> when passing a column value or literal to a function that
+            expects a parameter with a different type. Frequently used in SQL operations such as
+            <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>INSERT ... VALUES</codeph> to
+            ensure that values from various sources are of the appropriate type for the
+            destination columns.
           </p>
-<codeblock>CREATE TABLE t1 (name STRING, x STRING, y STRING, z STRING);
-
-CREATE TABLE t2 STORED AS PARQUET
-AS SELECT
-  name,
-  CAST(x AS BIGINT) x,
-  CAST(y AS TIMESTAMP) y,
-  CAST(z AS SMALLINT) z
-FROM t1;</codeblock>
+
+          <p>
+            Where practical, do a one-time <codeph>CAST()</codeph> operation during the
+            ingestion process to make each column into the appropriate type, rather than using
+            many <codeph>CAST()</codeph> operations in each query; doing type conversions for
+            each row during each query can be expensive for tables with millions or billions of
+            rows.
+          </p>
+
+          <p
+            conref="../shared/impala_common.xml#common/timezone_conversion_caveat"/>
+
           <p conref="../shared/impala_common.xml#common/related_info"/>
 
           <p>
@@ -142,13 +134,13 @@ FROM t1;</codeblock>
             <xref
               href="impala_double.xml#double"/>,
             <xref
-              href="impala_decimal.xml#decimal"/>,
+              href="impala_decimal.xml#decimal">DECIMAL Data Type</xref>,
             <xref
               href="impala_string.xml#string"/>,
             <xref
-              href="impala_char.xml#char"/>,
+              href="impala_char.xml#char">CHAR Data Type</xref>,
             <xref
-              href="impala_varchar.xml#varchar"/>,
+              href="impala_varchar.xml#varchar">VARCHAR Data Type</xref>,
             <xref
               href="impala_timestamp.xml#timestamp"/>,
             <xref
@@ -158,18 +150,678 @@ FROM t1;</codeblock>
 
       </dlentry>
 
+      <dlentry id="cast_format">
+
+        <dt>
+          CAST(expression AS type FORMAT pattern)
+        </dt>
+
+        <dd>
+          <b>Purpose:</b> Returns <varname>expression</varname> converted to the
+          <varname>type</varname> data type based on the <varname>pattern</varname> format
+          string. This signature of <codeph>CAST()</codeph> with the <codeph>FORMAT</codeph>
+          clause is supported only for casts between <codeph>STRING</codeph> /
+          <codeph>CHAR</codeph> / <codeph>VARCHAR</codeph> types and <codeph>TIMESTAMP</codeph>
+          / <codeph>DATE</codeph> types.
+          <note>
+            The patterns allowed in the <codeph>FORMAT</codeph> clause support ISO SQL:2016
+            standard patterns. Those patterns are not the same format patterns used with the
+            other Impala conversion functions, e.g. <codeph>TO_TIMESTAMP()</codeph> and
+            <codeph>FROM_TIMESTAMP()</codeph>.
+          </note>
+
+          <p>
+            The following rules apply to <varname>pattern</varname>. Any exceptions to these
+            rules are noted in the Details column of the table below.
+            <ul>
+              <li>
+                <varname>pattern</varname> is a case-insensitive <codeph>STRING</codeph>.
+              </li>
+
+              <li>
+                If <varname>pattern</varname> is <codeph>NULL</codeph>, an empty string, or a
+                number, an error returns.
+              </li>
+
+              <li>
+                A fewer digits in <varname>expression</varname> than specified in the
+                <varname>pattern</varname> is accepted if a separator is correctly specified in
+                the <varname>pattern</varname>. For example, <codeph>CAST('5-01-2017' AS DATE
+                FORMAT 'MM-dd-yyyy')</codeph> returns <codeph>DATE'2017-05-01'</codeph>.
+              </li>
+
+              <li>
+                If fewer number of digits are in <varname>expression</varname> than specified in
+                the <varname>pattern</varname>, the current date is used to complete the year
+                pattern. For example, <codeph>CAST('19/05' AS DATE FORMAT 'YYYY/MM')</codeph>
+                will return <codeph>DATE'2019-05-01'</codeph> when executed on August 8, 2019.
+              </li>
+            </ul>
+          </p>
+
+          <p>
+            The following format patterns are supported in the <codeph>FORMAT</codeph> clause.
+            <table frame="all" rowsep="1"
+              colsep="1" id="table_gbt_5ym_r3b">
+              <tgroup cols="3" align="left">
+                <colspec colname="c1" colnum="1" colwidth="1*"/>
+                <colspec colname="c2" colnum="2" colwidth="1.75*"/>
+                <colspec colname="newCol3" colnum="3" colwidth="6.6*"/>
+                <thead>
+                  <row>
+                    <entry>
+                      Pattern
+                    </entry>
+                    <entry>
+                      Description
+                    </entry>
+                    <entry>
+                      Details
+                    </entry>
+                  </row>
+                </thead>
+                <tbody>
+                  <row>
+                    <entry>
+                      <codeph>YYYY</codeph>
+                    </entry>
+                    <entry>
+                      4-digit year.
+                    </entry>
+                    <entry/>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>YYY</codeph>
+                    </entry>
+                    <entry>
+                      Last 3 digits of a year.
+                    </entry>
+                    <entry/>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>YY</codeph>
+                    </entry>
+                    <entry>
+                      Last 2 digits of a year.
+                    </entry>
+                    <entry/>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>Y</codeph>
+                    </entry>
+                    <entry>
+                      Last digit of a year
+                    </entry>
+                    <entry/>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>RRRR</codeph>
+                    </entry>
+                    <entry>
+                      4-digit round year
+                    </entry>
+                    <entry>
+                      <p>
+                        If 1,3 or 4-digit year values are provided in
+                        <varname>expression</varname>, treated as <codeph>YYYY</codeph>.
+                      </p>
+
+
+
+                      <p>
+                        If 2-digit years are provided in <varname>expression</varname>, treated
+                        as <codeph>RR</codeph>.
+                      </p>
+
+
+
+                      <p>
+                        For datetime to string conversions, treated as <codeph>YYYY</codeph>.
+                      </p>
+
+
+
+                      <p>
+                        If <codeph>YYYY</codeph>, <codeph>YYY</codeph>, <codeph>YY</codeph>,
+                        <codeph>Y</codeph>, or <codeph>RR</codeph> is given in the same
+                        <varname>pattern</varname> for a string to datetime conversion, an error
+                        returns.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>RR</codeph>
+                    </entry>
+                    <entry>
+                      2-digit round year.
+                    </entry>
+                    <entry>
+                      <p>
+                        <ul>
+                          <li>
+                            For datetime to string conversion, same as <codeph>YY</codeph>.
+                          </li>
+
+                          <li>
+                            For string to datetime conversions, the first 2 digits of the year
+                            in the return value depends on the specified two-digit year and the
+                            last two digits of the current year as follows:
+                            <ul>
+                              <li>
+                                <p>
+                                  If the specified 2-digit year is 00 to 49:
+                                </p>
+                                <ul>
+                                  <li>
+                                    <p>
+                                      If the last 2 digits of the current year are 00 to 49, the
+                                      returned year has the same first 2 digits as the current
+                                      year.
+                                    </p>
+                                  </li>
+
+                                  <li>
+                                    <p>
+                                      If the last 2 digits of the current year are 50 to 99, the
+                                      first 2 digits of the returned year are 1 greater than the
+                                      first 2 digits of the current year.
+                                    </p>
+                                  </li>
+                                </ul>
+                              </li>
+
+                              <li>
+                                <p>
+                                  If the specified 2-digit year is 50 to 99:
+                                </p>
+                                <ul>
+                                  <li>
+                                    <p>
+                                      If the last 2 digits of the current year are 00 to 49, the
+                                      first 2 digits of the returned year are 1 less than the
+                                      first 2 digits of the current year.
+                                    </p>
+                                  </li>
+
+                                  <li>
+                                    <p>
+                                      If the last 2 digits of the current year are 50 to 99, the
+                                      returned year has the same first 2 digits as the current
+                                      year.
+                                    </p>
+                                  </li>
+                                </ul>
+                              </li>
+                            </ul>
+                          </li>
+                        </ul>
+                      </p>
+
+
+
+                      <p>
+                        If <codeph>YYYY</codeph>, <codeph>YYY</codeph>, <codeph>YY</codeph>,
+                        <codeph>Y</codeph>, or <codeph>RR</codeph> is given in the same
+                        <varname>pattern</varname> for a string to datetime conversion, an error
+                        returns.
+                      </p>
+
+
+
+                      <p>
+                        If 1-digit year values are provided in <varname>expression</varname>, it
+                        is treated as <codeph>YYYY</codeph>.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>MM</codeph>
+                    </entry>
+                    <entry>
+                      Month
+                    </entry>
+                    <entry>
+                      <p>
+                        In datetime to string conversions, 1-digit month is prefixed with a
+                        zero.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>DD</codeph>
+                    </entry>
+                    <entry>
+                      Day of month (1-31)
+                    </entry>
+                    <entry>
+                      <p>
+                        In datetime to string conversions, one digit day is prefixed with a
+                        zero.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>DDD</codeph>
+                    </entry>
+                    <entry>
+                      Day of year (1-366)
+                    </entry>
+                    <entry>
+                      <p>
+                        In string to datetime conversions, providing <codeph>MM</codeph> and
+                        <codeph>DD</codeph> along with <codeph>DDD</codeph> results an error,
+                        e.g. <codeph>YYYY-MM-DDD</codeph>.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>HH</codeph>
+
+                      <p>
+                        <codeph>HH12</codeph>
+                      </p>
+                    </entry>
+                    <entry>
+                      Hour of day (1-12)
+                    </entry>
+                    <entry>
+                      <p>
+                        In datetime to string conversions, 1-digit hours are prefixed with a
+                        zero.
+                      </p>
+
+
+
+                      <p>
+                        If provided hour in <varname>expression</varname> is not between 1 and
+                        12, returns an error.
+                      </p>
+
+
+
+                      <p>
+                        If no AM/PM is provided in <varname>expression</varname>, the default is
+                        <codeph>AM</codeph>.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>HH24</codeph>
+                    </entry>
+                    <entry>
+                      Hour of day (0-23)
+                    </entry>
+                    <entry>
+                      <p>
+                        In string to datetime conversions, if <codeph>HH12</codeph>,
+                        <codeph>AM</codeph>, <codeph>PM</codeph> are given in the same
+                        <varname>pattern</varname>, an error returns.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>MI</codeph>
+                    </entry>
+                    <entry>
+                      Minute of hour (0-59)
+                    </entry>
+                    <entry>
+                      <p>
+                        In datetime to string conversions, 1-digit minutes are prefixed with a
+                        zero.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>SS</codeph>
+                    </entry>
+                    <entry>
+                      Second of minute (0-59)
+                    </entry>
+                    <entry>
+                      <p>
+                        In datetime to string conversions, 1-digit seconds are prefixed with a
+                        zero.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>SSSSS</codeph>
+                    </entry>
+                    <entry>
+                      Second of Day (0-86399)
+                    </entry>
+                    <entry>
+                      In string to timestamp conversions, if <codeph>SS</codeph>,
+                      <codeph>HH</codeph>, <codeph>HH12</codeph>, <codeph>HH24</codeph>,
+                      <codeph>MI</codeph>, <codeph>AM</codeph>/<codeph>PM</codeph> are given in
+                      the same <varname>pattern</varname>, an error returns.
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>FF</codeph>
+
+                      <p>
+                        <codeph>FF1</codeph>, ..., <codeph>FF9</codeph>
+                      </p>
+                    </entry>
+                    <entry>
+                      Fractional second
+                    </entry>
+                    <entry>
+                      <p>
+                        A number, 1 to 9, can be used to indicate the number of digits.
+                      </p>
+
+
+
+                      <p>
+                        <codeph>FF</codeph> specifies a 9 digits fractional second.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>AM</codeph>
+
+                      <p>
+                        <codeph>PM</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>A.M.</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>P.M.</codeph>
+                      </p>
+                    </entry>
+                    <entry>
+                      Meridiem indicator
+                    </entry>
+                    <entry>
+                      <p>
+                        For datetime to string conversions, <codeph>AM</codeph> and
+                        <codeph>PM</codeph> are treated as synonyms. For example, casting
+                        <codeph>'2019-01-01 11:00 am'</codeph> to <codeph>TIMESTAMP</codeph>
+                        with the <codeph>'YYYY-MM-DD HH12:MI PM'</codeph> pattern returns
+                        <codeph>01-JAN-19 11.00.00.000000 AM</codeph>.
+                      </p>
+
+
+
+                      <p>
+                        For string to datetime conversion, <codeph>HH24</codeph> in the same
+                        <varname>pattern</varname> returns an error.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>TZH</codeph>
+                    </entry>
+                    <entry>
+                      Timezone offset hour
+                    </entry>
+                    <entry>
+                      <p>
+                        An optional sign, + or -, and 2 digits for the value of signed numbers
+                        are allowed for the source <varname>expression</varname>, e.g.
+                        <codeph>“+10”</codeph>, <codeph>“-05”</codeph>,
+                        <codeph>"04"</codeph>.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>TZM</codeph>
+                    </entry>
+                    <entry>
+                      Timezone offset minute
+                    </entry>
+                    <entry>
+                      <p>
+                        Unsigned numbers are allowed for the source
+                        <varname>expression</varname>.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>-</codeph>
+
+                      <p>
+                        <codeph>.</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>/</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>,</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>'</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>;</codeph>
+                      </p>
+
+
+
+                      <p>
+                        <codeph>:</codeph>
+                      </p>
+
+
+
+                      <p>
+                        &lt;space>
+                      </p>
+                    </entry>
+                    <entry>
+                      Separator
+                    </entry>
+                    <entry>
+                      <p>
+                        For string to datetime conversions, any separator character in the
+                        <varname>pattern</varname> string would match any separator character in
+                        the input <varname>expression</varname>.
+                      </p>
+
+
+
+                      <p>
+                        For example, <codeph>CAST(“20191010” AS DATE FORMAT
+                        “YYYY-MM-DD”)</codeph> returns an error, but <codeph>CAST("2019-.;10
+                        10" AS DATE FORMAT "YYYY-MM-DD")</codeph> succeeds.
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>T</codeph>
+                    </entry>
+                    <entry>
+                      Separates the date from the time.
+                    </entry>
+                    <entry>
+                      This pattern is used for accepting ISO 8601 datetime formats.
+
+                      <p>
+                        Example: <codeph>YYYY-MM-DD<b>T</b>HH24:MI:SS.FF9Z</codeph>
+                      </p>
+                    </entry>
+                  </row>
+                  <row>
+                    <entry>
+                      <codeph>Z</codeph>
+                    </entry>
+                    <entry>
+                      Indicates the zero hour offset from UTC.
+                    </entry>
+                    <entry>
+                      This pattern is used for accepting ISO 8601 datetime formats.
+                    </entry>
+                  </row>
+                </tbody>
+              </tgroup>
+            </table>
+          </p>
+
+          <p>
+            <b>Examples:</b>
+          </p>
+          <table frame="all" rowsep="1" colsep="1" id="table_qqg_txn_r3b">
+            <tgroup cols="2" align="left">
+              <colspec colname="c1" colnum="1" colwidth="1*"/>
+              <colspec colname="c2" colnum="2" colwidth="1*"/>
+              <thead>
+                <row>
+                  <entry>
+                    Input
+                  </entry>
+                  <entry>
+                    Output
+                  </entry>
+                </row>
+              </thead>
+              <tbody>
+                <row>
+                  <entry>
+                    <codeph>CAST("02-11-2014" AS DATE FORMAT 'dd-mm-yyyy')</codeph>
+                  </entry>
+                  <entry>
+                    2014-11-02
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST("365 2014" AS DATE FORMAT 'DDD-YYYY')</codeph>
+                  </entry>
+                  <entry>
+                    2014-12-31
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST("5-01-26" AS DATE FORMAT 'YY-MM-DD')</codeph>
+                  </entry>
+                  <entry>
+                    Executed at 2019-01-01 11:11:11:
+
+                    <p>
+                      2015-01-26
+                    </p>
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST('2018-11-10T15:11:04Z' AS TIMESTAMP</codeph>
+
+                    <p>
+                      <codeph>FORMAT 'YYYY-MM-DDTHH24:MI:SSZ')</codeph>
+                    </p>
+                  </entry>
+                  <entry>
+                    2018-11-10 15:11:04
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST("95-01-28" AS DATE FORMAT 'YYY-MM-DD')</codeph>
+                  </entry>
+                  <entry>
+                    Executed at 2019-01-01 11:11:11:
+
+                    <p>
+                      2095-01-28
+                    </p>
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST("49-01-15" AS DATE FORMAT 'RR-MM-DD')</codeph>
+                  </entry>
+                  <entry>
+                    Round year when last 2 digits of current year is greater than 49.
+
+                    <p>
+                      Executed at 2099-01-01 11:11:11:
+                    </p>
+
+
+
+                    <p>
+                      2149-01-15
+                    </p>
+                  </entry>
+                </row>
+                <row>
+                  <entry>
+                    <codeph>CAST('2019.10.10 13:30:40.123456 +01:30' </codeph>
+
+                    <p>
+                      <codeph>AS TIMESTAMP</codeph>
+                    </p>
+
+
+
+                    <p>
+                      <codeph>FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM')</codeph>
+                    </p>
+                  </entry>
+                  <entry>
+                    2019-10-10 13:30:40.123456000
+                  </entry>
+                </row>
+              </tbody>
+            </tgroup>
+          </table>
+        </dd>
+
+      </dlentry>
+
       <dlentry rev="2.3.0" id="typeof">
 
         <dt>
-          TYPEOF(type value)
+          TYPEOF(expression)
         </dt>
 
         <dd>
-          <b>Purpose:</b> Returns the name of the data type corresponding to an expression. For
-          types with extra attributes, such as length for <codeph>CHAR</codeph> and
-          <codeph>VARCHAR</codeph>, or precision and scale for <codeph>DECIMAL</codeph>,
-          includes the full specification of the type.
-<!-- To do: How about for columns of complex types? Or fields within complex types? -->
+          <b>Purpose:</b> Returns the name of the data type corresponding to
+          <varname>expression</varname>. For types with extra attributes, such as length for
+          <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph>, or precision and scale for
+          <codeph>DECIMAL</codeph>, includes the full specification of the type.
           <p>
             <b>Return type:</b> <codeph>STRING</codeph>
           </p>
@@ -184,16 +836,17 @@ FROM t1;</codeblock>
             result is can be different than that of the operands.
           </p>
 
-          <p conref="../shared/impala_common.xml#common/added_in_230"/>
+          <p
+            conref="../shared/impala_common.xml#common/example_blurb"
+              />
 
-          <p conref="../shared/impala_common.xml#common/example_blurb"/>
-<codeblock>SELECT TYPEOF(2), TYPEOF(2+2);
-+-----------+---------------+
-| typeof(2) | typeof(2 + 2) |
-+-----------+---------------+
-| TINYINT   | SMALLINT      |
-+-----------+---------------+
-</codeblock>
+          <p>
+            <codeph>TYPEOF(2)</codeph> returns <codeph>TINYINT</codeph>.
+          </p>
+
+          <p>
+            <codeph>TYPEOF(NOW())</codeph> returns <codeph>TIMESTAMP</codeph>.
+          </p>
         </dd>
 
       </dlentry>


[impala] 02/02: IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 60ac7c07511015de130dbd01885421459e96bc3e
Author: Attila Jeges <at...@cloudera.com>
AuthorDate: Tue Aug 6 20:55:13 2019 +0200

    IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API
    
    HIVE-22046 added 'engine' column to TAB_COL_STATS and PART_COL_STATS
    HMS tables. The new column is used to differentiate among column stats
    computed by different engines. The related HMS API calls were changed
    accordingly.
    
    This change is Step 2 in a series of steps to coordinate the
    introduction of HMS API changes to Hive3 and Impala. Step 4 is also
    Impala related, it will be covered in IMPALA-8842 part 2.
    
    The steps are as follows:
    
    1. Change in Hive3.
    We push new APIs so Impala can use them. New APIs will simply call old
    existing methods so there should not be any change of functionality
    there. Since there were many incompatible changes, new APIs are tagged
    method_name_V2.
    
    2. Change in Impala.
    Push changes to use new methods *V2.
    
    3. Change in Hive3.
    Push patch with complete functionality. *V2 methods contains the new
    logic. The old existing methods are not used anymore by Impala at this
    point, hence they can be removed. For every method_name_V2, I will
    create a corresponding method method_name that calls the former one.
    
    4. Change in Impala
    Replace *V2 calls by *.
    
    5. Change in Hive3.
    Remove *V2 methods from API.
    
    This change also adds exclusions to fe/pom.xml and shaded-deps/pom.xml
    to work around the Hive3 Hadoop2 dependency.
    
    TESTING:
    Only Step 1 has been merged in into Hive3 at this point, which means
    that there's no change in Impala and Hive3 functionality yet.
    
    Change-Id: I9a73f5eeac8e84d63b22aaed5dfbcd8ea39f0af4
    Reviewed-on: http://gerrit.cloudera.org:8080/14043
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 bin/impala-config.sh                               | 12 +++---
 fe/pom.xml                                         |  4 ++
 .../org/apache/impala/compat/MetastoreShim.java    | 33 ++++++++++++++++
 .../org/apache/impala/compat/MetastoreShim.java    | 44 ++++++++++++++++++++++
 .../main/java/org/apache/impala/catalog/Table.java |  3 +-
 .../impala/catalog/local/DirectMetaProvider.java   |  3 +-
 .../apache/impala/service/CatalogOpExecutor.java   |  7 ++--
 .../org/apache/impala/catalog/CatalogTest.java     | 11 +++---
 shaded-deps/pom.xml                                |  6 +++
 9 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index d795748..356eceb 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -168,19 +168,19 @@ fi
 export IMPALA_TOOLCHAIN_HOST
 export CDH_MAJOR_VERSION=6
 export CDH_BUILD_NUMBER=1173663
-export CDP_BUILD_NUMBER=1318335
+export CDP_BUILD_NUMBER=1352353
 export CDH_HADOOP_VERSION=3.0.0-cdh6.x-SNAPSHOT
-export CDP_HADOOP_VERSION=3.1.1.7.0.0.0-365
+export CDP_HADOOP_VERSION=3.1.1.7.1.0.0-33
 export IMPALA_HBASE_VERSION=2.1.0-cdh6.x-SNAPSHOT
 export IMPALA_SENTRY_VERSION=2.1.0-cdh6.x-SNAPSHOT
-export IMPALA_RANGER_VERSION=1.2.0.7.0.0.0-365
+export IMPALA_RANGER_VERSION=1.2.0.7.1.0.0-33
 export IMPALA_PARQUET_VERSION=1.9.0-cdh6.x-SNAPSHOT
 export IMPALA_AVRO_JAVA_VERSION=1.8.2-cdh6.x-SNAPSHOT
 export IMPALA_LLAMA_MINIKDC_VERSION=1.0.0
 export IMPALA_KITE_VERSION=1.0.0-cdh6.x-SNAPSHOT
 export IMPALA_KUDU_JAVA_VERSION=1.10.0-cdh6.x-SNAPSHOT
 export CDH_HIVE_VERSION=2.1.1-cdh6.x-SNAPSHOT
-export CDP_HIVE_VERSION=3.1.0.7.0.0.0-365
+export CDP_HIVE_VERSION=3.1.0.7.1.0.0-33
 
 # When IMPALA_(CDH_COMPONENT)_URL are overridden, they may contain '$(platform_label)'
 # which will be substituted for the CDH platform label in bootstrap_toolchain.py
@@ -207,8 +207,8 @@ if $USE_CDP_HIVE; then
   # When USE_CDP_HIVE is set we use the CDP hive version to build as well as deploy in
   # the minicluster
   export IMPALA_HIVE_VERSION=${CDP_HIVE_VERSION}
-  export IMPALA_TEZ_VERSION=0.9.1.7.0.0.0-365
-  export IMPALA_KNOX_VERSION=1.0.0.7.0.0.0-365
+  export IMPALA_TEZ_VERSION=0.9.1.7.1.0.0-33
+  export IMPALA_KNOX_VERSION=1.0.0.7.1.0.0-33
   export IMPALA_HADOOP_VERSION=${CDP_HADOOP_VERSION}
   export HADOOP_HOME="$CDP_COMPONENTS_HOME/hadoop-${CDP_HADOOP_VERSION}/"
 else
diff --git a/fe/pom.xml b/fe/pom.xml
index 5162b7a..945547d 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -999,6 +999,10 @@ under the License.
               <groupId>org.apache.hive</groupId>
               <artifactId>hive-metastore</artifactId>
             </exclusion>
+            <exclusion>
+              <groupId>org.apache.hive.shims</groupId>
+              <artifactId>hive-shims-0.20</artifactId>
+            </exclusion>
           </exclusions>
         </dependency>
         <dependency>
diff --git a/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java b/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
index 0c02cc1..b556774 100644
--- a/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
+++ b/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
@@ -35,9 +35,14 @@ import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.MetaStoreUtils;
 import org.apache.hadoop.hive.metastore.TableType;
 import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.InvalidInputException;
+import org.apache.hadoop.hive.metastore.api.InvalidObjectException;
 import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
 import org.apache.hadoop.hive.metastore.api.LockComponent;
 import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
 import org.apache.hadoop.hive.metastore.api.Partition;
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.hadoop.hive.metastore.api.FieldSchema;
@@ -112,6 +117,34 @@ public class MetastoreShim {
   }
 
   /**
+   * Wrapper around IMetaStoreClient.getTableColumnStatistics() to deal with added
+   * arguments.
+   */
+  public static List<ColumnStatisticsObj> getTableColumnStatistics(
+      IMetaStoreClient client, String dbName, String tableName, List<String> colNames)
+      throws NoSuchObjectException, MetaException, TException {
+    return client.getTableColumnStatistics(dbName, tableName, colNames);
+  }
+
+  /**
+   * Wrapper around IMetaStoreClient.deleteTableColumnStatistics() to deal with added
+   * arguments.
+   */
+  public static boolean deleteTableColumnStatistics(IMetaStoreClient client,
+      String dbName, String tableName, String colName)
+      throws NoSuchObjectException, MetaException, InvalidObjectException, TException,
+             InvalidInputException {
+    return client.deleteTableColumnStatistics(dbName, tableName, colName);
+  }
+
+  /**
+   * Wrapper around ColumnStatistics c'tor to deal with the added engine property.
+   */
+  public static ColumnStatistics createNewHiveColStats() {
+    return new ColumnStatistics();
+  }
+
+  /**
    * Wrapper around MetaStoreUtils.updatePartitionStatsFast() to deal with added
    * arguments.
    */
diff --git a/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java b/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
index 7abdf22..44046b6 100644
--- a/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
+++ b/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
@@ -44,7 +44,11 @@ import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.LockRequestBuilder;
 import org.apache.hadoop.hive.metastore.TableType;
 import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
 import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidInputException;
+import org.apache.hadoop.hive.metastore.api.InvalidObjectException;
 import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
 import org.apache.hadoop.hive.metastore.api.LockComponent;
 import org.apache.hadoop.hive.metastore.api.LockRequest;
@@ -52,6 +56,7 @@ import org.apache.hadoop.hive.metastore.api.LockResponse;
 import org.apache.hadoop.hive.metastore.api.LockState;
 import org.apache.hadoop.hive.metastore.api.MetaException;
 import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
 import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
 import org.apache.hadoop.hive.metastore.api.Partition;
 import org.apache.hadoop.hive.metastore.api.Table;
@@ -116,6 +121,12 @@ public class MetastoreShim {
   private static final int LOCK_RETRY_WAIT_SECONDS = 3;
 
   /**
+   * Constant variable that stores engine value needed to store / access
+   * Impala column statistics.
+   */
+  protected static final String IMPALA_ENGINE = "impala";
+
+  /**
    * Wrapper around MetaStoreUtils.validateName() to deal with added arguments.
    */
   public static boolean validateName(String name) {
@@ -170,6 +181,39 @@ public class MetastoreShim {
     client.alter_partitions(dbName, tblName, partitions, null,
          validWriteIds, tblWriteId);
   }
+
+  /**
+   * Wrapper around IMetaStoreClient.getTableColumnStatistics() to deal with added
+   * arguments.
+   */
+  public static List<ColumnStatisticsObj> getTableColumnStatistics(
+      IMetaStoreClient client, String dbName, String tableName, List<String> colNames)
+      throws NoSuchObjectException, MetaException, TException {
+    return client.getTableColumnStatisticsV2(dbName, tableName, colNames,
+        /*engine*/IMPALA_ENGINE);
+  }
+
+  /**
+   * Wrapper around IMetaStoreClient.deleteTableColumnStatistics() to deal with added
+   * arguments.
+   */
+  public static boolean deleteTableColumnStatistics(IMetaStoreClient client,
+      String dbName, String tableName, String colName)
+      throws NoSuchObjectException, MetaException, InvalidObjectException, TException,
+             InvalidInputException {
+    return client.deleteTableColumnStatisticsV2(dbName, tableName, colName,
+        /*engine*/IMPALA_ENGINE);
+  }
+
+  /**
+   * Wrapper around ColumnStatistics c'tor to deal with the added engine property.
+   */
+  public static ColumnStatistics createNewHiveColStats() {
+    ColumnStatistics colStats = new ColumnStatistics();
+    colStats.setEngine(IMPALA_ENGINE);
+    return colStats;
+  }
+
   /**
    * Wrapper around MetaStoreUtils.updatePartitionStatsFast() to deal with added
    * arguments.
diff --git a/fe/src/main/java/org/apache/impala/catalog/Table.java b/fe/src/main/java/org/apache/impala/catalog/Table.java
index b0264c2..24a6b95 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Table.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Table.java
@@ -272,7 +272,8 @@ public abstract class Table extends CatalogObjectImpl implements FeTable {
     List<String> colNames = getColumnNamesWithHmsStats();
 
     try {
-      colStats = client.getTableColumnStatistics(db_.getName(), name_, colNames);
+      colStats = MetastoreShim.getTableColumnStatistics(client, db_.getName(), name_,
+          colNames);
     } catch (Exception e) {
       LOG.warn("Could not load column statistics for: " + getFullName(), e);
       return;
diff --git a/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java b/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
index 83a5ea8..c732d8b 100644
--- a/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
+++ b/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
@@ -278,7 +278,8 @@ class DirectMetaProvider implements MetaProvider {
       List<String> colNames) throws TException {
     Preconditions.checkArgument(table instanceof TableMetaRefImpl);
     try (MetaStoreClient c = msClientPool_.getClient()) {
-      return c.getHiveClient().getTableColumnStatistics(
+      return MetastoreShim.getTableColumnStatistics(
+          c.getHiveClient(),
           ((TableMetaRefImpl)table).dbName_,
           ((TableMetaRefImpl)table).tableName_,
           colNames);
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index cfc5bca..5a19f0f 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -941,10 +941,9 @@ public class CatalogOpExecutor {
     }
 
     // Update column stats.
-    ColumnStatistics colStats = null;
     numUpdatedColumns.setRef(0L);
     if (params.isSetColumn_stats()) {
-      colStats = createHiveColStats(params, table);
+      ColumnStatistics colStats = createHiveColStats(params, table);
       if (colStats.getStatsObjSize() > 0) {
         try(MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
           msClient.getHiveClient().updateTableColumnStatistics(colStats);
@@ -1071,7 +1070,7 @@ public class CatalogOpExecutor {
       TAlterTableUpdateStatsParams params, Table table) {
     Preconditions.checkState(params.isSetColumn_stats());
     // Collection of column statistics objects to be returned.
-    ColumnStatistics colStats = new ColumnStatistics();
+    ColumnStatistics colStats = MetastoreShim.createNewHiveColStats();
     colStats.setStatsDesc(
         new ColumnStatisticsDesc(true, table.getDb().getName(), table.getName()));
     // Generate Hive column stats objects from the update stats params.
@@ -1364,7 +1363,7 @@ public class CatalogOpExecutor {
         if (!col.getStats().hasStats()) continue;
 
         try {
-          msClient.getHiveClient().deleteTableColumnStatistics(
+          MetastoreShim.deleteTableColumnStatistics(msClient.getHiveClient(),
               table.getDb().getName(), table.getName(), col.getName());
           ++numColsUpdated;
         } catch (NoSuchObjectException e) {
diff --git a/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java b/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
index 73c47f7..563348e 100644
--- a/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
+++ b/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
@@ -54,6 +54,7 @@ import org.apache.impala.authorization.AuthorizationPolicy;
 import org.apache.impala.catalog.MetaStoreClientPool.MetaStoreClient;
 import org.apache.impala.common.ImpalaException;
 import org.apache.impala.common.Reference;
+import org.apache.impala.compat.MetastoreShim;
 import org.apache.impala.testutil.CatalogServiceTestCatalog;
 import org.apache.impala.testutil.TestUtils;
 import org.apache.impala.thrift.TFunctionBinaryType;
@@ -560,9 +561,9 @@ public class CatalogTest {
     try (MetaStoreClient client = catalog_.getMetaStoreClient()) {
       // Load some string stats data and use it to update the stats of different
       // typed columns.
-      ColumnStatisticsData stringColStatsData = client.getHiveClient()
-          .getTableColumnStatistics("functional", "alltypesagg",
-           Lists.newArrayList("string_col")).get(0).getStatsData();
+      ColumnStatisticsData stringColStatsData = MetastoreShim.getTableColumnStatistics(
+          client.getHiveClient(), "functional", "alltypesagg",
+          Lists.newArrayList("string_col")).get(0).getStatsData();
 
       assertTrue(!table.getColumn("int_col").updateStats(stringColStatsData));
       assertStatsUnknown(table.getColumn("int_col"));
@@ -574,8 +575,8 @@ public class CatalogTest {
       assertStatsUnknown(table.getColumn("bool_col"));
 
       // Do the same thing, but apply bigint stats to a string column.
-      ColumnStatisticsData bigIntCol = client.getHiveClient()
-          .getTableColumnStatistics("functional", "alltypes",
+      ColumnStatisticsData bigIntCol = MetastoreShim.getTableColumnStatistics(
+          client.getHiveClient(), "functional", "alltypes",
           Lists.newArrayList("bigint_col")).get(0).getStatsData();
       assertTrue(!table.getColumn("string_col").updateStats(bigIntCol));
       assertStatsUnknown(table.getColumn("string_col"));
diff --git a/shaded-deps/pom.xml b/shaded-deps/pom.xml
index 579758e..5870894 100644
--- a/shaded-deps/pom.xml
+++ b/shaded-deps/pom.xml
@@ -40,6 +40,12 @@ the same dependencies
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hive.shims</groupId>
+          <artifactId>hive-shims-0.20</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
   </dependencies>
   <build>