Posted to commits@impala.apache.org by ar...@apache.org on 2019/10/03 23:03:53 UTC

[impala] 01/03: IMPALA-8988: [DOCS] DATE type is supported AVRO tables

This is an automated email from the ASF dual-hosted git repository.

arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d24f868cef5cf94a3774fd2f213859bdd93bebb7
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Oct 1 16:33:53 2019 -0700

    IMPALA-8988: [DOCS] DATE type is supported AVRO tables
    
    Change-Id: I95f37accddadcba436676498d5cbb34cda281846
    Reviewed-on: http://gerrit.cloudera.org:8080/14340
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
---
 docs/topics/impala_avro.xml | 126 ++++++++++++++++++++++++++++++--------------
 docs/topics/impala_date.xml |  38 ++++++-------
 2 files changed, 102 insertions(+), 62 deletions(-)

diff --git a/docs/topics/impala_avro.xml b/docs/topics/impala_avro.xml
index fd85f1a..cca2779 100644
--- a/docs/topics/impala_avro.xml
+++ b/docs/topics/impala_avro.xml
@@ -104,11 +104,6 @@ under the License.
         <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
       </note>
 
-      <!--
-        To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then
-        doing DESCRIBE and querying the table.
-      -->
-
       <p>
         The following examples demonstrate creating an Avro table in Impala, using either an inline column
         specification or one taken from a JSON file stored in HDFS:
@@ -502,41 +497,92 @@ ALTER TABLE avro_table SET TBLPROPERTIES (
     <title>Data Type Considerations for Avro Tables</title>
 
     <conbody>
-
-      <p>
-        The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
-        data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
-        might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
-        and the equivalent types in Impala.
-      </p>
-
-<codeblock><![CDATA[Primitive Types (Avro -> Impala)
---------------------------------
-STRING -> STRING
-STRING -> CHAR
-STRING -> VARCHAR
-INT -> INT
-BOOLEAN -> BOOLEAN
-LONG ->  BIGINT
-FLOAT ->  FLOAT
-DOUBLE -> DOUBLE
-
-Logical Types
--------------
-BYTES + logicalType = "decimal" -> DECIMAL
-
-Avro Types with No Impala Equivalent
-------------------------------------
-RECORD, MAP, ARRAY, UNION,  ENUM, FIXED, NULL
-
-Impala Types with No Avro Equivalent
-------------------------------------
-TIMESTAMP
-]]>
-</codeblock>
-
-      <p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
-
+      <p> The Avro format defines a set of data types whose names differ from
+        the names of the corresponding Impala data types. If you are preparing
+        Avro files using other Hadoop components such as Pig or MapReduce, you
+        might need to work with the type names defined by Avro. The following
+        tables list the Avro-defined types and the equivalent types in Impala. </p>
+      <p><b>Primitive types:</b></p>
+      <table frame="all" rowsep="1" colsep="1" id="table_uvv_plj_gjb">
+        <tgroup cols="2" align="left">
+          <colspec colname="c1" colnum="1" colwidth="143.44pt"/>
+          <colspec colname="c2" colnum="2" colwidth="165.77pt"/>
+          <thead>
+            <row>
+              <entry>Avro type</entry>
+              <entry>Impala type</entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>STRING</entry>
+              <entry>STRING</entry>
+            </row>
+            <row>
+              <entry>STRING</entry>
+              <entry>CHAR</entry>
+            </row>
+            <row>
+              <entry>STRING</entry>
+              <entry>VARCHAR</entry>
+            </row>
+            <row>
+              <entry>INT</entry>
+              <entry>INT</entry>
+            </row>
+            <row>
+              <entry>BOOLEAN</entry>
+              <entry>BOOLEAN</entry>
+            </row>
+            <row>
+              <entry>LONG</entry>
+              <entry>BIGINT</entry>
+            </row>
+            <row>
+              <entry>FLOAT</entry>
+              <entry>FLOAT</entry>
+            </row>
+            <row>
+              <entry>DOUBLE</entry>
+              <entry>DOUBLE</entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </table>
+      <p>The Avro specification allows string values up to 2**64 bytes in
+        length. Impala queries for Avro tables use 32-bit integers to hold
+        string lengths. </p>
+      <p>In <keyword keyref="impala25_full"/> and higher, Impala truncates
+          <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> values in Avro
+        tables to (2**31)-1 bytes. If a query encounters a
+          <codeph>STRING</codeph> value longer than (2**31)-1 bytes in an Avro
+        table, the query fails. In earlier releases, encountering such long
+        values in an Avro table could cause a crash.</p>
+      <p><b>Logical types:</b></p>
+      <table frame="all" rowsep="1" colsep="1" id="table_ch2_1mj_gjb">
+        <tgroup cols="2" align="left">
+          <colspec colname="c1" colnum="1" colwidth="151.26pt"/>
+          <colspec colname="c2" colnum="2" colwidth="149.58pt"/>
+          <thead>
+            <row>
+              <entry>Avro type</entry>
+              <entry>Impala type</entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>BYTES annotated with the decimal logical type</entry>
+              <entry>DECIMAL</entry>
+            </row>
+            <row>
+              <entry>INT32 annotated with the date logical type</entry>
+              <entry>DATE</entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </table>
+      <p>Impala does not support the following Avro data types: RECORD, MAP,
+        ARRAY, UNION, ENUM, FIXED, NULL.</p>
     </conbody>
   </concept>
 
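As an illustration of the INT32-to-DATE row in the logical types table above
(a sketch outside the patch itself, not code from the Impala repository): the
Avro date logical type annotates an int holding the number of days since the
Unix epoch, January 1, 1970. The field name "d" and the helper names below are
hypothetical, and the conversion uses only the Python standard library.

```python
from datetime import date, timedelta

# Hypothetical Avro field declaration: an "int" annotated with the
# "date" logical type. The field name "d" is an assumption.
DATE_FIELD_SCHEMA = {"name": "d", "type": {"type": "int", "logicalType": "date"}}

UNIX_EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Decode a stored day count into a calendar date."""
    return UNIX_EPOCH + timedelta(days=days)


def date_to_days(value: date) -> int:
    """Encode a calendar date as the number of days since the Unix epoch."""
    return (value - UNIX_EPOCH).days


if __name__ == "__main__":
    print(days_to_date(0))                   # 1970-01-01
    print(date_to_days(date(2013, 1, 1)))    # 15706
```

Python's date type uses the proleptic Gregorian calendar throughout, so this
sketch does not reproduce the pre-1582 discrepancy that the DATE topic
describes for older Hive versions.
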
diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml
index 7f528d7..556936c 100644
--- a/docs/topics/impala_date.xml
+++ b/docs/topics/impala_date.xml
@@ -41,44 +41,38 @@ under the License.
 
   <conbody>
 
-    <p>
-      Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
-      type is supported for HBase, Text, and Parquet.
-    </p>
+    <p> Use the <codeph>DATE</codeph> data type to store date values. The
+        <codeph>DATE</codeph> type is supported for HBase, Text, Avro, and
+      Parquet. </p>
 
     <p>
       <b>Range:</b>
     </p>
 
-    <p>
-      0000-01-01 to 9999-12-31
-    </p>
+    <p> 0001-01-01 to 9999-12-31 </p>
 
     <p>
       <b>Literals and expressions:</b>
     </p>
 
-    <p>
-      The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
-      For example, <codeph>DATE '2013-01-01'</codeph>
+    <p> The <codeph>DATE</codeph> literals are in the form of
+        <codeph>DATE'YYYY-MM-DD'</codeph>. For example, <codeph>DATE
+        '2013-01-01'</codeph>.
     </p>
 
     <p>
-      <b>Parquet considerations:</b>
+      <b>Parquet and Avro considerations:</b>
     </p>
 
-    <p>
-      Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
-      logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
-      Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
-      issue between Impala and older versions of Hive:
-    </p>
+    <p> Parquet and Avro use the <codeph>DATE</codeph> logical type for dates. The
+        <codeph>DATE</codeph> logical type annotates an <codeph>INT32</codeph>
+      that stores the number of days from the Unix epoch, January 1, 1970. This
+      representation introduces an interoperability issue between Impala and
+      older versions of Hive: </p>
 
-    <p>
-      If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
-      those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
-      higher, this is no longer an issue.
-    </p>
+    <p> If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a
+      Parquet or Avro table, those dates would be read back incorrectly by
+      Impala and vice versa. In Hive 3.1 and higher, this is no longer an issue. </p>
 
     <p>
       <b>Explicit casting between DATE and other data types:</b>