Posted to commits@impala.apache.org by ar...@apache.org on 2019/10/03 23:03:53 UTC
[impala] 01/03: IMPALA-8988: [DOCS] DATE type is supported AVRO tables
This is an automated email from the ASF dual-hosted git repository.
arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit d24f868cef5cf94a3774fd2f213859bdd93bebb7
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Oct 1 16:33:53 2019 -0700
IMPALA-8988: [DOCS] DATE type is supported AVRO tables
Change-Id: I95f37accddadcba436676498d5cbb34cda281846
Reviewed-on: http://gerrit.cloudera.org:8080/14340
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
---
docs/topics/impala_avro.xml | 126 ++++++++++++++++++++++++++++++--------------
docs/topics/impala_date.xml | 38 ++++++-------
2 files changed, 102 insertions(+), 62 deletions(-)
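
Note on the DATE representation documented in the impala_date.xml hunk below: the text says the DATE logical type annotates an INT32 holding the number of days from the Unix epoch, January 1, 1970. The following is a minimal sketch of that arithmetic, not Impala code. The function names `days_to_date` and `date_to_days` are hypothetical, and the sketch assumes the proleptic Gregorian calendar, which is what Python's `datetime.date` implements.

```python
from datetime import date, timedelta

# Day zero of the encoding described in the docs: January 1, 1970.
UNIX_EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Decode a days-since-epoch integer into a calendar date (hypothetical helper)."""
    return UNIX_EPOCH + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Encode a calendar date as a days-since-epoch integer (hypothetical helper)."""
    return (d - UNIX_EPOCH).days


if __name__ == "__main__":
    # The literal used in the docs, DATE '2013-01-01', as a stored integer.
    print(date_to_days(date(2013, 1, 1)))
    # Dates before the epoch are stored as negative day counts.
    print(date_to_days(date(1582, 10, 15)))
    print(days_to_date(0))
```

A writer that uses a different calendar for old dates would map the same calendar date to a different day count, which may be the kind of mismatch the Hive note about dates earlier than 1582-10-15 describes.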
diff --git a/docs/topics/impala_avro.xml b/docs/topics/impala_avro.xml
index fd85f1a..cca2779 100644
--- a/docs/topics/impala_avro.xml
+++ b/docs/topics/impala_avro.xml
@@ -104,11 +104,6 @@ under the License.
<p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
</note>
- <!--
- To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then
- doing DESCRIBE and querying the table.
- -->
-
<p>
The following examples demonstrate creating an Avro table in Impala, using either an inline column
specification or one taken from a JSON file stored in HDFS:
@@ -502,41 +497,92 @@ ALTER TABLE avro_table SET TBLPROPERTIES (
<title>Data Type Considerations for Avro Tables</title>
<conbody>
-
- <p>
- The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
- data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
- might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
- and the equivalent types in Impala.
- </p>
-
-<codeblock><![CDATA[Primitive Types (Avro -> Impala)
---------------------------------
-STRING -> STRING
-STRING -> CHAR
-STRING -> VARCHAR
-INT -> INT
-BOOLEAN -> BOOLEAN
-LONG -> BIGINT
-FLOAT -> FLOAT
-DOUBLE -> DOUBLE
-
-Logical Types
--------------
-BYTES + logicalType = "decimal" -> DECIMAL
-
-Avro Types with No Impala Equivalent
-------------------------------------
-RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
-
-Impala Types with No Avro Equivalent
-------------------------------------
-TIMESTAMP
-]]>
-</codeblock>
-
- <p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
-
+ <p> The Avro format defines a set of data types whose names differ from
+ the names of the corresponding Impala data types. If you are preparing
+ Avro files using other Hadoop components such as Pig or MapReduce, you
+ might need to work with the type names defined by Avro. The following
+ figure lists the Avro-defined types and the equivalent types in Impala. </p>
+ <p><b>Primitive types:</b></p>
+ <table frame="all" rowsep="1" colsep="1" id="table_uvv_plj_gjb">
+ <tgroup cols="2" align="left">
+ <colspec colname="c1" colnum="1" colwidth="143.44pt"/>
+ <colspec colname="c2" colnum="2" colwidth="165.77pt"/>
+ <thead>
+ <row>
+ <entry>Avro type</entry>
+ <entry>Impala type</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>STRING</entry>
+ <entry>STRING</entry>
+ </row>
+ <row>
+ <entry>STRING</entry>
+ <entry>CHAR</entry>
+ </row>
+ <row>
+ <entry>STRING</entry>
+ <entry>VARCHAR</entry>
+ </row>
+ <row>
+ <entry>INT</entry>
+ <entry>INT</entry>
+ </row>
+ <row>
+ <entry>BOOLEAN</entry>
+ <entry>BOOLEAN</entry>
+ </row>
+ <row>
+ <entry>LONG</entry>
+ <entry>BIGINT</entry>
+ </row>
+ <row>
+ <entry>FLOAT</entry>
+ <entry>FLOAT</entry>
+ </row>
+ <row>
+ <entry>DOUBLE</entry>
+ <entry>DOUBLE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <p>The Avro specification allows string values up to 2**64 bytes in
+ length. Impala queries for Avro tables use 32-bit integers to hold
+ string lengths. </p>
+ <p>In <keyword keyref="impala25_full"/> and higher, Impala truncates
+ <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> values in Avro
+ tables to (2**31)-1 bytes. If a query encounters a
+ <codeph>STRING</codeph> value longer than (2**31)-1 bytes in an Avro
+ table, the query fails. In earlier releases, encountering such long
+ values in an Avro table could cause a crash.</p>
+ <p><b>Logical types:</b></p>
+ <table frame="all" rowsep="1" colsep="1" id="table_ch2_1mj_gjb">
+ <tgroup cols="2" align="left">
+ <colspec colname="c1" colnum="1" colwidth="151.26pt"/>
+ <colspec colname="c2" colnum="2" colwidth="149.58pt"/>
+ <thead>
+ <row>
+ <entry>Avro type</entry>
+ <entry>Impala type</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>BYTES annotated</entry>
+ <entry>DECIMAL</entry>
+ </row>
+ <row>
+ <entry>INT32 annotated</entry>
+ <entry>DATE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <p>Impala does not support the following Avro data types: RECORD, MAP,
+ ARRAY, UNION, ENUM, FIXED, NULL</p>
</conbody>
</concept>
diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml
index 7f528d7..556936c 100644
--- a/docs/topics/impala_date.xml
+++ b/docs/topics/impala_date.xml
@@ -41,44 +41,38 @@ under the License.
<conbody>
- <p>
- Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
- type is supported for HBase, Text, and Parquet.
- </p>
+ <p> Use the <codeph>DATE</codeph> data type to store date values. The
+ <codeph>DATE</codeph> type is supported for HBase, Text, Avro, and
+ Parquet. </p>
<p>
<b>Range:</b>
</p>
- <p>
- 0000-01-01 to 9999-12-31
- </p>
+ <p> 0001-01-01 to 9999-12-31 </p>
<p>
<b>Literals and expressions:</b>
</p>
- <p>
- The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
- For example, <codeph>DATE '2013-01-01'</codeph>
+ <p> The <codeph>DATE</codeph> literals are in the form of
+ <codeph>DATE'YYYY-MM-DD'</codeph>. For example, <codeph>DATE
+ '2013-01-01'</codeph>
</p>
<p>
- <b>Parquet considerations:</b>
+ <b>Parquet and Avro considerations:</b>
</p>
- <p>
- Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
- logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
- Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
- issue between Impala and older versions of Hive:
- </p>
+ <p> Parquet and Avro use <codeph>DATE</codeph> logical type for dates. The
+ <codeph>DATE</codeph> logical type annotates an <codeph>INT32</codeph>
+ that stores the number of days from the Unix epoch, January 1, 1970. This
+ representation introduces an interoperability issue between Impala and
+ older versions of Hive: </p>
- <p>
- If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
- those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
- higher, this is no longer an issue.
- </p>
+ <p> If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a
+ Parquet or Avro table, those dates would be read back incorrectly by
+ Impala and vice versa. In Hive 3.1 and higher, this is no longer an issue. </p>
<p>
<b>Explicit casting between DATE and other data types:</b>