You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by ta...@apache.org on 2019/08/12 18:55:22 UTC
[impala] 01/03: IMPALA-7374: [DOCS] Document the new DATE data type
in Impala
This is an automated email from the ASF dual-hosted git repository.
tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 8eb50076c2b232b60fa5e44fb9341c752e2bf417
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Thu Aug 1 16:08:50 2019 -0700
IMPALA-7374: [DOCS] Document the new DATE data type in Impala
Change-Id: I0c28361c7f0d225708eaf4b955c6704520eaaa68
Reviewed-on: http://gerrit.cloudera.org:8080/13983
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Attila Jeges <at...@cloudera.com>
---
docs/impala.ditamap | 2 +-
docs/topics/impala_date.xml | 196 +++++++++++++++++++--------
docs/topics/impala_literals.xml | 284 ++++++++++++++++++++++++----------------
3 files changed, 311 insertions(+), 171 deletions(-)
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index b9bdb6d..2479d72 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -94,7 +94,7 @@ under the License.
<topicref href="topics/impala_boolean.xml"/>
<topicref href="topics/impala_char.xml"/>
<topicref href="topics/impala_decimal.xml"/>
- <topicref audience="hidden" href="topics/impala_date.xml"/>
+ <topicref href="topics/impala_date.xml"/>
<topicref href="topics/impala_double.xml"/>
<topicref href="topics/impala_float.xml"/>
<topicref href="topics/impala_int.xml"/>
diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml
index 228fc92..7f528d7 100644
--- a/docs/topics/impala_date.xml
+++ b/docs/topics/impala_date.xml
@@ -18,9 +18,16 @@ specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
-<concept audience="hidden" id="date" rev="2.0.0">
+<concept id="date" rev="2.0.0">
+
+ <title>DATE Data Type</title>
+
+ <titlealts audience="PDF">
+
+ <navtitle>DATE</navtitle>
+
+ </titlealts>
- <title>DATE Data Type (<keyword keyref="impala21"/> or higher only)</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
@@ -35,88 +42,169 @@ under the License.
<conbody>
<p>
- <indexterm audience="hidden">DATE data type</indexterm>
- A type representing the date (year, month, and day) as a single numeric value. Used to represent a broader
- date range than possible with the <codeph>TIMESTAMP</codeph> type, with fewer distinct values than
- <codeph>TIMESTAMP</codeph>, and in a more compact and efficient form than using a <codeph>STRING</codeph>
- such as <codeph>'2014-12-31'</codeph>.
+ Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
+ type is supported for HBase, Text, and Parquet.
</p>
- <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
-
-<codeblock><varname>column_name</varname> DATE</codeblock>
-
<p>
- <b>Range:</b> January 1, -4712 BC .. December 31, 9999 AD.
+ <b>Range:</b>
</p>
- <p conref="../shared/impala_common.xml#common/hbase_ok"/>
-
- <p conref="../shared/impala_common.xml#common/parquet_blurb"/>
-
- <ul>
- <li>
- This type can be read from and written to Parquet files.
- </li>
-
- <li>
- There is no requirement for a particular level of Parquet.
- </li>
+ <p>
+ 0000-01-01 to 9999-12-31
+ </p>
- <li>
- Parquet files generated by Impala and containing this type can be freely interchanged with other components
- such as Hive and MapReduce.
- </li>
- </ul>
+ <p>
+ <b>Literals and expressions:</b>
+ </p>
- <p conref="../shared/impala_common.xml#common/hive_blurb"/>
+ <p>
+ The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
+ For example, <codeph>DATE '2013-01-01'</codeph>
+ </p>
<p>
- TK.
+ <b>Parquet considerations:</b>
</p>
- <p conref="../shared/impala_common.xml#common/conversion_blurb"/>
+ <p>
+ Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
+ logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
+ Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
+ issue between Impala and older versions of Hive:
+ </p>
<p>
- TK.
+ If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
+ those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
+ higher, this is no longer an issue.
</p>
- <p conref="../shared/impala_common.xml#common/partitioning_blurb"/>
+ <p>
+ <b>Explicit casting between DATE and other data types:</b>
+ </p>
<p>
- This type can be used for partition key columns. Because it has less granularity (and thus fewer distinct
- values) than an equivalent <codeph>TIMESTAMP</codeph> column, and numeric columns are more efficient as
- partition keys than strings, prefer to partition by a <codeph>DATE</codeph> column rather than a
- <codeph>TIMESTAMP</codeph> column or a <codeph>STRING</codeph> representation of a date.
+ <codeph>DATE</codeph> type can only be converted to/from <codeph>DATE</codeph>,
+ <codeph>TIMESTAMP</codeph>, or <codeph>STRING</codeph> types as described below.
</p>
- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+ <table id="table_ovw_zt1_p3b">
+ <tgroup cols="3">
+ <colspec colnum="1" colname="col1" colwidth="1*"/>
+ <colspec colname="newCol2" colnum="2" colwidth="1.09*"/>
+ <colspec colnum="3" colname="col2" colwidth="3.46*"/>
+ <thead>
+ <row>
+ <entry>
+ Cast from
+ </entry>
+ <entry>
+ Cast to
+ </entry>
+ <entry>
+ Result
+ </entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>
+ <codeph>TIMESTAMP</codeph>
+ </entry>
+ <entry>
+ <codeph>DATE</codeph>
+ </entry>
+ <entry>
+ The date component of the <codeph>TIMESTAMP</codeph> is returned, and the time of
+ the day component of the <codeph>TIMESTAMP</codeph> is ignored.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <codeph>STRING</codeph>
+ </entry>
+ <entry>
+ <codeph>DATE</codeph>
+ </entry>
+ <entry>
+ The <codeph>DATE</codeph> value of <codeph>yyyy-MM-dd</codeph> is returned.
+
+ <p>
+ The <codeph>STRING</codeph> value must be in the <codeph>yyyy-MM-dd</codeph> or
+ <codeph>yyyy-MM-dd HH:mm:ss.SSSSSSSSS</codeph> pattern.
+ </p>
+
+
+
+ <p>
+ If the time component is present in <codeph>STRING</codeph>, it is silently
+ ignored.
+ </p>
+
+
+
+ <p>
+ If the <codeph>STRING</codeph> value does not match the above formats, an error
+ is returned.
+ </p>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <codeph>DATE</codeph>
+ </entry>
+ <entry>
+ <codeph>TIMESTAMP</codeph>
+ </entry>
+ <entry>
+ The year, month, and day of the <codeph>DATE</codeph> is returned along with the
+ time of day component set to <codeph>00:00:00</codeph>.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <codeph>DATE</codeph>
+ </entry>
+ <entry>
+ <codeph>STRING</codeph>
+ </entry>
+ <entry>
+ The <codeph>STRING</codeph> value, <codeph>'yyyy-MM-dd'</codeph>, is returned.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
<p>
- This type is available on <keyword keyref="impala20_full"/> or higher.
+ <b>Implicit casting between DATE and other types: </b>
</p>
- <p conref="../shared/impala_common.xml#common/internals_2_bytes"/>
-
- <p conref="../shared/impala_common.xml#common/added_in_20"/>
+ <p>
+ Implicit casting is supported:
+ </p>
- <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
+ <ul>
+ <li>
+ From <codeph>STRING</codeph> to <codeph>DATE</codeph> if the source
+ <codeph>STRING</codeph> value is in the <codeph>yyyy-MM-dd</codeph> or
+ <codeph>yyyy-MM-dd HH:mm:ss.SSSSSSSSS</codeph> pattern.
+ </li>
- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+ <li>
+ From <codeph>DATE</codeph> to <codeph>TIMESTAMP</codeph>.
+ </li>
+ </ul>
<p>
- Things happen when converting <codeph>TIMESTAMP</codeph> to <codeph>DATE</codeph> or <codeph>DATE</codeph> to
- <codeph>TIMESTAMP</codeph>. TK.
+ <b>Added in:</b>
</p>
- <p conref="../shared/impala_common.xml#common/example_blurb"/>
-
- <p conref="../shared/impala_common.xml#common/related_info"/>
-
<p>
- The <xref href="impala_timestamp.xml#timestamp">TIMESTAMP</xref> data type is closely related. Some functions
- from <xref href="impala_datetime_functions.xml#datetime_functions"/> accept and return <codeph>DATE</codeph>
- values.
+ The <codeph>DATE</codeph> type is available in Impala 3.3 and higher.
</p>
+
</conbody>
+
</concept>
diff --git a/docs/topics/impala_literals.xml b/docs/topics/impala_literals.xml
index 32a69da..1bdbad5 100644
--- a/docs/topics/impala_literals.xml
+++ b/docs/topics/impala_literals.xml
@@ -21,6 +21,7 @@ under the License.
<concept id="literals">
<title>Literals</title>
+
<prolog>
<metadata>
<data name="Category" value="Impala"/>
@@ -34,14 +35,16 @@ under the License.
<conbody>
<p>
- <indexterm audience="hidden">literals</indexterm>
- Each of the Impala data types has corresponding notation for literal values of that type. You specify literal
- values in SQL statements, such as in the <codeph>SELECT</codeph> list or <codeph>WHERE</codeph> clause of a
- query, or as an argument to a function call. See <xref href="impala_datatypes.xml#datatypes"/> for a complete
- list of types, ranges, and conversion rules.
+ Each of the Impala data types has corresponding notation for literal values of that type.
+ You specify literal values in SQL statements, such as in the <codeph>SELECT</codeph> list
+ or <codeph>WHERE</codeph> clause of a query, or as an argument to a function call. See
+ <xref
+ href="impala_datatypes.xml#datatypes"/> for a complete list of types,
+ ranges, and conversion rules.
</p>
<p outputclass="toc inpage"/>
+
</conbody>
<concept id="numeric_literals">
@@ -51,17 +54,18 @@ under the License.
<conbody>
<p>
- <indexterm audience="hidden">numeric literals</indexterm>
- To write literals for the integer types (<codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>,
- <codeph>INT</codeph>, and <codeph>BIGINT</codeph>), use a sequence of digits with optional leading zeros.
+ To write literals for the integer types (<codeph>TINYINT</codeph>,
+ <codeph>SMALLINT</codeph>, <codeph>INT</codeph>, and <codeph>BIGINT</codeph>), use a
+ sequence of digits with optional leading zeros.
</p>
<p rev="1.4.0">
To write literals for the floating-point types (<codeph rev="1.4.0">DECIMAL</codeph>,
- <codeph>FLOAT</codeph>, and <codeph>DOUBLE</codeph>), use a sequence of digits with an optional decimal
- point (<codeph>.</codeph> character). To preserve accuracy during arithmetic expressions, Impala interprets
- floating-point literals as the <codeph>DECIMAL</codeph> type with the smallest appropriate precision and
- scale, until required by the context to convert the result to <codeph>FLOAT</codeph> or
+ <codeph>FLOAT</codeph>, and <codeph>DOUBLE</codeph>), use a sequence of digits with an
+ optional decimal point (<codeph>.</codeph> character). To preserve accuracy during
+ arithmetic expressions, Impala interprets floating-point literals as the
+ <codeph>DECIMAL</codeph> type with the smallest appropriate precision and scale, until
+ required by the context to convert the result to <codeph>FLOAT</codeph> or
<codeph>DOUBLE</codeph>.
</p>
@@ -70,20 +74,22 @@ under the License.
</p>
<p>
- You can also use exponential notation by including an <codeph>e</codeph> character. For example,
- <codeph>1e6</codeph> is 1 times 10 to the power of 6 (1 million). A number in exponential notation is
- always interpreted as floating-point.
+ You can also use exponential notation by including an <codeph>e</codeph> character. For
+ example, <codeph>1e6</codeph> is 1 times 10 to the power of 6 (1 million). A number in
+ exponential notation is always interpreted as floating-point.
</p>
<p rev="tk">
- When Impala encounters a numeric literal, it considers the type to be the <q>smallest</q> that can
- accurately represent the value. The type is promoted to larger or more accurate types if necessary, based
- on subsequent parts of an expression.
+ When Impala encounters a numeric literal, it considers the type to be the
+ <q>smallest</q> that can accurately represent the value. The type is promoted to larger
+ or more accurate types if necessary, based on subsequent parts of an expression.
</p>
+
<p>
- For example, you can see by the types Impala defines for the following table columns
- how it interprets the corresponding numeric literals:
+ For example, you can see by the types Impala defines for the following table columns how
+ it interprets the corresponding numeric literals:
</p>
+
<codeblock>[localhost:21000] > create table ten as select 10 as x;
+-------------------+
| summary |
@@ -136,7 +142,9 @@ under the License.
| x | decimal(4,3) | |
+------+--------------+---------+
</codeblock>
+
</conbody>
+
</concept>
<concept id="string_literals">
@@ -146,15 +154,16 @@ under the License.
<conbody>
<p>
- <indexterm audience="hidden">string literals</indexterm>
- String literals are quoted using either single or double quotation marks. You can use either kind of quotes
- for string literals, even both kinds for different literals within the same statement.
+ String literals are quoted using either single or double quotation marks. You can use
+ either kind of quotes for string literals, even both kinds for different literals within
+ the same statement.
</p>
<p rev="2.0.0">
- Quoted literals are considered to be of type <codeph>STRING</codeph>. To use quoted literals in contexts
- requiring a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> value, <codeph>CAST()</codeph> the literal to
- a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> of the appropriate length.
+ Quoted literals are considered to be of type <codeph>STRING</codeph>. To use quoted
+ literals in contexts requiring a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph>
+ value, <codeph>CAST()</codeph> the literal to a <codeph>CHAR</codeph> or
+ <codeph>VARCHAR</codeph> of the appropriate length.
</p>
<p>
@@ -162,8 +171,8 @@ under the License.
</p>
<p>
- To encode special characters within a string literal, precede them with the backslash (<codeph>\</codeph>)
- escape character:
+ To encode special characters within a string literal, precede them with the backslash
+ (<codeph>\</codeph>) escape character:
</p>
<ul>
@@ -172,53 +181,57 @@ under the License.
</li>
<li>
- <codeph>\n</codeph> represents a newline or linefeed. This might cause extra line breaks in
- <cmdname>impala-shell</cmdname> output.
+ <codeph>\n</codeph> represents a newline or linefeed. This might cause extra line
+ breaks in <cmdname>impala-shell</cmdname> output.
</li>
<li>
- <codeph>\r</codeph> represents a carriage return. This might cause unusual formatting (making it appear
- that some content is overwritten) in <cmdname>impala-shell</cmdname> output.
+ <codeph>\r</codeph> represents a carriage return. This might cause unusual formatting
+ (making it appear that some content is overwritten) in <cmdname>impala-shell</cmdname>
+ output.
</li>
<li>
- <codeph>\b</codeph> represents a backspace. This might cause unusual formatting (making it appear that
- some content is overwritten) in <cmdname>impala-shell</cmdname> output.
+ <codeph>\b</codeph> represents a backspace. This might cause unusual formatting
+ (making it appear that some content is overwritten) in <cmdname>impala-shell</cmdname>
+ output.
</li>
<li>
- <codeph>\0</codeph> represents an ASCII <codeph>nul</codeph> character (not the same as a SQL
- <codeph>NULL</codeph>). This might not be visible in <cmdname>impala-shell</cmdname> output.
+ <codeph>\0</codeph> represents an ASCII <codeph>nul</codeph> character (not the same
+ as a SQL <codeph>NULL</codeph>). This might not be visible in
+ <cmdname>impala-shell</cmdname> output.
</li>
<li>
- <codeph>\Z</codeph> represents a DOS end-of-file character. This might not be visible in
- <cmdname>impala-shell</cmdname> output.
+ <codeph>\Z</codeph> represents a DOS end-of-file character. This might not be visible
+ in <cmdname>impala-shell</cmdname> output.
</li>
<li>
- <codeph>\%</codeph> and <codeph>\_</codeph> can be used to escape wildcard characters within the string
- passed to the <codeph>LIKE</codeph> operator.
+ <codeph>\%</codeph> and <codeph>\_</codeph> can be used to escape wildcard characters
+ within the string passed to the <codeph>LIKE</codeph> operator.
</li>
<li>
- <codeph>\</codeph> followed by 3 octal digits represents the ASCII code of a single character; for
- example, <codeph>\101</codeph> is ASCII 65, the character <codeph>A</codeph>.
+ <codeph>\</codeph> followed by 3 octal digits represents the ASCII code of a single
+ character; for example, <codeph>\101</codeph> is ASCII 65, the character
+ <codeph>A</codeph>.
</li>
<li>
- Use two consecutive backslashes (<codeph>\\</codeph>) to prevent the backslash from being interpreted as
- an escape character.
+ Use two consecutive backslashes (<codeph>\\</codeph>) to prevent the backslash from
+ being interpreted as an escape character.
</li>
<li>
- Use the backslash to escape single or double quotation mark characters within a string literal, if the
- literal is enclosed by the same type of quotation mark.
+ Use the backslash to escape single or double quotation mark characters within a string
+ literal, if the literal is enclosed by the same type of quotation mark.
</li>
<li>
- If the character following the <codeph>\</codeph> does not represent the start of a recognized escape
- sequence, the character is passed through unchanged.
+ If the character following the <codeph>\</codeph> does not represent the start of a
+ recognized escape sequence, the character is passed through unchanged.
</li>
</ul>
@@ -227,16 +240,18 @@ under the License.
</p>
<p>
- To include a single quotation character within a string value, enclose the literal with either single or
- double quotation marks, and optionally escape the single quote as a <codeph>\'</codeph> sequence. Earlier
- releases required escaping a single quote inside double quotes. Continue using escape sequences in this
- case if you also need to run your SQL code on older versions of Impala.
+ To include a single quotation character within a string value, enclose the literal with
+ either single or double quotation marks, and optionally escape the single quote as a
+ <codeph>\'</codeph> sequence. Earlier releases required escaping a single quote inside
+ double quotes. Continue using escape sequences in this case if you also need to run your
+ SQL code on older versions of Impala.
</p>
<p>
- To include a double quotation character within a string value, enclose the literal with single quotation
- marks, no escaping is necessary in this case. Or, enclose the literal with double quotation marks and
- escape the double quote as a <codeph>\"</codeph> sequence.
+ To include a double quotation character within a string value, enclose the literal with
+ single quotation marks, no escaping is necessary in this case. Or, enclose the literal
+ with double quotation marks and escape the double quote as a <codeph>\"</codeph>
+ sequence.
</p>
<codeblock>[localhost:21000] > select "What\'s happening?" as single_within_double,
@@ -261,13 +276,15 @@ under the License.
</p>
<p>
- When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and
- backspaces, use the <cmdname>impala-shell</cmdname> options to save to a file, turn off pretty printing, or
- both rather than relying on how the output appears visually. See
- <xref href="impala_shell_options.xml#shell_options"/> for a list of <cmdname>impala-shell</cmdname>
- options.
+ When dealing with output that includes non-ASCII or non-printable characters such as
+ linefeeds and backspaces, use the <cmdname>impala-shell</cmdname> options to save to a
+ file, turn off pretty printing, or both rather than relying on how the output appears
+ visually. See <xref href="impala_shell_options.xml#shell_options"/> for a list of
+ <cmdname>impala-shell</cmdname> options.
</p>
+
</conbody>
+
</concept>
<concept id="boolean_literals">
@@ -277,8 +294,8 @@ under the License.
<conbody>
<p>
- For <codeph>BOOLEAN</codeph> values, the literals are <codeph>TRUE</codeph> and <codeph>FALSE</codeph>,
- with no quotation marks and case-insensitive.
+ For <codeph>BOOLEAN</codeph> values, the literals are <codeph>TRUE</codeph> and
+ <codeph>FALSE</codeph>, with no quotation marks and case-insensitive.
</p>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
@@ -286,7 +303,9 @@ under the License.
<codeblock>select true;
select * from t1 where assertion = false;
select case bool_col when true then 'yes' when false 'no' else 'null' end from t1;</codeblock>
+
</conbody>
+
</concept>
<concept id="timestamp_literals">
@@ -297,20 +316,41 @@ select case bool_col when true then 'yes' when false 'no' else 'null' end from t
<p conref="../shared/impala_common.xml#common/timestamp_conversions"
conrefend="../shared/impala_common.xml#common/cast_string_to_timestamp"/>
- <p> You can also use <codeph>INTERVAL</codeph> expressions to add or
- subtract from timestamp literal values, such as
- <codeph>CAST('1966‑07‑30' AS
- TIMESTAMP) + INTERVAL 5 YEARS + INTERVAL 3 DAYS</codeph>. See <xref
- href="impala_timestamp.xml#timestamp"/> for details. </p>
<p>
- Depending on your data pipeline, you might receive date and time data as text, in notation that does not
- exactly match the format for Impala <codeph>TIMESTAMP</codeph> literals.
- See <xref href="impala_datetime_functions.xml#datetime_functions"/> for functions that can convert
- between a variety of string literals (including different field order, separators, and timezone notation)
- and equivalent <codeph>TIMESTAMP</codeph> or numeric values.
+ You can also use <codeph>INTERVAL</codeph> expressions to add or subtract from timestamp
+ literal values, such as <codeph>CAST('1966‑07‑30' AS
+ TIMESTAMP) + INTERVAL 5 YEARS + INTERVAL 3 DAYS</codeph>. See
+ <xref
+ href="impala_timestamp.xml#timestamp"/> for details.
+ </p>
+
+ <p>
+ Depending on your data pipeline, you might receive date and time data as text, in
+ notation that does not exactly match the format for Impala <codeph>TIMESTAMP</codeph>
+ literals. See <xref href="impala_datetime_functions.xml#datetime_functions"/> for
+ functions that can convert between a variety of string literals (including different
+ field order, separators, and timezone notation) and equivalent
+ <codeph>TIMESTAMP</codeph> or numeric values.
+ </p>
+
+ </conbody>
+
+ </concept>
+
+ <concept id="date_literals">
+
+ <title>Date Literals</title>
+
+ <conbody>
+
+ <p>
+ The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
+ For example, <codeph>DATE '2013-01-01'</codeph>
</p>
+
</conbody>
+
</concept>
<concept id="null">
@@ -320,34 +360,36 @@ select case bool_col when true then 'yes' when false 'no' else 'null' end from t
<conbody>
<p>
- <indexterm audience="hidden">NULL</indexterm>
- The notion of <codeph>NULL</codeph> values is familiar from all kinds of database systems, but each SQL
- dialect can have its own behavior and restrictions on <codeph>NULL</codeph> values. For Big Data
- processing, the precise semantics of <codeph>NULL</codeph> values are significant: any misunderstanding
- could lead to inaccurate results or misformatted data, that could be time-consuming to correct for large
- data sets.
+ The notion of <codeph>NULL</codeph> values is familiar from all kinds of database
+ systems, but each SQL dialect can have its own behavior and restrictions on
+ <codeph>NULL</codeph> values. For Big Data processing, the precise semantics of
+ <codeph>NULL</codeph> values are significant: any misunderstanding could lead to
+ inaccurate results or misformatted data, that could be time-consuming to correct for
+ large data sets.
</p>
<ul>
<li>
- <codeph>NULL</codeph> is a different value than an empty string. The empty string is represented by a
- string literal with nothing inside, <codeph>""</codeph> or <codeph>''</codeph>.
+ <codeph>NULL</codeph> is a different value than an empty string. The empty string is
+ represented by a string literal with nothing inside, <codeph>""</codeph> or
+ <codeph>''</codeph>.
</li>
<li>
- In a delimited text file, the <codeph>NULL</codeph> value is represented by the special token
- <codeph>\N</codeph>.
+ In a delimited text file, the <codeph>NULL</codeph> value is represented by the
+ special token <codeph>\N</codeph>.
</li>
<li>
- When Impala inserts data into a partitioned table, and the value of one of the partitioning columns is
- <codeph>NULL</codeph> or the empty string, the data is placed in a special partition that holds only
- these two kinds of values. When these values are returned in a query, the result is <codeph>NULL</codeph>
- whether the value was originally <codeph>NULL</codeph> or an empty string. This behavior is compatible
- with the way Hive treats <codeph>NULL</codeph> values in partitioned tables. Hive does not allow empty
- strings as partition keys, and it returns a string value such as
- <codeph>__HIVE_DEFAULT_PARTITION__</codeph> instead of <codeph>NULL</codeph> when such values are
- returned from a query. For example:
+ When Impala inserts data into a partitioned table, and the value of one of the
+ partitioning columns is <codeph>NULL</codeph> or the empty string, the data is placed
+ in a special partition that holds only these two kinds of values. When these values
+ are returned in a query, the result is <codeph>NULL</codeph> whether the value was
+ originally <codeph>NULL</codeph> or an empty string. This behavior is compatible with
+ the way Hive treats <codeph>NULL</codeph> values in partitioned tables. Hive does not
+ allow empty strings as partition keys, and it returns a string value such as
+ <codeph>__HIVE_DEFAULT_PARTITION__</codeph> instead of <codeph>NULL</codeph> when such
+ values are returned from a query. For example:
<codeblock>create table t1 (i int) partitioned by (x int, y string);
-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
@@ -359,62 +401,72 @@ insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</codeb
</li>
<li>
- There is no <codeph>NOT NULL</codeph> clause when defining a column to prevent <codeph>NULL</codeph>
- values in that column.
+ There is no <codeph>NOT NULL</codeph> clause when defining a column to prevent
+ <codeph>NULL</codeph> values in that column.
</li>
<li>
- There is no <codeph>DEFAULT</codeph> clause to specify a non-<codeph>NULL</codeph> default value.
+ There is no <codeph>DEFAULT</codeph> clause to specify a non-<codeph>NULL</codeph>
+ default value.
</li>
<li>
- If an <codeph>INSERT</codeph> operation mentions some columns but not others, the unmentioned columns
- contain <codeph>NULL</codeph> for all inserted rows.
+ If an <codeph>INSERT</codeph> operation mentions some columns but not others, the
+ unmentioned columns contain <codeph>NULL</codeph> for all inserted rows.
</li>
<li rev="1.2.1">
<p conref="../shared/impala_common.xml#common/null_sorting_change"/>
+
<note>
- <!-- To do: Probably a bunch of similar view-related restrictions like this that should be collected, reused, or cross-referenced under the Views topic. -->
- Because the <codeph>NULLS FIRST</codeph> and <codeph>NULLS LAST</codeph> keywords are not currently
- available in Hive queries, any views you create using those keywords will not be available through
- Hive.
+<!-- To do: Probably a bunch of similar view-related restrictions like this that should be collected, reused, or cross-referenced under the Views topic. -->
+ Because the <codeph>NULLS FIRST</codeph> and <codeph>NULLS LAST</codeph> keywords
+ are not currently available in Hive queries, any views you create using those
+ keywords will not be available through Hive.
</note>
</li>
<li>
- In all other contexts besides sorting with <codeph>ORDER BY</codeph>, comparing a <codeph>NULL</codeph>
- to anything else returns <codeph>NULL</codeph>, making the comparison meaningless. For example,
- <codeph>10 > NULL</codeph> produces <codeph>NULL</codeph>, <codeph>10 < NULL</codeph> also produces
- <codeph>NULL</codeph>, <codeph>5 BETWEEN 1 AND NULL</codeph> produces <codeph>NULL</codeph>, and so on.
+ In all other contexts besides sorting with <codeph>ORDER BY</codeph>, comparing a
+ <codeph>NULL</codeph> to anything else returns <codeph>NULL</codeph>, making the
+ comparison meaningless. For example, <codeph>10 > NULL</codeph> produces
+ <codeph>NULL</codeph>, <codeph>10 < NULL</codeph> also produces
+ <codeph>NULL</codeph>, <codeph>5 BETWEEN 1 AND NULL</codeph> produces
+ <codeph>NULL</codeph>, and so on.
</li>
</ul>
<p>
Several built-in functions serve as shorthand for evaluating expressions and returning
- <codeph>NULL</codeph>, 0, or some other substitution value depending on the expression result:
- <codeph>ifnull()</codeph>, <codeph>isnull()</codeph>, <codeph>nvl()</codeph>, <codeph>nullif()</codeph>,
- <codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See
+ <codeph>NULL</codeph>, 0, or some other substitution value depending on the expression
+ result: <codeph>ifnull()</codeph>, <codeph>isnull()</codeph>, <codeph>nvl()</codeph>,
+ <codeph>nullif()</codeph>, <codeph>nullifzero()</codeph>, and
+ <codeph>zeroifnull()</codeph>. See
<xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
</p>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
<p rev="kudu">
Columns in Kudu tables have an attribute that specifies whether or not they can contain
- <codeph>NULL</codeph> values. A column with a <codeph>NULL</codeph> attribute can contain
- nulls. A column with a <codeph>NOT NULL</codeph> attribute cannot contain any nulls, and
- an <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statement
- will skip any row that attempts to store a null in a column designated as <codeph>NOT NULL</codeph>.
- Kudu tables default to the <codeph>NULL</codeph> setting for each column, except columns that
- are part of the primary key.
+ <codeph>NULL</codeph> values. A column with a <codeph>NULL</codeph> attribute can
+ contain nulls. A column with a <codeph>NOT NULL</codeph> attribute cannot contain any
+ nulls, and an <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or
+ <codeph>UPSERT</codeph> statement will skip any row that attempts to store a null in a
+ column designated as <codeph>NOT NULL</codeph>. Kudu tables default to the
+ <codeph>NULL</codeph> setting for each column, except columns that are part of the
+ primary key.
</p>
+
<p rev="kudu">
- In addition to columns with the <codeph>NOT NULL</codeph> attribute, Kudu tables also have
- restrictions on <codeph>NULL</codeph> values in columns that are part of the primary key for
- a table. No column that is part of the primary key in a Kudu table can contain any
- <codeph>NULL</codeph> values.
+ In addition to columns with the <codeph>NOT NULL</codeph> attribute, Kudu tables also
+ have restrictions on <codeph>NULL</codeph> values in columns that are part of the
+ primary key for a table. No column that is part of the primary key in a Kudu table can
+ contain any <codeph>NULL</codeph> values.
</p>
</conbody>
+
</concept>
+
</concept>