Posted to commits@impala.apache.org by jr...@apache.org on 2016/07/26 23:04:55 UTC
[03/22] incubator-impala git commit: First try at porting over the
source files necessary for the Impala SQL Reference.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_timestamp.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_timestamp.xml b/docs/topics/impala_timestamp.xml
new file mode 100644
index 0000000..c469b54
--- /dev/null
+++ b/docs/topics/impala_timestamp.xml
@@ -0,0 +1,441 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="timestamp">
+
+ <title>TIMESTAMP Data Type</title>
+ <titlealts><navtitle>TIMESTAMP</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Impala Data Types"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Data Analysts"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="Dates and Times"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements, representing a
+ point in time.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+ <p>
+ In the column definition of a <codeph>CREATE TABLE</codeph> statement:
+ </p>
+
+<codeblock><varname>column_name</varname> TIMESTAMP</codeblock>
+
+ <p>
+ <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive
+ <codeph>TIMESTAMP</codeph> type. Internally, the resolution of the time portion of a
+ <codeph>TIMESTAMP</codeph> value is in nanoseconds.
+ </p>
+
+ <p>
+ <b>INTERVAL expressions:</b>
+ </p>
+
+ <p>
+ You can perform date arithmetic by adding or subtracting a specified number of time units, using the
+ <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph> and <codeph>-</codeph> operators or
+ <codeph>date_add()</codeph> and <codeph>date_sub()</codeph> functions. You can specify units as
+ <codeph>YEAR[S]</codeph>, <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>, <codeph>DAY[S]</codeph>,
+ <codeph>HOUR[S]</codeph>, <codeph>MINUTE[S]</codeph>, <codeph>SECOND[S]</codeph>,
+ <codeph>MILLISECOND[S]</codeph>, <codeph>MICROSECOND[S]</codeph>, and <codeph>NANOSECOND[S]</codeph>. You can
+ only specify one time unit in each interval expression, for example <codeph>INTERVAL 3 DAYS</codeph> or
+ <codeph>INTERVAL 25 HOURS</codeph>, but you can produce any granularity by adding together successive
+ <codeph>INTERVAL</codeph> values, such as <codeph><varname>timestamp_value</varname> + INTERVAL 3 WEEKS -
+ INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>.
+ </p>
+
+ <p>
+ For example:
+ </p>
+
+<codeblock>select now() + interval 1 day;
+select date_sub(now(), interval 5 minutes);
+insert into auction_details
+ select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
+ from new_auctions;</codeblock>
+
+ <p>
+ <b>Time zones:</b>
+ </p>
+
+ <p>
+ By default, Impala does not store timestamps using the local time zone, to avoid undesired results from
+ unexpected time zone issues. Timestamps are stored and interpreted relative to UTC, both when written to or
+ read from data files and when converted to or from Unix time values through functions such as
+ <codeph>from_unixtime()</codeph> or <codeph>unix_timestamp()</codeph>. To convert such a
+ <codeph>TIMESTAMP</codeph> value to one that represents the date and time in a specific time zone, convert
+ the original value with the <codeph>from_utc_timestamp()</codeph> function.
+ </p>
+
+ <p>
+ Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any particular time zone, you
+ must be conscious of the time zone aspects of data that you query, insert, or convert.
+ </p>
+
+ <p>
+ For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by the <codeph>now()</codeph>
+ function represents the local time in the system time zone, rather than in UTC. To store values relative to
+ the current time in a portable way, convert any <codeph>now()</codeph> return values using the
+ <codeph>to_utc_timestamp()</codeph> function first. The following example shows that the current
+ time in California (where Cloudera HQ is located) is shortly after 2 PM. If that value were written to a data
+ file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the
+ dates and times would not match up precisely because of time zone differences. Therefore, the
+ <codeph>to_utc_timestamp()</codeph> function converts it using a common reference point, the UTC time zone
+ (descended from the old Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that the
+ original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all
+ geographic locations run the same transformation on any local date and time values (with the appropriate time
+ zone argument), the stored data uses a consistent representation. Impala queries can use functions such as
+ <codeph>EXTRACT()</codeph>, <codeph>MIN()</codeph>, <codeph>AVG()</codeph>, and so on to do time-series
+ analysis on those timestamps.
+ </p>
+
+<codeblock>[localhost:21000] > select now();
++-------------------------------+
+| now() |
++-------------------------------+
+| 2015-04-09 14:07:46.580465000 |
++-------------------------------+
+[localhost:21000] > select to_utc_timestamp(now(), 'PDT');
++--------------------------------+
+| to_utc_timestamp(now(), 'pdt') |
++--------------------------------+
+| 2015-04-09 21:08:07.664547000 |
++--------------------------------+
+</codeblock>
+
+ <p>
+ The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored <codeph>TIMESTAMP</codeph>
+ data or calculated results and convert back to local date and time for processing on the application side.
+ The following example shows how you might represent some future date (such as the ending date and time of an
+ auction) in UTC, and then convert back to local time when convenient for reporting or other processing. The
+ final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it
+ back to the local time zone and comparing it against the current date and time.
+ </p>
+
+<codeblock>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
++---------------------------------------------------+
+| to_utc_timestamp(now() + interval 2 weeks, 'pdt') |
++---------------------------------------------------+
+| 2015-04-23 21:08:34.152923000 |
++---------------------------------------------------+
+[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT');
++------------------------------------------------------------+
+| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') |
++------------------------------------------------------------+
+| 2015-04-23 14:08:34.152923000 |
++------------------------------------------------------------+
+[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') < now();
++--------------------------------------------------------------------+
+| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') < now() |
++--------------------------------------------------------------------+
+| false |
++--------------------------------------------------------------------+
+</codeblock>
+
+ <p rev="2.2.0">
+ If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values represent the local timezone
+ of the host where the data was written, potentially leading to inconsistent results when processed by Impala.
+ To avoid compatibility problems or having to code workarounds, you can specify one or both of these
+ <cmdname>impalad</cmdname> startup flags: <codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> and
+ <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>. Although
+ <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by default to avoid performance overhead, Cloudera recommends
+ turning it on when processing <codeph>TIMESTAMP</codeph> columns in Parquet files written by Hive, to avoid unexpected behavior.
+ </p>
+
+ <p rev="2.2.0">
+ The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting affects conversions from
+ <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from <codeph>BIGINT</codeph>
+ to <codeph>TIMESTAMP</codeph>. By default, Impala treats all <codeph>TIMESTAMP</codeph> values as UTC,
+ to simplify analysis of time-series data from different geographic regions. When you enable the
+ <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting, these operations
+ treat the input values as if they are in the local time zone of the host doing the processing.
+ See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the list of functions
+ affected by the <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting.
+ </p>
+
+ <p>
+ The following sequence of examples shows how the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph>
+ setting affects the interpretation of <codeph>TIMESTAMP</codeph> values in
+ Parquet tables.
+ </p>
+
+ <p>
+ Regardless of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting,
+ <codeph>TIMESTAMP</codeph> columns in text tables can be written and read interchangeably by Impala and Hive:
+ </p>
+
+<codeblock>Impala DDL and queries for text table:
+
+[localhost:21000] > create table t1 (x timestamp);
+[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day);
+[localhost:21000] > select x from t1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1;
++-------------------------------+
+| to_utc_timestamp(x, 'pdt') |
++-------------------------------+
+| 2015-04-07 22:43:02.892403000 |
+| 2015-04-08 22:43:02.892403000 |
++-------------------------------+
+
+Hive query for text table:
+
+hive> select * from t1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 1.245 seconds, Fetched: 2 row(s)
+</codeblock>
+
+ <p>
+ When the table uses Parquet format, Impala expects any time zone adjustment to be applied prior to writing,
+ while <codeph>TIMESTAMP</codeph> values written by Hive are adjusted to be in the UTC time zone. When Hive
+ queries Parquet data files that it wrote, it adjusts the <codeph>TIMESTAMP</codeph> values back to the local
+ time zone, while Impala does no conversion. Hive does no time zone conversion when it queries Impala-written
+ Parquet files.
+ </p>
+
+<codeblock>Impala DDL and queries for Parquet table:
+
+[localhost:21000] > create table p1 stored as parquet as select x from t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+[localhost:21000] > select x from p1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+
+Hive DDL and queries for Parquet table:
+
+hive> create table h1 (x timestamp) stored as parquet;
+OK
+hive> insert into h1 select * from p1;
+...
+OK
+Time taken: 35.573 seconds
+hive> select x from p1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 0.324 seconds, Fetched: 2 row(s)
+hive> select x from h1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 0.197 seconds, Fetched: 2 row(s)
+</codeblock>
+
+ <p>
+ The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying values in the
+ <codeph>TIMESTAMP</codeph> column are different from the ones written by Impala, even though they were copied
+ from one table to another by an <codeph>INSERT ... SELECT</codeph> statement in Hive. Hive did an implicit
+ conversion from the local time zone to UTC as it wrote the values to Parquet.
+ </p>
+
+<codeblock>Impala query for TIMESTAMP values from Impala-written and Hive-written data:
+
+[localhost:21000] > select * from p1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.29s
+[localhost:21000] > select * from h1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 22:43:02.892403000 |
+| 2015-04-08 22:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.41s
+
+Underlying integer values for Impala-written and Hive-written data:
+
+[localhost:21000] > select cast(x as bigint) from p1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428421382 |
+| 1428507782 |
++-------------------+
+Fetched 2 row(s) in 0.38s
+[localhost:21000] > select cast(x as bigint) from h1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582 |
+| 1428532982 |
++-------------------+
+Fetched 2 row(s) in 0.20s
+</codeblock>
+
+ <p>
+ When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is enabled, Impala recognizes
+ the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during
+ the query as Hive uses, making the contents of the Impala-written <codeph>P1</codeph> table and the
+ Hive-written <codeph>H1</codeph> table appear identical, whether represented as <codeph>TIMESTAMP</codeph>
+ values or the underlying <codeph>BIGINT</codeph> integers:
+ </p>
+
+<codeblock>[localhost:21000] > select x from p1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.37s
+[localhost:21000] > select x from h1;
++-------------------------------+
+| x |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.19s
+[localhost:21000] > select cast(x as bigint) from p1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582 |
+| 1428532982 |
++-------------------+
+Fetched 2 row(s) in 0.29s
+[localhost:21000] > select cast(x as bigint) from h1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582 |
+| 1428532982 |
++-------------------+
+Fetched 2 row(s) in 0.22s
+</codeblock>
+
+ <p>
+ <b>Conversions:</b>
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/timestamp_conversions"/>
+
+ <p>
+ In Impala 1.3 and higher, the <codeph>FROM_UNIXTIME()</codeph> and <codeph>UNIX_TIMESTAMP()</codeph>
+ functions allow a wider range of format strings, with more flexibility in element order, repetition of letter
+ placeholders, and separator characters. In CDH 5.5 / Impala 2.3 and higher, the <codeph>UNIX_TIMESTAMP()</codeph>
+ function also allows a numeric timezone offset to be specified as part of the input string.
+ See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/y2k38"/>
+
+ <p>
+ <b>Partitioning:</b>
+ </p>
+
+ <p>
+ Although you cannot use a <codeph>TIMESTAMP</codeph> column as a partition key, you can extract the
+ individual years, months, days, hours, and so on and partition based on those columns. Because the partition
+ key column values are represented in HDFS directory names, rather than as fields in the data files
+ themselves, you can also keep the original <codeph>TIMESTAMP</codeph> values if desired, without duplicating
+ data or wasting storage space. See <xref href="impala_partitioning.xml#partition_key_columns"/> for more
+ details on partitioning with date and time values.
+ </p>
+
+<codeblock>[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp);
+ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>select cast('1966-07-30' as timestamp);
+select cast('1985-09-25 17:45:30.005' as timestamp);
+select cast('08:30:00' as timestamp);
+select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15.
+select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required.
+select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range.
+select dayofweek('2004-06-13'); -- Returns 1, representing Sunday.
+select dayname('2004-06-13'); -- Returns 'Sunday'.
+select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
+select day('2004-06-13'); -- Returns 13.
+select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
+select now(); -- Returns current date and time in local timezone.
+
+create table dates_and_times (t timestamp);
+insert into dates_and_times values
+ ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/null_bad_timestamp_cast"/>
+
+ <p conref="../shared/impala_common.xml#common/partitioning_worrisome"/>
+
+ <p conref="../shared/impala_common.xml#common/hbase_ok"/>
+
+ <p conref="../shared/impala_common.xml#common/parquet_ok"/>
+
+ <p conref="../shared/impala_common.xml#common/text_bulky"/>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> -->
+
+ <p conref="../shared/impala_common.xml#common/internals_16_bytes"/>
+
+ <p conref="../shared/impala_common.xml#common/added_forever"/>
+
+ <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
+
+ <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+ <p>
+ If you cast a <codeph>STRING</codeph> with an unrecognized format to a <codeph>TIMESTAMP</codeph>, the result
+ is <codeph>NULL</codeph> rather than an error. Test your data pipeline to confirm that any textual
+ date and time values are in a format that Impala can recognize as a <codeph>TIMESTAMP</codeph>.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <ul>
+ <li>
+<!-- The Timestamp Literals topic is pretty brief. Consider adding more examples there. -->
+ <xref href="impala_literals.xml#timestamp_literals"/>.
+ </li>
+
+ <li>
+ To convert to or from different date formats, or perform date arithmetic, use the date and time functions
+ described in <xref href="impala_datetime_functions.xml#datetime_functions"/>. In particular, the
+ <codeph>from_unixtime()</codeph> function requires a case-sensitive format string such as
+ <codeph>"yyyy-MM-dd HH:mm:ss.SSSS"</codeph>, matching one of the allowed variations of a
+ <codeph>TIMESTAMP</codeph> value (date plus time, only date, only time, optional fractional seconds).
+ </li>
+
+ <li>
+ See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for details about differences in
+ <codeph>TIMESTAMP</codeph> handling between Impala and Hive.
+ </li>
+ </ul>
+
+ </conbody>
+
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_tinyint.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tinyint.xml b/docs/topics/impala_tinyint.xml
new file mode 100644
index 0000000..2b1b3a8
--- /dev/null
+++ b/docs/topics/impala_tinyint.xml
@@ -0,0 +1,101 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="tinyint">
+
+ <title>TINYINT Data Type</title>
+ <titlealts><navtitle>TINYINT</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Impala Data Types"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Data Analysts"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="Schemas"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ A 1-byte integer data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+ <p>
+ In the column definition of a <codeph>CREATE TABLE</codeph> statement:
+ </p>
+
+<codeblock><varname>column_name</varname> TINYINT</codeblock>
+
+ <p>
+ <b>Range:</b> -128 .. 127. There is no <codeph>UNSIGNED</codeph> subtype.
+ </p>
+
+ <p>
+ <b>Conversions:</b> Impala automatically converts to a larger integer type (<codeph>SMALLINT</codeph>,
+ <codeph>INT</codeph>, or <codeph>BIGINT</codeph>) or a floating-point type (<codeph>FLOAT</codeph> or
+ <codeph>DOUBLE</codeph>). Use <codeph>CAST()</codeph> to convert to <codeph>STRING</codeph> or
+ <codeph>TIMESTAMP</codeph>.
+ <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <p>
+ For a convenient and automated way to check the bounds of the <codeph>TINYINT</codeph> type, call the
+ functions <codeph>MIN_TINYINT()</codeph> and <codeph>MAX_TINYINT()</codeph>.
+ </p>
+
+ <p>
+ If an integer value is too large to be represented as a <codeph>TINYINT</codeph>, use a
+ <codeph>SMALLINT</codeph> instead.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>CREATE TABLE t1 (x TINYINT);
+SELECT CAST(100 AS TINYINT);
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/parquet_blurb"/>
+
+<!-- Duplicated under TINYINT and SMALLINT. Turn into a conref in both places. -->
+
+ <p rev="1.4.0">
+ Physically, Parquet files represent <codeph>TINYINT</codeph> and <codeph>SMALLINT</codeph> values as 32-bit
+ integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+ new table with the <codeph>CREATE TABLE ... LIKE PARQUET</codeph> syntax, any <codeph>TINYINT</codeph> or
+ <codeph>SMALLINT</codeph> columns in the original table turn into <codeph>INT</codeph> columns in the new
+ table.
+ </p>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/partitioning_blurb"/> -->
+
+ <p conref="../shared/impala_common.xml#common/hbase_ok"/>
+
+ <p conref="../shared/impala_common.xml#common/text_bulky"/>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> -->
+
+ <p conref="../shared/impala_common.xml#common/internals_1_bytes"/>
+
+ <p conref="../shared/impala_common.xml#common/added_forever"/>
+
+ <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/restrictions_blurb"/> -->
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>,
+ <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>,
+ <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>,
+ <xref href="impala_math_functions.xml#math_functions"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_truncate_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_truncate_table.xml b/docs/topics/impala_truncate_table.xml
new file mode 100644
index 0000000..9f0d00b
--- /dev/null
+++ b/docs/topics/impala_truncate_table.xml
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.3.0 5.5.0" id="truncate_table">
+
+ <title>TRUNCATE TABLE Statement (CDH 5.5 or higher only)</title>
+ <titlealts><navtitle>TRUNCATE TABLE</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="SQL"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">TRUNCATE TABLE statement</indexterm>
+ Removes the data from an Impala table while leaving the table itself in place.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<!-- <codeblock>TRUNCATE TABLE [IF EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname></codeblock> -->
+<codeblock>TRUNCATE TABLE [<varname>db_name</varname>.]<varname>table_name</varname></codeblock>
+
+ <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <p>
+ Often used to empty tables that are used during ETL cycles, after the data has been copied to another
+ table for the next stage of processing. This statement is a low-overhead alternative to dropping and
+ recreating the table, or using <codeph>INSERT OVERWRITE</codeph> to replace the data during the
+ next ETL cycle.
+ </p>
+
+ <p>
+ This statement removes all the data and associated data files in the table. It can remove data files from internal tables,
+ external tables, partitioned tables, and tables mapped to HBase or the Amazon Simple Storage Service (S3).
+ The data removal applies to the entire table, including all partitions of a partitioned table.
+ </p>
+
+ <p>
+ Any statistics produced by the <codeph>COMPUTE STATS</codeph> statement are reset when the data is removed.
+ </p>
+
+ <p>
+ Make sure that you are in the correct database before truncating a table, either by issuing a
+ <codeph>USE</codeph> statement first or by using a fully qualified name
+ <codeph><varname>db_name</varname>.<varname>table_name</varname></codeph>.
+ </p>
+
+<!-- IF EXISTS apparently not implemented for this first go-round. Filing a JIRA about that:
+ <p>
+ The optional <codeph>IF EXISTS</codeph> clause makes the statement succeed whether or not the table exists.
+ If the table does exist, it is truncated; if it does not exist, the statement has no effect. This capability is
+ useful in standardized setup scripts that might be run both before and after some of the tables exist.
+ </p>
+-->
+
+ <p>
+ Any HDFS data files removed by this statement go into the HDFS trashcan, from which you can recover them
+ within a defined time interval if this operation turns out to be a mistake.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/disk_space_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+ <p rev="2.2.0">
+ Although Impala cannot write new data to a table stored in the Amazon
+ S3 filesystem, the <codeph>TRUNCATE TABLE</codeph> statement can remove data files from S3.
+ See <xref href="impala_s3.xml#s3"/> for details about working with S3 tables.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+ <p rev="CDH-19187">
+ The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+ typically the <codeph>impala</codeph> user, must have write
+ permission for all the files and directories that make up the table.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+ <p>
+ The following example shows a table that contains some data and has table and column statistics.
+ After the <codeph>TRUNCATE TABLE</codeph> statement, the data is removed and the statistics
+ are reset.
+ </p>
+
+<codeblock>CREATE TABLE truncate_demo (x INT);
+INSERT INTO truncate_demo VALUES (1), (2), (4), (8);
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+COMPUTE STATS truncate_demo;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| 4 | 1 | 8B | NOT CACHED | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x | INT | 4 | -1 | 4 | 4 |
++--------+------+------------------+--------+----------+----------+
+
+-- After this statement, the data and the table/column stats will be gone.
+TRUNCATE TABLE truncate_demo;
+
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 0 |
++----------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x | INT | -1 | -1 | 4 | 4 |
++--------+------+------------------+--------+----------+----------+
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_tables.xml#tables"/>,
+ <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_create_table.xml#create_table"/>,
+ <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
+ <xref href="impala_tables.xml#external_tables"/>
+ </p>
+
+ </conbody>
+</concept>