You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by jb...@apache.org on 2017/04/12 18:25:09 UTC
[05/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS]
Publish rendered Impala documentation to ASF site
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_smallint.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_smallint.html b/docs/build/html/topics/impala_smallint.html
new file mode 100644
index 0000000..cd48e90
--- /dev/null
+++ b/docs/build/html/topics/impala_smallint.html
@@ -0,0 +1,125 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="smallint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SMALLINT Data Type</title></head><body id="smallint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SMALLINT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 2-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> SMALLINT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -32768 .. 32767. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">INT</code> or
+ <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>)
+ automatically. Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">STRING</code>,
+ or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">SMALLINT</code> type, call the
+ functions <code class="ph codeph">MIN_SMALLINT()</code> and <code class="ph codeph">MAX_SMALLINT()</code>.
+ </p>
+
+ <p class="p">
+ If an integer value is too large to be represented as a <code class="ph codeph">SMALLINT</code>, use an
+ <code class="ph codeph">INT</code> instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x SMALLINT);
+SELECT CAST(1000 AS SMALLINT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit
+ integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+ new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or
+ <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 2-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ssl.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ssl.html b/docs/build/html/topics/impala_ssl.html
new file mode 100644
index 0000000..a9b4d25
--- /dev/null
+++ b/docs/build/html/topics/impala_ssl.html
@@ -0,0 +1,119 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ssl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring TLS/SSL for Impala</title></head><body id="ssl"><main role="main"><article role="article" aria-labelledby="ssl__tls">
+
+ <h1 class="title topictitle1" id="ssl__tls">Configuring TLS/SSL for Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports TLS/SSL network encryption, between Impala and client
+ programs, and between the Impala-related daemons running on different nodes
+ in the cluster. This feature is important when you also use other features such as Kerberos
+ authentication or Sentry authorization, where credentials are being
+ transmitted back and forth.
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ssl__concept_q1p_j2d_rp">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Using the Command Line</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ To enable SSL for when client applications connect to Impala, add both of the following flags to the <span class="keyword cmdname">impalad</span> startup options:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_i2p_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl_server_certificate</code>: the full path to the server certificate, on the local filesystem.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ssl_private_key</code>: the full path to the server private key, on the local filesystem.
+ </li>
+ </ul>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, Impala can also use SSL for its own internal communication between the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <code class="ph codeph">catalogd</code> daemons.
+ To enable this additional SSL encryption, set the <code class="ph codeph">--ssl_server_certificate</code>
+ and <code class="ph codeph">--ssl_private_key</code> flags in the startup options for
+ <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>, and <span class="keyword cmdname">statestored</span>,
+ and also add the <code class="ph codeph">--ssl_client_ca_certificate</code> flag for all three of those daemons.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ Prior to <span class="keyword">Impala 2.3.2</span>, you could enable Kerberos authentication between Impala internal components,
+ or SSL encryption between Impala internal components, but not both at the same time.
+ This restriction has now been lifted.
+ See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>
+ to see the maintenance releases for different levels of Impala where the fix has been published.
+ </div>
+
+ <p class="p">
+ If either of these flags are set, both must be set. In that case, Impala starts listening for Beeswax and HiveServer2 requests on
+ SSL-secured ports only. (The port numbers stay the same; see <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for details.)
+ </p>
+
+ <p class="p">
+ Since Impala uses passphrase-less certificates in PEM format, you can reuse a host's existing Java keystore
+ by using the <code class="ph codeph">openssl</code> toolkit to convert it to the PEM format.
+ </p>
+
+ <section class="section" id="concept_q1p_j2d_rp__secref"><h3 class="title sectiontitle">Configuring TLS/SSL Communication for the Impala Shell</h3>
+
+
+
+ <p class="p">
+ With SSL enabled for Impala, use the following options when starting the
+ <span class="keyword cmdname">impala-shell</span> interpreter:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_kgp_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl</code>: enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ca_cert</code>: the local pathname pointing to the third-party CA certificate, or to a copy of the server
+ certificate for self-signed server certificates.
+ </li>
+ </ul>
+
+ <p class="p">
+ If <code class="ph codeph">--ca_cert</code> is not set, <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server
+ certificate. This is useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+ certificate is not available (such as when debugging customer installations).
+ </p>
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ssl__ssl_jdbc_odbc">
+ <h2 class="title topictitle2" id="ariaid-title3">Using TLS/SSL with Business Intelligence Tools</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+ connections from JDBC and ODBC applications to Impala.
+ See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ for details.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_stddev.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_stddev.html b/docs/build/html/topics/impala_stddev.html
new file mode 100644
index 0000000..4a14e14
--- /dev/null
+++ b/docs/build/html/topics/impala_stddev.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="stddev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title></head><body id="stddev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ An aggregate function that
+ <a class="xref" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank">standard
+ deviation</a> of a set of numbers.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ This function works with any numeric data type.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+ <p class="p">
+ This function is typically used in mathematical formulas related to probability distributions.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV_POP()</code> and <code class="ph codeph">STDDEV_SAMP()</code> functions compute the population
+ standard deviation and sample standard deviation, respectively, of the input values.
+ (<code class="ph codeph">STDDEV()</code> is an alias for <code class="ph codeph">STDDEV_SAMP()</code>.) Both functions evaluate all input
+ rows matched by the query. The difference is that <code class="ph codeph">STDDEV_SAMP()</code> is scaled by
+ <code class="ph codeph">1/(N-1)</code> while <code class="ph codeph">STDDEV_POP()</code> is scaled by <code class="ph codeph">1/N</code>.
+ </p>
+
+ <p class="p">
+ If no input rows match the query, the result of any of these functions is <code class="ph codeph">NULL</code>. If a single
+ input row matches the query, the result of any of these functions is <code class="ph codeph">"0.0"</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example demonstrates how <code class="ph codeph">STDDEV()</code> and <code class="ph codeph">STDDEV_SAMP()</code> return the same
+ result, while <code class="ph codeph">STDDEV_POP()</code> uses a slightly different calculation to reflect that the input
+ data is considered part of a larger <span class="q">"population"</span>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select stddev(score) from test_scores;
++---------------+
+| stddev(score) |
++---------------+
+| 28.5 |
++---------------+
+[localhost:21000] > select stddev_samp(score) from test_scores;
++--------------------+
+| stddev_samp(score) |
++--------------------+
+| 28.5 |
++--------------------+
+[localhost:21000] > select stddev_pop(score) from test_scores;
++-------------------+
+| stddev_pop(score) |
++-------------------+
+| 28.4858 |
++-------------------+
+</code></pre>
+
+ <p class="p">
+ This example demonstrates that, because the return value of these aggregate functions is a
+ <code class="ph codeph">STRING</code>, you must currently convert the result with <code class="ph codeph">CAST</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc score_stats;
++--------------------+--------------+---------+
+| name | type | comment |
++--------------------+--------------+---------+
+| standard_deviation | decimal(7,4) | |
+| variance | decimal(7,4) | |
++--------------------+--------------+---------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV()</code>, <code class="ph codeph">STDDEV_POP()</code>, and <code class="ph codeph">STDDEV_SAMP()</code> functions
+ compute the standard deviation (square root of the variance) based on the results of
+ <code class="ph codeph">VARIANCE()</code>, <code class="ph codeph">VARIANCE_POP()</code>, and <code class="ph codeph">VARIANCE_SAMP()</code>
+ respectively. See <a class="xref" href="impala_variance.html#variance">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a> for details about the variance property.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_string.html b/docs/build/html/topics/impala_string.html
new file mode 100644
index 0000000..60714c7
--- /dev/null
+++ b/docs/build/html/topics/impala_string.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRING Data Type</title></head><body id="string"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STRING Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRING</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Length:</strong> Maximum of 32,767 bytes. Do not use any length constraint when declaring
+ <code class="ph codeph">STRING</code> columns, as you might be familiar with from <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">CHAR</code>, or similar column types from relational database systems. <span class="ph">If you do
+ need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare
+ columns as <code class="ph codeph">VARCHAR(<var class="keyword varname">max_length</var>)</code> or
+ <code class="ph codeph">CHAR(<var class="keyword varname">length</var>)</code>, but for best performance use <code class="ph codeph">STRING</code>
+ where practical.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Character sets:</strong> For full support in all Impala subsystems, restrict string values to the ASCII
+ character set. Although some UTF-8 character data can be stored in Impala and retrieved through queries, UTF-8 strings
+ containing non-ASCII characters are not guaranteed to work properly in combination with many SQL aspects,
+ including but not limited to:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ String manipulation functions.
+ </li>
+ <li class="li">
+ Comparison operators.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ORDER BY</code> clause.
+ </li>
+ <li class="li">
+ Values in partition key columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ For any national language aspects such as
+ collation order or interpreting extended ASCII variants such as ISO-8859-1 or ISO-8859-2 encodings, Impala
+ does not include such metadata with the table definition. If you need to sort, manipulate, or display data
+ depending on those national language characteristics of string data, use logic on the application side.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala does not automatically convert <code class="ph codeph">STRING</code> to any numeric type. Impala does
+ automatically convert <code class="ph codeph">STRING</code> to <code class="ph codeph">TIMESTAMP</code> if the value matches one of
+ the accepted <code class="ph codeph">TIMESTAMP</code> formats; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">STRING</code> values to
+ <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot directly cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>. You can use a
+ <code class="ph codeph">CASE</code> expression to evaluate string values such as <code class="ph codeph">'T'</code>,
+ <code class="ph codeph">'true'</code>, and so on and return Boolean <code class="ph codeph">true</code> and <code class="ph codeph">false</code>
+ values as appropriate.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can cast a <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code>
+ for <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Although it might be convenient to use <code class="ph codeph">STRING</code> columns for partition keys, even when those
+ columns contain numbers, for performance and scalability it is much better to use numeric columns as
+ partition keys whenever practical. Although the underlying HDFS directory name might be the same in either
+ case, the in-memory storage for the partition key columns is more compact, and computations are faster, if
+ partition key columns such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, <code class="ph codeph">DAY</code> and so on
+ are declared as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and so on.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+ <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because the values of this type have variable size, none of the
+ column statistics fields are filled in until you run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate double-quoted and single-quoted string literals, and required escaping for
+ quotation marks within string literals:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT 'I am a single-quoted string';
+SELECT "I am a double-quoted string";
+SELECT 'I\'m a single-quoted string with an apostrophe';
+SELECT "I\'m a double-quoted string with an apostrophe";
+SELECT 'I am a "short" single-quoted string containing quotes';
+SELECT "I am a \"short\" double-quoted string containing quotes";
+</code></pre>
+
+ <p class="p">
+ The following examples demonstrate calls to string manipulation functions to concatenate strings, convert
+ numbers to strings, or pull out substrings:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CONCAT("Once upon a time, there were ", CAST(3 AS STRING), ' little pigs.');
+SELECT SUBSTR("hello world",7,5);
+</code></pre>
+
+ <p class="p">
+ The following examples show how to perform operations on <code class="ph codeph">STRING</code> columns within a table:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (s1 STRING, s2 STRING);
+INSERT INTO t1 VALUES ("hello", 'world'), (CAST(7 AS STRING), "wonders");
+SELECT s1, s2, length(s1) FROM t1 WHERE s2 LIKE 'w%';
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#string_literals">String Literals</a>, <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>,
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_string_functions.html b/docs/build/html/topics/impala_string_functions.html
new file mode 100644
index 0000000..aab1f35
--- /dev/null
+++ b/docs/build/html/topics/impala_string_functions.html
@@ -0,0 +1,1036 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala String Functions</title></head><body id="string_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala String Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="p">
+ String functions are classified as those primarily accepting or returning <code class="ph codeph">STRING</code>,
+ <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> data types, for example to measure the length of a string
+ or concatenate two strings together.
+ <ul class="ul">
+ <li class="li">
+ All the functions that accept <code class="ph codeph">STRING</code> arguments also accept the <code class="ph codeph">VARCHAR</code>
+ and <code class="ph codeph">CHAR</code> types introduced in Impala 2.0.
+ </li>
+
+ <li class="li">
+ Whenever <code class="ph codeph">VARCHAR</code> or <code class="ph codeph">CHAR</code> values are passed to a function that returns a
+ string value, the return type is normalized to <code class="ph codeph">STRING</code>. For example, a call to
+ <code class="ph codeph">concat()</code> with a mix of <code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>, and
+ <code class="ph codeph">CHAR</code> arguments produces a <code class="ph codeph">STRING</code> result.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The string functions operate mainly on these data types: <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, and <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following string functions:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="string_functions__ascii">
+ <code class="ph codeph">ascii(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the numeric ASCII code of the first character of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__btrim">
+ <code class="ph codeph">btrim(string a)</code>,
+ <code class="ph codeph">btrim(string a, string chars_to_trim)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Removes all instances of one or more characters
+ from the start and end of a <code class="ph codeph">STRING</code> value.
+ By default, removes only spaces.
+ If a non-<code class="ph codeph">NULL</code> optional second argument is specified, the function removes all
+ occurrences of characters in that second argument from the beginning and
+ end of the string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the default <code class="ph codeph">btrim()</code> behavior,
+ and what changes when you specify the optional second argument.
+ All the examples bracket the output value with <code class="ph codeph">[ ]</code>
+ so that you can see any leading or trailing spaces in the <code class="ph codeph">btrim()</code> result.
+ By default, the function removes and number of both leading and trailing spaces.
+ When the second argument is specified, any number of occurrences of any
+ character in the second argument are removed from the start and end of the
+ input string; in this case, spaces are not removed (unless they are part of the second
+ argument) and any instances of the characters are not removed if they do not come
+ right at the beginning or end of the string.
+ </p>
+<pre class="pre codeblock"><code>-- Remove multiple spaces before and one space after.
+select concat('[',btrim(' hello '),']');
++---------------------------------------+
+| concat('[', btrim(' hello '), ']') |
++---------------------------------------+
+| [hello] |
++---------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
+select concat('[',btrim('xy hello zyzzxx','xyz'),']');
++------------------------------------------------------+
+| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') |
++------------------------------------------------------+
+| [ hello ] |
++------------------------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end.
+-- Leave x, y, z alone in the middle of the string.
+select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
++----------------------------------------------------+
+| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
++----------------------------------------------------+
+| [helxyzlo] |
++----------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__char_length">
+ <code class="ph codeph">char_length(string a), <span class="ph" id="string_functions__character_length">character_length(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string. Aliases for the
+ <code class="ph codeph">length()</code> function.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__chr">
+ <code class="ph codeph">chr(int character_code)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a character specified by a decimal code point value.
+ The interpretation and display of the resulting character depends on your system locale.
+ Because consistent processing of Impala string values is only guaranteed
+ for values within the ASCII range, only use this function for values
+ corresponding to ASCII characters.
+ In particular, parameter values greater than 255 return an empty string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Can be used as the inverse of the <code class="ph codeph">ascii()</code> function, which
+ converts a character to its numeric ASCII code.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>SELECT chr(65);
++---------+
+| chr(65) |
++---------+
+| A |
++---------+
+
+SELECT chr(97);
++---------+
+| chr(97) |
++---------+
+| a |
++---------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat">
+ <code class="ph codeph">concat(string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing all the argument values joined together.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat_ws">
+ <code class="ph codeph">concat_ws(string sep, string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing the second and following argument values joined
+ together, delimited by a specified separator.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__find_in_set">
+ <code class="ph codeph">find_in_set(string str, string strList)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a specified string
+ within a comma-separated string. Returns <code class="ph codeph">NULL</code> if either argument is
+ <code class="ph codeph">NULL</code>, 0 if the search string is not found, or 0 if the search string contains a comma.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__group_concat">
+ <code class="ph codeph">group_concat(string s [, string sep])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing the argument value concatenated together for each
+ row of the result set. If the optional separator string is specified, the separator is added between each
+ pair of concatenated values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+ <p class="p">
+ By default, returns a single string covering the whole result set. To include other columns or values
+ in the result set, or to produce multiple concatenated strings for subsets of rows, include a
+ <code class="ph codeph">GROUP BY</code> clause in the query.
+ </p>
+ <p class="p">
+ Strictly speaking, <code class="ph codeph">group_concat()</code> is an aggregate function, not a scalar
+ function like the others in this list.
+ For additional details and examples, see <a class="xref" href="impala_group_concat.html#group_concat">GROUP_CONCAT Function</a>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__initcap">
+ <code class="ph codeph">initcap(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with the first letter capitalized.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__instr">
+ <code class="ph codeph">instr(string str, string substr)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__length">
+ <code class="ph codeph">length(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__locate">
+ <code class="ph codeph">locate(string substr, string str[, int pos])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string, optionally after a particular position.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lower">
+ <code class="ph codeph">lower(string a), <span class="ph" id="string_functions__lcase">lcase(string a)</span> </code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-lowercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lpad">
+ <code class="ph codeph">lpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the left with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__ltrim">
+ <code class="ph codeph">ltrim(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with any leading spaces removed from the left side.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__parse_url">
+ <code class="ph codeph">parse_url(string urlString, string partToExtract [, string keyToExtract])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of a URL corresponding to a specified part. The part argument can be
+ <code class="ph codeph">'PROTOCOL'</code>, <code class="ph codeph">'HOST'</code>, <code class="ph codeph">'PATH'</code>, <code class="ph codeph">'REF'</code>,
+ <code class="ph codeph">'AUTHORITY'</code>, <code class="ph codeph">'FILE'</code>, <code class="ph codeph">'USERINFO'</code>, or
+ <code class="ph codeph">'QUERY'</code>. Uppercase is required for these literal values. When requesting the
+ <code class="ph codeph">QUERY</code> portion of the URL, you can optionally specify a key to retrieve just the
+ associated value from the key-value pairs in the query string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> This function is important for the traditional Hadoop use case of interpreting web
+ logs. For example, if the web traffic data features raw URLs not divided into separate table columns,
+ you can count visitors to a particular page by extracting the <code class="ph codeph">'PATH'</code> or
+ <code class="ph codeph">'FILE'</code> field, or analyze search terms by extracting the corresponding key from the
+ <code class="ph codeph">'QUERY'</code> field.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_extract">
+ <code class="ph codeph">regexp_extract(string subject, string pattern, int index)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified () group from a string based on a regular expression pattern. Group
+ 0 refers to the entire extracted string, while group 1, 2, and so on refers to the first, second, and so
+ on <code class="ph codeph">(...)</code> portion.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how group 0 matches the full pattern string, including the portion outside any
+ <code class="ph codeph">()</code> group:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
++------------------------------------------------------+
+| abcdef123ghi456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ This example shows how group 1 matches just the contents inside the first <code class="ph codeph">()</code> group in
+ the pattern string:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
++------------------------------------------------------+
+| 456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Unlike in earlier Impala releases, the regular expression library used in Impala 2.0 and later supports
+ the <code class="ph codeph">.*?</code> idiom for non-greedy matches. This example shows how a pattern string starting
+ with <code class="ph codeph">.*?</code> matches the shortest possible portion of the source string, returning the
+ rightmost set of lowercase letters. A pattern string both starting and ending with <code class="ph codeph">.*?</code>
+ finds two potential matches of equal length, and returns the first one found (the leftmost set of
+ lowercase letters).
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
++--------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
++--------------------------------------------------------+
+| def |
++--------------------------------------------------------+
+[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
++-----------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
++-----------------------------------------------------------+
+| bcd |
++-----------------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_like">
+ <code class="ph codeph">regexp_like(string source, string pattern[, string options])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">true</code> or <code class="ph codeph">false</code> to indicate
+ whether the source string contains anywhere inside it the regular expression given by the pattern.
+ The optional third argument consists of letter flags that change how the match is performed,
+ such as <code class="ph codeph">i</code> for case-insensitive matching.
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+ <p class="p">
+ The flags that you can include in the optional third argument are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">c</code>: Case-sensitive matching (the default).
+ </li>
+ <li class="li">
+ <code class="ph codeph">i</code>: Case-insensitive matching. If multiple instances of <code class="ph codeph">c</code> and <code class="ph codeph">i</code>
+ are included in the third argument, the last such option takes precedence.
+ </li>
+ <li class="li">
+ <code class="ph codeph">m</code>: Multi-line matching. The <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+ operators match the start or end of any line within the source string, not the
+ start and end of the entire string.
+ </li>
+ <li class="li">
+ <code class="ph codeph">n</code>: Newline matching. The <code class="ph codeph">.</code> operator can match the
+ newline character. A repetition operator such as <code class="ph codeph">.*</code> can
+ match a portion of the source string that spans multiple lines.
+ </li>
+ </ul>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how <code class="ph codeph">regexp_like()</code> can test for the existence
+ of various kinds of regular expression patterns within a source string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Matches because the 'f' appears somewhere in 'foo'.
+select regexp_like('foo','f');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| true |
++-------------------------+
+
+-- Does not match because the comparison is case-sensitive by default.
+select regexp_like('foo','F');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| false |
++-------------------------+
+
+-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive.
+select regexp_like('foo','F','i');
++------------------------------+
+| regexp_like('foo', 'f', 'i') |
++------------------------------+
+| true |
++------------------------------+
+
+-- The familiar regular expression notations work, such as ^ and $ anchors...
+select regexp_like('foo','f$');
++--------------------------+
+| regexp_like('foo', 'f$') |
++--------------------------+
+| false |
++--------------------------+
+
+select regexp_like('foo','o$');
++--------------------------+
+| regexp_like('foo', 'o$') |
++--------------------------+
+| true |
++--------------------------+
+
+-- ...and repetition operators such as * and +
+select regexp_like('foooooobar','fo+b');
++-----------------------------------+
+| regexp_like('foooooobar', 'fo+b') |
++-----------------------------------+
+| true |
++-----------------------------------+
+
+select regexp_like('foooooobar','fx*y*o*b');
++---------------------------------------+
+| regexp_like('foooooobar', 'fx*y*o*b') |
++---------------------------------------+
+| true |
++---------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_replace">
+ <code class="ph codeph">regexp_replace(string initial, string pattern, string replacement)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the initial argument with the regular expression pattern replaced by the final
+ argument string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how you can replace parts of a string matching a pattern with replacement text,
+ which can include backreferences to any <code class="ph codeph">()</code> groups in the pattern string. The
+ backreference numbers start at 1, and any <code class="ph codeph">\</code> characters must be escaped as
+ <code class="ph codeph">\\</code>.
+ </p>
+ <p class="p">
+ Replace a character pattern with new text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
++------------------------------------------+
+| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
++------------------------------------------+
+| aaaxyzaaa |
++------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Replace a character pattern with substitution text that includes the original matching text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','<\\1>');
++----------------------------------------------+
+| regexp_replace('aaabbbaaa', '(b+)', '<\\1>') |
++----------------------------------------------+
+| aaa<bbb>aaa |
++----------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Remove all characters that are not digits:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
++---------------------------------------------------+
+| regexp_replace('123-456-789', '[^[:digit:]]', '') |
++---------------------------------------------------+
+| 123456789 |
++---------------------------------------------------+
+Returned 1 row(s) in 0.12s</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__repeat">
+ <code class="ph codeph">repeat(string str, int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string repeated a specified number of times.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__reverse">
+ <code class="ph codeph">reverse(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with characters in reversed order.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rpad">
+ <code class="ph codeph">rpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the right with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rtrim">
+ <code class="ph codeph">rtrim(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with any trailing spaces removed from the right side.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__space">
+ <code class="ph codeph">space(int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a concatenated string of the specified number of spaces. Shorthand for
+ <code class="ph codeph">repeat(' ',<var class="keyword varname">n</var>)</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__split_part">
+ <code class="ph codeph">split_part(string source, string delimiter, bigint n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the nth field within a delimited string.
+ The fields are numbered starting from 1.
+ The delimiter can consist of multiple characters, not just a
+ single character. All matching of the delimiter is done exactly, not using any
+ regular expression patterns.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how to retrieve the nth field from a delimited string:
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',1);
++-----------------------------+
+| split_part('x,y,z', ',', 1) |
++-----------------------------+
+| x |
++-----------------------------+
+
+select split_part('x,y,z',',',2);
++-----------------------------+
+| split_part('x,y,z', ',', 2) |
++-----------------------------+
+| y |
++-----------------------------+
+
+select split_part('x,y,z',',',3);
++-----------------------------+
+| split_part('x,y,z', ',', 3) |
++-----------------------------+
+| z |
++-----------------------------+
+</codeblock>
+
+ <p>
+ These examples show what happens for out-of-range field positions.
+ Specifying a value less than 1 produces an error. Specifying a value
+ greater than the number of fields returns a zero-length string
+ (which is not the same as <codeph>NULL</codeph>).
+ </p>
+<codeblock><![CDATA[
+select split_part('x,y,z',',',0);
+ERROR: Invalid field position: 0
+
+with t1 as (select split_part('x,y,z',',',4) nonexistent_field)
+ select
+ nonexistent_field
+ , concat('[',nonexistent_field,']')
+ , length(nonexistent_field);
+from t1
++-------------------+-------------------------------------+---------------------------+
+| nonexistent_field | concat('[', nonexistent_field, ']') | length(nonexistent_field) |
++-------------------+-------------------------------------+---------------------------+
+| | [] | 0 |
++-------------------+-------------------------------------+---------------------------+
+</codeblock>
+
+ <p>
+ These examples show how the delimiter can be a multi-character value:
+ </p>
+<codeblock><![CDATA[
+select split_part('one***two***three','***',2);
++-------------------------------------------+
+| split_part('one***two***three', '***', 2) |
++-------------------------------------------+
+| two |
++-------------------------------------------+
+
+select split_part('one\|/two\|/three','\|/',3);
++-------------------------------------------+
+| split_part('one\|/two\|/three', '\|/', 3) |
++-------------------------------------------+
+| three |
++-------------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strleft">
+ <code class="ph codeph">strleft(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the leftmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strright">
+ <code class="ph codeph">strright(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the rightmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__substr">
+ <code class="ph codeph">substr(string a, int start [, int len]), <span class="ph" id="string_functions__substring">substring(string a, int start [, int
+ len])</span></code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of the string starting at a specified point, optionally with a
+ specified maximum length. The characters in the string are indexed starting at 1.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__translate">
+ <code class="ph codeph">translate(string input, string from, string to)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with a set of characters replaced by another set of characters.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__trim">
+ <code class="ph codeph">trim(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with both leading and trailing spaces removed. The same as
+ passing the string through both <code class="ph codeph">ltrim()</code> and <code class="ph codeph">rtrim()</code>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces.
+ For a more general-purpose function that can remove other leading and trailing characters besides spaces, see <code class="ph codeph">btrim()</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__upper">
+ <code class="ph codeph">upper(string a), <span class="ph" id="string_functions__ucase">ucase(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-uppercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file