You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:57 UTC
[48/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_analytic_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_analytic_functions.html b/docs/build3x/html/topics/impala_analytic_functions.html
new file mode 100644
index 0000000..607633d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_analytic_functions.html
@@ -0,0 +1,1785 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Im
pala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="an
alytic_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Analytic Functions</title></head><body id="analytic_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Analytic Functions</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Analytic functions (also known as window functions) are a special category of built-in functions. Like
+ aggregate functions, they examine the contents of multiple input rows to compute each output value. However,
+ rather than being limited to one result value per <code class="ph codeph">GROUP BY</code> group, they operate on
+ <dfn class="term">windows</dfn> where the input rows are ordered and grouped using flexible conditions expressed through
+ an <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+
+
+ <p class="p">
+ Some functions, such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">RANK()</code>, can only be used in this analytic
+ context. Some aggregate functions do double duty: when you call the aggregation functions such as
+ <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">AVG()</code>, and so on with an
+ <code class="ph codeph">OVER()</code> clause, they produce an output value for each row, based on computations across other
+ rows in the window.
+ </p>
+
+ <p class="p">
+ Although analytic functions often compute the same value you would see from an aggregate function in a
+ <code class="ph codeph">GROUP BY</code> query, the analytic functions produce a value for each row in the result set rather
+ than a single value for each group. This flexibility lets you include additional columns in the
+ <code class="ph codeph">SELECT</code> list, offering more opportunities for organizing and filtering the result set.
+ </p>
+
+ <p class="p">
+ Analytic function calls are only allowed in the <code class="ph codeph">SELECT</code> list and in the outermost
+ <code class="ph codeph">ORDER BY</code> clause of the query. During query processing, analytic functions are evaluated
+ after other query stages such as joins, <code class="ph codeph">WHERE</code>, and <code class="ph codeph">GROUP BY</code>,
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ The rows that are part of each partition are analyzed by computations across an ordered or unordered set of
+ rows. For example, <code class="ph codeph">COUNT()</code> and <code class="ph codeph">SUM()</code> might be applied to all the rows in
+ the partition, in which case the order of analysis does not matter. The <code class="ph codeph">ORDER BY</code> clause
+ might be used inside the <code class="ph codeph">OVER()</code> clause to defines the ordering that applies to functions
+ such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">FIRST_VALUE()</code>.
+ </p>
+
+
+
+
+
+ <p class="p">
+ Analytic functions are frequently used in fields such as finance and science to provide trend, outlier, and
+ bucketed analysis for large data sets. You might also see the term <span class="q">"window functions"</span> in database
+ literature, referring to the sequence of rows (the <span class="q">"window"</span>) that the function call applies to,
+ particularly when the <code class="ph codeph">OVER</code> clause includes a <code class="ph codeph">ROWS</code> or <code class="ph codeph">RANGE</code>
+ keyword.
+ </p>
+
+ <p class="p">
+ The following sections describe the analytic query clauses and the pure analytic functions provided by
+ Impala. For usage information about aggregate functions in an analytic context, see
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="analytic_functions__over">
+
+ <h2 class="title topictitle2" id="ariaid-title2">OVER Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">OVER</code> clause is required for calls to pure analytic functions such as
+ <code class="ph codeph">LEAD()</code>, <code class="ph codeph">RANK()</code>, and <code class="ph codeph">FIRST_VALUE()</code>. When you include an
+ <code class="ph codeph">OVER</code> clause with calls to aggregate functions such as <code class="ph codeph">MAX()</code>,
+ <code class="ph codeph">COUNT()</code>, or <code class="ph codeph">SUM()</code>, they operate as analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>function(<var class="keyword varname">args</var>) OVER([<var class="keyword varname">partition_by_clause</var>] [<var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>]])
+
+partition_by_clause ::= PARTITION BY <var class="keyword varname">expr</var> [, <var class="keyword varname">expr</var> ...]
+order_by_clause ::= ORDER BY <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] ...]
+window_clause: See <a class="xref" href="#window_clause">Window Clause</a>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">PARTITION BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause acts much like the <code class="ph codeph">GROUP BY</code> clause in the
+ outermost block of a query. It divides the rows into groups containing identical values in one or more
+ columns. These logical groups are known as <dfn class="term">partitions</dfn>. Throughout the discussion of analytic
+ functions, <span class="q">"partitions"</span> refers to the groups produced by the <code class="ph codeph">PARTITION BY</code> clause, not
+ to partitioned tables. However, note the following limitation that applies specifically to analytic function
+ calls involving partitioned tables.
+ </p>
+
+ <p class="p">
+ In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+ clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+ the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+ for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+ </p>
+
+ <p class="p">
+ The sequence of results from an analytic function <span class="q">"resets"</span> for each new partition in the result set.
+ That is, the set of preceding or following rows considered by the analytic function always come from a
+ single partition. Any <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">ROW_NUMBER()</code>, and so
+ on apply to each partition independently. Omit the <code class="ph codeph">PARTITION BY</code> clause to apply the
+ analytic operation to all the rows in the table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">ORDER BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause works much like the <code class="ph codeph">ORDER BY</code> clause in the outermost
+ block of a query. It defines the order in which rows are evaluated for the entire input set, or for each
+ group produced by a <code class="ph codeph">PARTITION BY</code> clause. You can order by one or multiple expressions, and
+ for each expression optionally choose ascending or descending order and whether nulls come first or last in
+ the sort order. Because this <code class="ph codeph">ORDER BY</code> clause only defines the order in which rows are
+ evaluated, if you want the results to be output in a specific order, also include an <code class="ph codeph">ORDER
+ BY</code> clause in the outer block of the query.
+ </p>
+
+ <p class="p">
+ When the <code class="ph codeph">ORDER BY</code> clause is omitted, the analytic function applies to all items in the
+ group produced by the <code class="ph codeph">PARTITION BY</code> clause. When the <code class="ph codeph">ORDER BY</code> clause is
+ included, the analysis can apply to all or a subset of the items in the group, depending on the optional
+ window clause.
+ </p>
+
+ <p class="p">
+ The order in which the rows are analyzed is only defined for those columns specified in <code class="ph codeph">ORDER
+ BY</code> clauses.
+ </p>
+
+ <p class="p">
+ One difference between the analytic and outer uses of the <code class="ph codeph">ORDER BY</code> clause: inside the
+ <code class="ph codeph">OVER</code> clause, <code class="ph codeph">ORDER BY 1</code> or other integer value is interpreted as a
+ constant sort value (effectively a no-op) rather than referring to column 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Window clause:</strong>
+ </p>
+
+ <p class="p">
+ The window clause is only allowed in combination with an <code class="ph codeph">ORDER BY</code> clause. If the
+ <code class="ph codeph">ORDER BY</code> clause is specified but the window clause is not, the default window is
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>. See
+ <a class="xref" href="impala_analytic_functions.html#window_clause">Window Clause</a> for full details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because HBase tables are optimized for single-row lookups rather than full scans, analytic functions using
+ the <code class="ph codeph">OVER()</code> clause are not recommended for HBase tables. Although such queries work, their
+ performance is lower than on comparable tables using HDFS data files.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are very efficient for Parquet tables. The data that is examined during evaluation of
+ the <code class="ph codeph">OVER()</code> clause comes from a specified set of columns, and the values for each column
+ are arranged sequentially within each data file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are convenient to use with text tables for exploratory business intelligence. When the
+ volume of data is substantial, prefer to use Parquet tables for performance-critical analytic queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how to synthesize a numeric sequence corresponding to all the rows in a table.
+ The new table has the same columns as the old one, plus an additional column <code class="ph codeph">ID</code> containing
+ the integers 1, 2, 3, and so on, corresponding to the order of a <code class="ph codeph">TIMESTAMP</code> column in the
+ original table.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE events_with_id AS
+ SELECT
+ row_number() OVER (ORDER BY date_and_time) AS id,
+ c1, c2, c3, c4
+ FROM events;
+</code></pre>
+
+ <p class="p">
+ The following example shows how to determine the number of rows containing each value for a column. Unlike
+ a corresponding <code class="ph codeph">GROUP BY</code> query, this one can analyze a single column and still return all
+ values (not just the distinct ones) from the other columns.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT x, y, z,
+ count() OVER (PARTITION BY x) AS how_many_x
+FROM t1;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ You cannot directly combine the <code class="ph codeph">DISTINCT</code> operator with analytic function calls. You can
+ put the analytic function call in a <code class="ph codeph">WITH</code> clause or an inline view, and apply the
+ <code class="ph codeph">DISTINCT</code> operator to its result set.
+ </p>
+
+<pre class="pre codeblock"><code>WITH t1 AS (SELECT x, sum(x) OVER (PARTITION BY x) AS total FROM t1)
+ SELECT DISTINCT x, total FROM t1;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="analytic_functions__window_clause">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Window Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Certain analytic functions accept an optional <dfn class="term">window clause</dfn>, which makes the function analyze
+ only certain rows <span class="q">"around"</span> the current row rather than all rows in the partition. For example, you can
+ get a moving average by specifying some number of preceding and following rows, or a running count or
+ running total by specifying all rows up to the current position. This clause can result in different
+ analytic results for rows within the same partition.
+ </p>
+
+ <p class="p">
+ The window clause is supported with the <code class="ph codeph">AVG()</code>, <code class="ph codeph">COUNT()</code>,
+ <code class="ph codeph">FIRST_VALUE()</code>, <code class="ph codeph">LAST_VALUE()</code>, and <code class="ph codeph">SUM()</code> functions.
+
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause only allowed if the start bound is
+ <code class="ph codeph">UNBOUNDED PRECEDING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROWS BETWEEN [ { <var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]
+RANGE BETWEEN [ {<var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">ROWS BETWEEN</code> defines the size of the window in terms of the indexes of the rows in the
+ result set. The size of the window is predictable based on the clauses the position within the result set.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">RANGE BETWEEN</code> does not currently support numeric arguments to define a variable-size
+ sliding window.
+
+ </p>
+
+
+
+ <p class="p">
+ Currently, Impala supports only some combinations of arguments to the <code class="ph codeph">RANGE</code> clause:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> (the default when <code class="ph codeph">ORDER
+ BY</code> is specified and the window clause is omitted)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ When <code class="ph codeph">RANGE</code> is used, <code class="ph codeph">CURRENT ROW</code> includes not just the current row but all
+ rows that are tied with the current row based on the <code class="ph codeph">ORDER BY</code> expressions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show financial data for a fictional stock symbol <code class="ph codeph">JDR</code>. The closing
+ price moves up and down each day.
+ </p>
+
+<pre class="pre codeblock"><code>create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);
+...load some data...
+select * from stock_ticker order by stock_symbol, closing_date
++--------------+---------------+---------------------+
+| stock_symbol | closing_price | closing_date |
++--------------+---------------+---------------------+
+| JDR | 12.86 | 2014-10-02 00:00:00 |
+| JDR | 12.89 | 2014-10-03 00:00:00 |
+| JDR | 12.94 | 2014-10-04 00:00:00 |
+| JDR | 12.55 | 2014-10-05 00:00:00 |
+| JDR | 14.03 | 2014-10-06 00:00:00 |
+| JDR | 14.75 | 2014-10-07 00:00:00 |
+| JDR | 13.98 | 2014-10-08 00:00:00 |
++--------------+---------------+---------------------+
+</code></pre>
+
+ <p class="p">
+ The queries use analytic functions with window clauses to compute moving averages of the closing price. For
+ example, <code class="ph codeph">ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING</code> produces an average of the value from a
+ 3-day span, producing a different value for each row. The first row, which has no preceding row, only gets
+ averaged with the row following it. If the table contained more than one stock symbol, the
+ <code class="ph codeph">PARTITION BY</code> clause would limit the window for the moving average to only consider the
+ prices for a single stock.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between 1 preceding and 1 following) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.87 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.89 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.79 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 13.17 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.77 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 14.25 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 14.36 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
+
+ <p class="p">
+ The clause <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> produces a cumulative moving
+ average, from the earliest data up to the value for each day.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between unbounded preceding and current row) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.86 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.87 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 12.81 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.05 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 13.33 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 13.42 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="analytic_functions__avg_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title4">AVG Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_avg.html#avg">AVG Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="analytic_functions__count_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title5">COUNT Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_count.html#count">COUNT Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="analytic_functions__cume_dist">
+
+ <h2 class="title topictitle2" id="ariaid-title6">CUME_DIST Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the cumulative distribution of a value. The value for each row in the result set is greater than 0
+ and less than or equal to 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CUME_DIST (<var class="keyword varname">expr</var>)
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Within each partition of the result set, the <code class="ph codeph">CUME_DIST()</code> value represents an ascending
+ sequence that ends at 1. Each value represents the proportion of rows in the partition whose values are
+ less than or equal to the value in the current row.
+ </p>
+
+ <p class="p">
+ If the sequence of input values contains ties, the <code class="ph codeph">CUME_DIST()</code> results are identical for the
+ tied values.
+ </p>
+
+ <p class="p">
+ Impala only supports the <code class="ph codeph">CUME_DIST()</code> function in an analytic context, not as a regular
+ aggregate function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example uses a table with 9 rows. The <code class="ph codeph">CUME_DIST()</code>
+ function evaluates the entire table because there is no <code class="ph codeph">PARTITION BY</code> clause,
+ with the rows ordered by the weight of the animal.
+ the sequence of values shows that 1/9 of the values are less than or equal to the lightest
+ animal (mouse), 2/9 of the values are less than or equal to the second-lightest animal,
+ and so on up to the heaviest animal (elephant), where 9/9 of the rows are less than or
+ equal to its weight.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, cume_dist() over (order by kilos) from animals;
++------------+-----------------------+
+| name | cume_dist() OVER(...) |
++------------+-----------------------+
+| Elephant | 1 |
+| Giraffe | 0.8888888888888888 |
+| Polar bear | 0.7777777777777778 |
+| Horse | 0.6666666666666666 |
+| Ostrich | 0.5555555555555556 |
+| Condor | 0.4444444444444444 |
+| Housecat | 0.3333333333333333 |
+| Owl | 0.2222222222222222 |
+| Mouse | 0.1111111111111111 |
++------------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ Using a <code class="ph codeph">PARTITION BY</code> clause produces a separate sequence for each partition
+ group, in this case one for mammals and one for birds. Because there are 3 birds and 6 mammals,
+ the sequence illustrates how 1/3 of the <span class="q">"Bird"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest bird, 1/6 of the <span class="q">"Mammal"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest mammal, and so on until both the heaviest bird and heaviest mammal have a <code class="ph codeph">CUME_DIST()</code>
+ value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) from animals
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Owl | Bird | 0.3333333333333333 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 0.8333333333333334 |
+| Polar bear | Mammal | 0.6666666666666666 |
+| Horse | Mammal | 0.5 |
+| Housecat | Mammal | 0.3333333333333333 |
+| Mouse | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ We can reverse the ordering within each partition group by using an <code class="ph codeph">ORDER BY ... DESC</code>
+ clause within the <code class="ph codeph">OVER()</code> clause. Now the lightest (smallest value of <code class="ph codeph">kilos</code>)
+ animal of each kind has a <code class="ph codeph">CUME_DIST()</code> value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos desc) from animals
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Owl | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Ostrich | Bird | 0.3333333333333333 |
+| Mouse | Mammal | 1 |
+| Housecat | Mammal | 0.8333333333333334 |
+| Horse | Mammal | 0.6666666666666666 |
+| Polar bear | Mammal | 0.5 |
+| Giraffe | Mammal | 0.3333333333333333 |
+| Elephant | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example manufactures some rows with identical values in the <code class="ph codeph">kilos</code> column,
+ to demonstrate how the results look in case of tie values. For simplicity, it only shows the <code class="ph codeph">CUME_DIST()</code>
+ sequence for the <span class="q">"Bird"</span> rows. Now with 3 rows all with a value of 15, all of those rows have the same
+ <code class="ph codeph">CUME_DIST()</code> value. 4/5 of the rows have a value for <code class="ph codeph">kilos</code> that is less than or
+ equal to 15.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('California Condor', 'Bird', 15), ('Andean Condor', 'Bird', 15)
+
+select name, kind, cume_dist() over (order by kilos) from animals where kind = 'Bird';
++-------------------+------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++-------------------+------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Andean Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to use an <code class="ph codeph">ORDER BY</code> clause in the outer block
+ to order the result set in case of ties. Here, all the <span class="q">"Bird"</span> rows are together, then in descending order
+ by the result of the <code class="ph codeph">CUME_DIST()</code> function, and all tied <code class="ph codeph">CUME_DIST()</code>
+ values are ordered by the animal name.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) as ordering
+ from animals
+where
+ kind = 'Bird'
+order by kind, ordering desc, name;
++-------------------+------+----------+
+| name | kind | ordering |
++-------------------+------+----------+
+| Ostrich | Bird | 1 |
+| Andean Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="analytic_functions__dense_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title7">DENSE_RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function continues the sequence with the next higher integer.
+ Therefore, the sequence contains duplicates but no gaps when the input contains duplicates. Starts the
+ sequence over for each group produced by the <code class="ph codeph">PARTITIONED BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DENSE_RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ all the items with the 10 highest values, even if several items tied for 1st place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">DENSE_RANK()</code> function identifies where each
+ value <span class="q">"places"</span> in the result set, producing the same result for duplicate values, but with a strict
+ sequence from 1 to the number of groups. For example, when results are ordered by the <code class="ph codeph">X</code>
+ column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code> values are tied for
+ second; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 2 | even |
+| 2 | 2 | prime |
+| 3 | 3 | prime |
+| 3 | 3 | odd |
+| 4 | 4 | even |
+| 4 | 4 | square |
+| 5 | 5 | odd |
+| 5 | 5 | prime |
+| 6 | 6 | even |
+| 6 | 6 | perfect |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | odd |
+| 7 | 7 | prime |
+| 8 | 8 | even |
+| 9 | 9 | square |
+| 9 | 9 | odd |
+| 10 | 10 | round |
+| 10 | 10 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">DENSE_RANK()</code> function is affected by the
+ <code class="ph codeph">PARTITION</code> property within the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">DENSE_RANK()</code> returns the place of each value within the group, producing several
+ ascending sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+ place each value within the group; because each value occurs only 1 or 2 times,
+ <code class="ph codeph">DENSE_RANK()</code> designates each <code class="ph codeph">X</code> value as either first or second within its
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 2 | odd |
+| 7 | 3 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">DENSE_RANK()</code> produces a continuous sequence while still
+ allowing for ties. In this case, Croesus and Midas both have the second largest fortune, while Crassus has
+ the third largest. (In <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, you see a similar query with the
+ <code class="ph codeph">RANK()</code> function that shows that while Crassus has the third largest fortune, he is the
+ fourth richest person.)
+ </p>
+
+<pre class="pre codeblock"><code>select dense_rank() over (order by net_worth desc) as placement, name, net_worth from wealth order by placement, name;
++-----------+---------+---------------+
+| placement | name | net_worth |
++-----------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 3 | Crassus | 500000000.00 |
+| 4 | Scrooge | 80000000.00 |
++-----------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="analytic_functions__first_value">
+
+ <h2 class="title topictitle2" id="ariaid-title8">FIRST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the first row in the window. The return value is <code class="ph codeph">NULL</code> if
+ the input expression is <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>FIRST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a table with a wide variety of country-appropriate greetings. For consistency,
+ we want to standardize on a single greeting for each country. The <code class="ph codeph">FIRST_VALUE()</code> function
+ helps to produce a mail merge report where every person from the same country is addressed with the same
+ greeting.
+ </p>
+
+<pre class="pre codeblock"><code>select name, country, greeting from mail_merge
++---------+---------+--------------+
+| name | country | greeting |
++---------+---------+--------------+
+| Pete | USA | Hello |
+| John | USA | Hi |
+| Boris | Germany | Guten tag |
+| Michael | Germany | Guten morgen |
+| Bjorn | Sweden | Hej |
+| Mats | Sweden | Tja |
++---------+---------+--------------+
+
+select country, name,
+ first_value(greeting)
+ over (partition by country order by name, greeting) as greeting
+ from mail_merge;
++---------+---------+-----------+
+| country | name | greeting |
++---------+---------+-----------+
+| Germany | Boris | Guten tag |
+| Germany | Michael | Guten tag |
+| Sweden | Bjorn | Hej |
+| Sweden | Mats | Hej |
+| USA | John | Hi |
+| USA | Pete | Hi |
++---------+---------+-----------+
+</code></pre>
+
+ <p class="p">
+ Changing the order in which the names are evaluated changes which greeting is applied to each group.
+ </p>
+
+<pre class="pre codeblock"><code>select country, name,
+ first_value(greeting)
+ over (partition by country order by name desc, greeting) as greeting
+ from mail_merge;
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Michael | Guten morgen |
+| Germany | Boris | Guten morgen |
+| Sweden | Mats | Tja |
+| Sweden | Bjorn | Tja |
+| USA | Pete | Hello |
+| USA | John | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#last_value">LAST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="analytic_functions__lag">
+
+ <h2 class="title topictitle2" id="ariaid-title9">LAG Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a preceding row. You specify an
+ integer offset, which designates a row position some number of rows previous to the current row. Any column
+ references in the expression argument refer to column values from that prior row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAG (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Sometimes used an an alternative to doing a self-join.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. For each day, the
+ query prints the closing price alongside the previous day's closing price. The first row for each stock
+ symbol has no previous row, so that <code class="ph codeph">LAG()</code> value is <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
+ from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+-------------------+
+| stock_symbol | closing_date | closing_price | yesterday closing |
++--------------+---------------------+---------------+-------------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 12.86 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | 12.94 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 12.55 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 14.03 |
+| JDR | 2014-09-19 00:00:00 | 13.98 | 14.75 |
++--------------+---------------------+---------------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example does an arithmetic operation between the current row and a value from the previous
+ row, to produce a delta value for each day. This example also demonstrates how <code class="ph codeph">ORDER BY</code>
+ works independently in the different parts of the query. The <code class="ph codeph">ORDER BY closing_date</code> in the
+ <code class="ph codeph">OVER</code> clause makes the query analyze the rows in chronological order. Then the outer query
+ block uses <code class="ph codeph">ORDER BY closing_date DESC</code> to present the results with the most recent date
+ first.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ cast(
+ closing_price - lag(closing_price,1) over
+ (partition by stock_symbol order by closing_date)
+ as decimal(8,2)
+ )
+ as "change from yesterday"
+ from stock_ticker
+ order by closing_date desc;
++--------------+---------------------+---------------+-----------------------+
+| stock_symbol | closing_date | closing_price | change from yesterday |
++--------------+---------------------+---------------+-----------------------+
+| JDR | 2014-09-19 00:00:00 | 13.98 | -0.76 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 0.72 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 1.47 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | -0.38 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 0.04 |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 0.03 |
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
++--------------+---------------------+---------------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lead">LEAD Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="analytic_functions__last_value">
+
+ <h2 class="title topictitle2" id="ariaid-title10">LAST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the last row in the window. This same value is repeated for all result
+ rows for the group. The return value is <code class="ph codeph">NULL</code> if the input expression is
+ <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">MAIL_MERGE</code> table as in the example for
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>. Because the default window when <code class="ph codeph">ORDER
+ BY</code> is used is <code class="ph codeph">BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, the query requires the
+ <code class="ph codeph">UNBOUNDED FOLLOWING</code> to look ahead to subsequent rows and find the last value for each
+ country.
+ </p>
+
+<pre class="pre codeblock"><code>select country, name,
+ last_value(greeting) over (
+ partition by country order by name, greeting
+ rows between unbounded preceding and unbounded following
+ ) as greeting
+ from mail_merge
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Boris | Guten morgen |
+| Germany | Michael | Guten morgen |
+| Sweden | Bjorn | Tja |
+| Sweden | Mats | Tja |
+| USA | John | Hello |
+| USA | Pete | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="analytic_functions__lead">
+
+ <h2 class="title topictitle2" id="ariaid-title11">LEAD Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a following row. You specify an
+ integer offset, which designates a row position some number of rows after to the current row. Any column
+ references in the expression argument refer to column values from that later row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LEAD (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Sometimes used an an alternative to doing a self-join.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. The query analyzes
+ the closing price for a stock symbol, and for each day evaluates if the closing price for the following day
+ is higher or lower.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ case
+ (lead(closing_price,1)
+ over (partition by stock_symbol order by closing_date)
+ - closing_price) > 0
+ when true then "higher"
+ when false then "flat or lower"
+ end as "trending"
+from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+---------------+
+| stock_symbol | closing_date | closing_price | trending |
++--------------+---------------------+---------------+---------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | higher |
+| JDR | 2014-09-14 00:00:00 | 12.89 | higher |
+| JDR | 2014-09-15 00:00:00 | 12.94 | flat or lower |
+| JDR | 2014-09-16 00:00:00 | 12.55 | higher |
+| JDR | 2014-09-17 00:00:00 | 14.03 | higher |
+| JDR | 2014-09-18 00:00:00 | 14.75 | flat or lower |
+| JDR | 2014-09-19 00:00:00 | 13.98 | NULL |
++--------------+---------------------+---------------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lag">LAG Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="analytic_functions__max_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title12">MAX Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_max.html#max">MAX Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="analytic_functions__min_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title13">MIN Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_min.html#min">MIN Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="analytic_functions__ntile">
+
+ <h2 class="title topictitle2" id="ariaid-title14">NTILE Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the <span class="q">"bucket number"</span> associated with each row, between 1 and the value of an expression. For
+ example, creating 100 buckets puts the lowest 1% of values in the first bucket, while creating 10 buckets
+ puts the lowest 10% of values in the first bucket. Each partition can have a different number of buckets.
+
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NTILE (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var> ...]
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <span class="q">"ntile"</span> name is derived from the practice of dividing result sets into fourths (quartile), tenths
+ (decile), and so on. The <code class="ph codeph">NTILE()</code> function divides the result set based on an arbitrary
+ percentile value.
+ </p>
+
+ <p class="p">
+ The number of buckets must be a positive integer.
+ </p>
+
+ <p class="p">
+ The number of items in each bucket is identical or almost so, varying by at most 1. If the number of items
+ does not divide evenly between the buckets, the remaining N items are divided evenly among the first N
+ buckets.
+ </p>
+
+ <p class="p">
+ If the number of buckets N is greater than the number of input rows in the partition, then the first N
+ buckets each contain one item, and the remaining buckets are empty.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows divides groups of animals into 4 buckets based on their weight. The
+ <code class="ph codeph">ORDER BY ... DESC</code> clause in the <code class="ph codeph">OVER()</code> clause means that the heaviest 25%
+ are in the first group, and the lightest 25% are in the fourth group. (The <code class="ph codeph">ORDER BY</code> in the
+ outermost part of the query shows how you can order the final result set independently from the order in
+ which the rows are evaluated by the <code class="ph codeph">OVER()</code> clause.) Because there are 9 rows in the group,
+ divided into 4 buckets, the first bucket receives the extra item.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, ntile(4) over (order by kilos desc) as quarter
+ from animals
+order by quarter desc;
++------------+---------+
+| name | quarter |
++------------+---------+
+| Owl | 4 |
+| Mouse | 4 |
+| Condor | 3 |
+| Housecat | 3 |
+| Horse | 2 |
+| Ostrich | 2 |
+| Elephant | 1 |
+| Giraffe | 1 |
+| Polar bear | 1 |
++------------+---------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">PARTITION</code> clause works for the
+ <code class="ph codeph">NTILE()</code> function. Here, we divide each kind of animal (mammal or bird) into 2 buckets,
+ the heavier half and the lighter half.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kind;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Owl | Bird | 2 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Housecat | Mammal | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ <p class="p">
+ Again, the result set can be ordered independently
+ from the analytic evaluation. This next example lists all the animals heaviest to lightest,
+ showing that elephant and giraffe are in the <span class="q">"top half"</span> of mammals by weight, while
+ housecat and mouse are in the <span class="q">"bottom half"</span>.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kilos desc;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Housecat | Mammal | 2 |
+| Owl | Bird | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="analytic_functions__percent_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title15">PERCENT_RANK Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>PERCENT_RANK (<var class="keyword varname">expr</var>)
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ Calculates the rank, expressed as a percentage, of each row within a group of rows.
+ If <code class="ph codeph">rank</code> is the value for that same row from the <code class="ph codeph">RANK()</code> function (from 1 to the total number of rows in the partition group),
+ then the <code class="ph codeph">PERCENT_RANK()</code> value is calculated as <code class="ph codeph">(<var class="keyword varname">rank</var> - 1) / (<var class="keyword varname">rows_in_group</var> - 1)</code> .
+ If there is only a single item in the partition group, its <code class="ph codeph">PERCENT_RANK()</code> value is 0.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This function is similar to the <code class="ph codeph">RANK</code> and <code class="ph codeph">CUME_DIST()</code> functions: it returns an ascending sequence representing the position of each
+ row within the rows of the same partition group. The actual numeric sequence is calculated differently,
+ and the handling of duplicate (tied) values is different.
+ </p>
+
+ <p class="p">
+ The return values range from 0 to 1 inclusive.
+ The first row in each partition group always has the value 0.
+ A <code class="ph codeph">NULL</code> value is considered the lowest possible value.
+ In the case of duplicate input values, all the corresponding rows in the result set
+ have an identical value: the lowest <code class="ph codeph">PERCENT_RANK()</code> value of those
+ tied rows. (In contrast to <code class="ph codeph">CUME_DIST()</code>, where all tied rows have
+ the highest <code class="ph codeph">CUME_DIST()</code> value.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">ANIMALS</code> table as the examples for <code class="ph codeph">CUME_DIST()</code>
+ and <code class="ph codeph">NTILE()</code>, with a few additional rows to illustrate the results where some values are
+ <code class="ph codeph">NULL</code> or there is only a single row in a partition group.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('Komodo dragon', 'Reptile', 70);
+insert into animals values ('Unicorn', 'Mythical', NULL);
+insert into animals values ('Fire-breathing dragon', 'Mythical', NULL);
+</code></pre>
+
+ <p class="p">
+ As with <code class="ph codeph">CUME_DIST()</code>, there is an ascending sequence for each kind of animal.
+ For example, the <span class="q">"Birds"</span> and <span class="q">"Mammals"</span> rows each have a <code class="ph codeph">PERCENT_RANK()</code> sequence
+ that ranges from 0 to 1.
+ The <span class="q">"Reptile"</span> row has a <code class="ph codeph">PERCENT_RANK()</code> of 0 because that partition group contains only a single item.
+ Both <span class="q">"Mythical"</span> animals have a <code class="ph codeph">PERCENT_RANK()</code> of 0 because
+ a <code class="ph codeph">NULL</code> is considered the lowest value within its partition group.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, percent_rank() over (partition by kind order by kilos) from animals;
++-----------------------+----------+--------------------------+
+| name | kind | percent_rank() OVER(...) |
++-----------------------+----------+--------------------------+
+| Mouse | Mammal | 0 |
+| Housecat | Mammal | 0.2 |
+| Horse | Mammal | 0.4 |
+| Polar bear | Mammal | 0.6 |
+| Giraffe | Mammal | 0.8 |
+| Elephant | Mammal | 1 |
+| Komodo dragon | Reptile | 0 |
+| Owl | Bird | 0 |
+| California Condor | Bird | 0.25 |
+| Andean Condor | Bird | 0.25 |
+| Condor | Bird | 0.25 |
+| Ostrich | Bird | 1 |
+| Fire-breathing dragon | Mythical | 0 |
+| Unicorn | Mythical | 0 |
++-----------------------+----------+--------------------------+
+</code></pre>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="analytic_functions__rank">
+
+ <h2 class="title topictitle2" id="ariaid-title16">RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function increments the sequence by the number of tied values.
+ Therefore, the sequence contains both duplicates and gaps when the input contains duplicates. Starts the
+ sequence over for each group produced by the <code class="ph codeph">PARTITIONED BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ several items that were tied for 10th place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they
+ treat duplicate combinations of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">RANK()</code> function identifies where each value
+ <span class="q">"places"</span> in the result set, producing the same result for duplicate values, and skipping values in the
+ sequence to account for the number of duplicates. For example, when results are ordered by the
+ <code class="ph codeph">X</code> column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code>
+ values are tied for third; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 3 | even |
+| 2 | 3 | prime |
+| 3 | 5 | prime |
+| 3 | 5 | odd |
+| 4 | 7 | even |
+| 4 | 7 | square |
+| 5 | 9 | odd |
+| 5 | 9 | prime |
+| 6 | 11 | even |
+| 6 | 11 | perfect |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | odd |
+| 7 | 13 | prime |
+| 8 | 18 | even |
+| 9 | 19 | square |
+| 9 | 19 | odd |
+| 10 | 21 | round |
+| 10 | 21 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">RANK()</code> function is affected by the
+ <code class="ph codeph">PARTITION</code> property within the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">RANK()</code> returns the place of each value within the group, producing several ascending
+ sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+ place each value within the group; because each value occurs only 1 or 2 times,
+ <code class="ph codeph">RANK()</code> designates each <code class="ph codeph">X</code> value as either first or second within its
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 4 | odd |
+| 7 | 5 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a magazine might prepare a list of history's wealthiest people. Croesus and
+ Midas are tied for second, then Crassus is fourth.
+ </p>
+
+<pre class="pre codeblock"><code>select rank() over (order by net_worth desc) as rank, name, net_worth from wealth order by rank, name;
++------+---------+---------------+
+| rank | name | net_worth |
++------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>,
+ <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="analytic_functions__row_number">
+
+ <h2 class="title topictitle2" id="ariaid-title17">ROW_NUMBER Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. Starts the sequence over for each group
+ produced by the <code class="ph codeph">PARTITIONED BY</code> clause. The output sequence includes different values for
+ duplicate input values. Therefore, the sequence never contains any duplicates or gaps, regardless of
+ duplicate input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROW_NUMBER() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries where the input values are known to be unique, or precisely N
+ rows are needed regardless of duplicate values.
+ </p>
+
+ <p class="p">
+ Because its result value is different for each row in the result set (when used without a <code class="ph codeph">PARTITION
+ BY</code> clause), <code class="ph codeph">ROW_NUMBER()</code> can be used to synthesize unique numeric ID values, for
+ example for result sets involving unique values or tuples.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">RANK</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how <code class="ph codeph">ROW_NUMBER()</code> produces a continuous numeric
+ sequence, even though some values of <code class="ph codeph">X</code> are repeated.
+ </p>
+
+<pre class="pre codeblock"><code>select x, row_number() over(order by x, property) as row_number, property from int_t;
++----+------------+----------+
+| x | row_number | property |
++----+------------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 3 | even |
+| 2 | 4 | prime |
+| 3 | 5 | odd |
+| 3 | 6 | prime |
+| 4 | 7 | even |
+| 4 | 8 | square |
+| 5 | 9 | odd |
+| 5 | 10 | prime |
+| 6 | 11 | even |
+| 6 | 12 | perfect |
+| 7 | 13 | lucky |
+| 7 | 14 | lucky |
+| 7 | 15 | lucky |
+| 7 | 16 | odd |
+| 7 | 17 | prime |
+| 8 | 18 | even |
+| 9 | 19 | odd |
+| 9 | 20 | square |
+| 10 | 21 | even |
+| 10 | 22 | round |
++----+------------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a financial institution might assign customer IDs to some of history's
+ wealthiest figures. Although two of the people have identical net worth figures, unique IDs are required
+ for this purpose. <code class="ph codeph">ROW_NUMBER()</code> produces a sequence of five different values for the five
+ input rows.
+ </p>
+
+<pre class="pre codeblock"><code>select row_number() over (order by net_worth desc) as account_id, name, net_worth
+ from wealth order by account_id, name;
++------------+---------+---------------+
+| account_id | name | net_worth |
++------------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 3 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="analytic_functions__sum_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title18">SUM Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_sum.html#sum">SUM Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_count_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_count_distinct.html b/docs/build3x/html/topics/impala_appx_count_distinct.html
new file mode 100644
index 0000000..c42c2ca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_count_distinct.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_count_distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</title></head><body id="appx_count_distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_COUNT_DISTINCT Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Allows multiple <code class="ph codeph">COUNT(DISTINCT)</code> operations within a single query, by internally rewriting
+ each <code class="ph codeph">COUNT(DISTINCT)</code> to use the <code class="ph codeph">NDV()</code> function. The resulting count is
+ approximate rather than precise.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">APPX_COUNT_DISTINCT</code> lets you work around the restriction
+ where a query can only evaluate <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">col_name</var>)</code> for a single
+ column. By default, you can count the distinct values of one column or another, but not both in a single
+ query:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(distinct x) from int_t;
++-------------------+
+| count(distinct x) |
++-------------------+
+| 10 |
++-------------------+
+[localhost:21000] > select count(distinct property) from int_t;
++--------------------------+
+| count(distinct property) |
++--------------------------+
+| 7 |
++--------------------------+
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
+ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters
+as count(DISTINCT x); deviating function: count(DISTINCT property)
+</code></pre>
+
+ <p class="p">
+ When you enable the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option, now the query with multiple
+ <code class="ph codeph">COUNT(DISTINCT)</code> works. The reason this behavior requires a query option is that each
+ <code class="ph codeph">COUNT(DISTINCT)</code> is rewritten internally to use the <code class="ph codeph">NDV()</code> function instead,
+ which provides an approximate result rather than a precise count.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set APPX_COUNT_DISTINCT=true;
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
++-------------------+--------------------------+
+| count(distinct x) | count(distinct property) |
++-------------------+--------------------------+
+| 10 | 7 |
++-------------------+--------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_count.html#count">COUNT Function</a>,
+ <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>,
+ <a class="xref" href="impala_ndv.html#ndv">NDV Function</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>