You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@asterixdb.apache.org by mb...@apache.org on 2019/10/15 14:44:38 UTC

[asterixdb] 01/06: [NO ISSUE][DOC] Improve window functions documentation

This is an automated email from the ASF dual-hosted git repository.

mblow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/asterixdb.git

commit 7b9ac77706e0ca3ce2fb4fd7b729a89e1fdc4ae9
Author: Dmitry Lychagin <dm...@couchbase.com>
AuthorDate: Thu Oct 3 14:45:14 2019 -0700

    [NO ISSUE][DOC] Improve window functions documentation
    
    - user model changes: no
    - storage format changes: no
    - interface changes: no
    
    Details:
    - Minor improvements in the documentation
      for window functions and OVER clause
    
    Change-Id: I349d9b93edca3f3a7ffb340b016402e9ae8320fc
    Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/3606
    Reviewed-by: Dmitry Lychagin <dm...@couchbase.com>
    Reviewed-by: Till Westmann <ti...@apache.org>
    Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
    Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
---
 .../src/main/markdown/builtins/14_window.md        | 122 ++++++++++++---------
 .../asterix-doc/src/main/markdown/sqlpp/3_query.md |  39 ++++---
 2 files changed, 89 insertions(+), 72 deletions(-)

diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/14_window.md b/asterixdb/asterix-doc/src/main/markdown/builtins/14_window.md
index adc5daa..d88d5e8 100644
--- a/asterixdb/asterix-doc/src/main/markdown/builtins/14_window.md
+++ b/asterixdb/asterix-doc/src/main/markdown/builtins/14_window.md
@@ -39,13 +39,17 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        CUME_DIST() OVER ([window-partition-clause] window-order-clause)
+        CUME_DIST() OVER ([window-partition-clause] [window-order-clause])
 
 * Returns the percentile rank of the current tuple as part of the cumulative
   distribution – that is, the number of tuples ranked lower than or equal to
   the current tuple, including the current tuple, divided by the total number
   of tuples in the window partition.
 
+    The window order clause determines the sort order of the tuples.
+    If the window order clause is omitted, the function returns the same
+    result (1.0) for each tuple.
+
 * Arguments:
 
     * None.
@@ -54,7 +58,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
@@ -116,7 +120,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        DENSE_RANK() OVER ([window-partition-clause] window-order-clause)
+        DENSE_RANK() OVER ([window-partition-clause] [window-order-clause])
 
 * Returns the dense rank of the current tuple – that is, the number of
   distinct tuples preceding this tuple in the current window partition, plus
@@ -124,6 +128,8 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     The tuples are ordered by the window order clause.
     If any tuples are tied, they will have the same rank.
+    If the window order clause is omitted, the function returns the same
+    result (1) for each tuple.
 
     For this function, when any tuples have the same rank, the rank of the next
     tuple will be consecutive, so there will not be a gap in the sequence of
@@ -138,7 +144,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
@@ -219,7 +225,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 * Modifiers:
 
     * [Nulls Treatment](manual.html#Nulls_treatment): (Optional) Determines how
-      NULL or MISSING values are treated when finding the first tuple in the
+      NULL or MISSING values are treated when finding the first value in the
       window frame.
 
         - `IGNORE NULLS`: If the values for any tuples evaluate to NULL or
@@ -245,7 +251,8 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
     * The specified value from the first tuple.
       The order of the tuples is determined by the window order clause.
 
-    * If all values are NULL or MISSING it returns NULL.
+    * NULL, if the frame was empty or if all values were NULL or MISSING and
+      the `IGNORE NULLS` modifier was specified.
 
     * In the following cases, this function may return unpredictable results.
 
@@ -326,36 +333,38 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        LAG(expr[, offset[, default]]) [nulls-treatment] OVER ([window-partition-clause] window-order-clause)
+        LAG(expr[, offset[, default]]) [nulls-treatment] OVER ([window-partition-clause] [window-order-clause])
 
-* Returns the value of a tuple at a given offset prior to the current tuple
+* Returns the value from a tuple at a given offset prior to the current tuple
   position.
 
+    The window order clause determines the sort order of the tuples.
+    If the window order clause is omitted, the return values may be
+    unpredictable.
+
 * Arguments:
 
     * `expr`: The value that you want to return from the offset
       tuple. <sup>\[[1](#fn_1)\]</sup>
 
-    * `offset`: (Optional) A positive integer greater than 0.
+    * `offset`: (Optional) A positive integer.
       If omitted, the default is 1.
 
     * `default`: (Optional) The value to return when the offset goes out of
-      window scope.
+      partition scope.
       If omitted, the default is NULL.
 
 * Modifiers:
 
     * [Nulls Treatment](manual.html#Nulls_treatment): (Optional) Determines how
-      NULL or MISSING values are treated when finding the first tuple in the
-      window frame.
+      NULL or MISSING values are treated when finding the offset tuple in the
+      window partition.
 
         - `IGNORE NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are ignored when finding the first tuple.
-          In this case, the function returns the first non-NULL, non-MISSING
-          value.
+          MISSING, those tuples are ignored when finding the offset tuple.
 
         - `RESPECT NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are included when finding the first tuple.
+          MISSING, those tuples are included when finding the offset tuple.
 
         If this modifier is omitted, the default is `RESPECT NULLS`.
 
@@ -363,13 +372,13 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
     * The specified value from the offset tuple.
 
-    * If the offset tuple is out of scope, it returns the default value,
+    * If the offset tuple is out of partition scope, it returns the default value,
       or NULL if no default is specified.
 
 * Example:
@@ -449,16 +458,16 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 * Modifiers:
 
     * [Nulls Treatment](manual.html#Nulls_treatment): (Optional) Determines how
-      NULL or MISSING values are treated when finding the first tuple in the
+      NULL or MISSING values are treated when finding the last tuple in the
       window frame.
 
         - `IGNORE NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are ignored when finding the first tuple.
-          In this case, the function returns the first non-NULL, non-MISSING
+          MISSING, those tuples are ignored when finding the last tuple.
+          In this case, the function returns the last non-NULL, non-MISSING
           value.
 
         - `RESPECT NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are included when finding the first tuple.
+          MISSING, those tuples are included when finding the last tuple.
 
         If this modifier is omitted, the default is `RESPECT NULLS`.
 
@@ -475,7 +484,8 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
     * The specified value from the last tuple.
       The order of the tuples is determined by the window order clause.
 
-    * If all values are NULL or MISSING it returns NULL.
+    * NULL, if the frame was empty or if all values were NULL or MISSING and
+      the `IGNORE NULLS` modifier was specified.
 
     * In the following cases, this function may return unpredictable results.
 
@@ -566,36 +576,38 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        LEAD(expr[, offset[, default]]) [nulls-treatment] OVER ([window-partition-clause] window-order-clause)
+        LEAD(expr[, offset[, default]]) [nulls-treatment] OVER ([window-partition-clause] [window-order-clause])
 
-* Returns the value of a tuple at a given offset ahead of the current tuple
+* Returns the value from a tuple at a given offset ahead of the current tuple
   position.
 
+    The window order clause determines the sort order of the tuples.
+    If the window order clause is omitted, the return values may be
+    unpredictable.
+
 * Arguments:
 
     * `expr`: The value that you want to return from the offset
       tuple. <sup>\[[1](#fn_1)\]</sup>
 
-    * `offset`: (Optional) A positive integer greater than 0. If omitted, the
+    * `offset`: (Optional) A positive integer. If omitted, the
       default is 1.
 
     * `default`: (Optional) The value to return when the offset goes out of
-      window scope.
+      window partition scope.
       If omitted, the default is NULL.
 
 * Modifiers:
 
     * [Nulls Treatment](manual.html#Nulls_treatment): (Optional) Determines how
-      NULL or MISSING values are treated when finding the first tuple in the
-      window frame.
+      NULL or MISSING values are treated when finding the offset tuple in the
+      window partition.
 
         - `IGNORE NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are ignored when finding the first tuple.
-          In this case, the function returns the first non-NULL, non-MISSING
-          value.
+          MISSING, those tuples are ignored when finding the offset tuple.
 
         - `RESPECT NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are included when finding the first tuple.
+          MISSING, those tuples are included when finding the offset tuple.
 
         If this modifier is omitted, the default is `RESPECT NULLS`.
 
@@ -603,13 +615,13 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
     * The specified value from the offset tuple.
 
-    * If the offset tuple is out of scope, it returns the default value, or
+    * If the offset tuple is out of partition scope, it returns the default value, or
       NULL if no default is specified.
 
 * Example:
@@ -706,16 +718,14 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
         If this modifier is omitted, the default is `FROM FIRST`.
 
     * [Nulls Treatment](manual.html#Nulls_treatment): (Optional) Determines how
-      NULL or MISSING values are treated when finding the first tuple in the
+      NULL or MISSING values are treated when finding the offset tuple in the
       window frame.
 
         - `IGNORE NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are ignored when finding the first tuple.
-          In this case, the function returns the first non-NULL, non-MISSING
-          value.
+          MISSING, those tuples are ignored when finding the offset tuple.
 
         - `RESPECT NULLS`: If the values for any tuples evaluate to NULL or
-          MISSING, those tuples are included when finding the first tuple.
+          MISSING, those tuples are included when finding the offset tuple.
 
         If this modifier is omitted, the default is `RESPECT NULLS`.
 
@@ -735,8 +745,6 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
         - If the window order clause is omitted.
 
-        - If the window frame clause is omitted.
-
         - If the window frame is defined by `ROWS`, and there are tied tuples
           in the window frame.
 
@@ -893,7 +901,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        NTILE(num_tiles) OVER ([window-partition-clause] window-order-clause)
+        NTILE(num_tiles) OVER ([window-partition-clause] [window-order-clause])
 
 * Divides the window partition into the specified number of tiles, and
   allocates each tuple in the window partition to a tile, so that as far as
@@ -903,20 +911,22 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
   For each tuple, the function returns the number of the tile into which that
   tuple was placed.
 
+    The window order clause determines the sort order of the tuples.
+    If the window order clause is omitted then the tuples are processed in
+    an undefined order.
+
 * Arguments:
 
     * `num_tiles`: The number of tiles into which you want to divide
       the window partition.
       This argument can be an expression and must evaluate to a number.
       If the number is not an integer, it will be truncated.
-      If the expression depends on a tuple, it evaluates from the first
-      tuple in the window partition.
 
 * Clauses:
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
@@ -977,12 +987,16 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        PERCENT_RANK() OVER ([window-partition-clause] window-order-clause)
+        PERCENT_RANK() OVER ([window-partition-clause] [window-order-clause])
 
 * Returns the percentile rank of the current tuple – that is, the rank of the
   tuples minus one, divided by the total number of tuples in the window
   partition minus one.
 
+    The window order clause determines the sort order of the tuples.
+    If the window order clause is omitted, the function returns the same
+    result (0) for each tuple.
+
 * Arguments:
 
     * None.
@@ -991,7 +1005,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
@@ -1053,19 +1067,23 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
 * Syntax:
 
-        RANK() OVER ([window-partition-clause] window-order-clause)
+        RANK() OVER ([window-partition-clause] [window-order-clause])
 
 * Returns the rank of the current tuple – that is, the number of distinct
   tuples preceding this tuple in the current window partition, plus one.
 
     The tuples are ordered by the window order clause.
     If any tuples are tied, they will have the same rank.
+    If the window order clause is omitted, the function returns the same
+    result (1) for each tuple.
 
     When any tuples have the same rank, the rank of the next tuple will include
     all preceding tuples, so there may be a gap in the sequence of returned
     values.
     For example, if there are three tuples ranked 2, the next rank is 5.
 
+    To avoid gaps in the returned values, use the DENSE_RANK() function instead.
+
 * Arguments:
 
     * None.
@@ -1074,7 +1092,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
 
     * (Optional) [Window Partition Clause](manual.html#Window_partition_clause).
 
-    * (Required) [Window Order Clause](manual.html#Window_order_clause).
+    * (Optional) [Window Order Clause](manual.html#Window_order_clause).
 
 * Return Value:
 
@@ -1145,7 +1163,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
         RATIO_TO_REPORT(expr) OVER (window-definition)
 
 * Returns the fractional ratio of the specified value for each tuple to the
-  sum of values for all tuples in the window partition.
+  sum of values for all tuples in the window frame.
 
 * Arguments:
 
@@ -1165,7 +1183,7 @@ described in the section on [SELECT Statements](manual.html#SELECT_statements).
     * A number between 0 and 1, representing the fractional ratio of the value
       for the current tuple to the sum of values for all tuples in the
       current window frame.
-      The sum of values for all tuples in the current window frame is 1.
+      The sum of returned values for all tuples in the current window frame is 1.
 
     * If the input expression does not evaluate to a number, or the sum of
       values for all tuples is zero, it returns NULL.
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index f4b5f3d..e19e2ec 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -1698,11 +1698,11 @@ with an OVER clause.
 ### <a id="Window_function_call">Window Function Call</a> ###
 
     WindowFunctionCall ::= WindowFunctionType "(" WindowFunctionArguments ")"
-    (WindowFunctionOptions)? <OVER> (Variable <AS>)? "(" WindowClause ")"
+    (WindowFunctionOptions)? <OVER> (Variable <AS>)? "(" WindowDefinition ")"
 
 #### <a id="Window_function_type">Window Function Type</a> ####
 
-    WindowFunctionType ::= AggregateFunctions | WindowFunctions
+    WindowFunctionType ::= AggregateFunction | WindowFunction
 
 Refer to the [Aggregate Functions](builtins.html#AggregateFunctions) section
 for a list of aggregate functions.
@@ -1763,9 +1763,9 @@ When using a built-in [aggregate function](builtins.html#AggregateFunctions) as
 a window function, the function’s argument must be a subquery which refers to
 this alias, for example:
 
-    FROM source AS src
     SELECT ARRAY_COUNT(DISTINCT (FROM alias SELECT VALUE alias.src.field))
     OVER alias AS (PARTITION BY … ORDER BY …)
+    FROM source AS src
 
 The alias is not necessary when using a [window function](builtins.html#WindowFunctions),
 or when using a standard SQL aggregate function with the OVER clause.
@@ -1776,15 +1776,14 @@ A standard SQL aggregate function with an OVER clause is rewritten by the
 query compiler using a built-in aggregate function over a frame variable.
 For example, the following query with the `sum()` function:
 
+    SELECT SUM(field) OVER (PARTITION BY … ORDER BY …)
     FROM source AS src
-    SELECT SUM(field)
-    OVER (PARTITION BY … ORDER BY …)
 
 Is rewritten as the following query using the `array_sum()` function:
 
+    SELECT ARRAY_SUM( (SELECT VALUE alias.src.field FROM alias) )
+      OVER alias AS (PARTITION BY … ORDER BY …)
     FROM source AS src
-    SELECT ARRAY_SUM( (FROM alias SELECT VALUE alias.src.field) )
-    OVER alias AS (PARTITION BY … ORDER BY …)
 
 This is similar to the way that standard SQL aggregate functions are rewritten
 as built-in aggregate functions in the presence of the GROUP BY clause.
@@ -1801,8 +1800,8 @@ window functions.
 
     WindowPartitionClause ::= <PARTITION> <BY> Expression ("," Expression)*
 
-The **window partition clause** divides the tuples into partitions using
-one or more expressions.
+The **window partition clause** divides the tuples into logical partitions
+using one or more expressions.
 
 This clause may be used with any [window function](builtins.html#WindowFunctions),
 or any [aggregate function](builtins.html#AggregateFunctions) used as a window
@@ -1823,13 +1822,8 @@ This clause may be used with any [window function](builtins.html#WindowFunctions
 or any [aggregate function](builtins.html#AggregateFunctions) used as a window
 function.
 
-This clause is optional for some functions, and required for others.
-Refer to the [Aggregate Functions](builtins.html#AggregateFunctions) section or
-the [Window Functions](builtins.html#WindowFunctions) section for details of
-the syntax of individual functions.
-
-If this clause is omitted, all tuples are considered peers, i.e. their order
-is tied.
+This clause is optional.
+If omitted, all tuples are considered peers, i.e. their order is tied.
 When tuples in the window partition are tied, each window function behaves
 differently.
 
@@ -1904,11 +1898,15 @@ The window frame can be defined in the following ways:
 If this clause uses `RANGE` with either `Expression PRECEDING` or
 `Expression FOLLOWING`, the [window order clause](#Window_order_clause) must
 have only a single ordering term.
+
+The ordering term expression must evaluate to a number.
+<!--
 The ordering term expression must evaluate to a number, a date, a time, or a
 datetime.
 If the ordering term expression evaluates to a date, a time, or a datetime, the
 expression in `Expression PRECEDING` or `Expression FOLLOWING` must evaluate to
 a duration.
+-->
 
 If these conditions are not met, the window frame will be empty,
 which means the window function will return its default
@@ -1930,10 +1928,11 @@ data type that can be added to the ordering expression.
 
 #### <a id="Window_frame_extent">Window Frame Extent</a> ####
 
-    WindowFrameExtent ::= ( <UNBOUNDED> <PRECEDING> | <CURRENT> <ROW> |
-    Expression <FOLLOWING> ) | <BETWEEN> ( <UNBOUNDED> <PRECEDING> | <CURRENT>
-    <ROW> | Expression ( <PRECEDING> | <FOLLOWING> ) ) <AND> ( <UNBOUNDED>
-    <FOLLOWING> | <CURRENT> <ROW> | Expression ( <PRECEDING> | <FOLLOWING> ) )
+    WindowFrameExtent ::= ( ( <UNBOUNDED> | Expression ) <PRECEDING> | <CURRENT> <ROW> ) |
+    <BETWEEN>
+      ( <UNBOUNDED> <PRECEDING> | <CURRENT> <ROW> | Expression ( <PRECEDING> | <FOLLOWING> ) )
+    <AND>
+      ( <UNBOUNDED> <FOLLOWING> | <CURRENT> <ROW> | Expression ( <PRECEDING> | <FOLLOWING> ) )
 
 The **window frame extent clause** specifies the start point and end point of
 the window frame.