You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@asterixdb.apache.org by mb...@apache.org on 2021/08/13 14:14:00 UTC

[asterixdb] 03/05: [NO ISSUE][DOC] Documentation for ROLLUP and CUBE

This is an automated email from the ASF dual-hosted git repository.

mblow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/asterixdb.git

commit 71cd417ee6c279b20968a9f1067765a7e00f78f2
Author: Simon Dew <Si...@couchbase.com>
AuthorDate: Thu Aug 12 12:06:02 2021 +0100

    [NO ISSUE][DOC] Documentation for ROLLUP and CUBE
    
      - Add documentation for ROLLUP subclause
      - Add documentation for CUBE subclause
    
    Change-Id: I67a63f17ca459e313321bce569056a7f07f3a17f
    Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12783
    Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
    Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
    Reviewed-by: Dmitry Lychagin <dm...@couchbase.com>
---
 .../asterix-doc/src/main/markdown/sqlpp/3_query.md | 323 ++++++++++++++++++++-
 1 file changed, 322 insertions(+), 1 deletion(-)

diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index 3e70922..c54467f 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -742,7 +742,7 @@ Of course, a grouping expression need not be a simple field-name. In Q3.18, orde
     WHERE get_year(date(o.order_date)) = 2020
     GROUP BY get_month(date(o.order_date)) AS month
     SELECT month, COUNT(*) AS order_count
-    ORDER BY order_count desc
+    ORDER BY order_count DESC, month DESC
     LIMIT 3;
 
 Result:
@@ -794,6 +794,327 @@ Result:
         }
     ]
 
+#### <a id="Rollup">ROLLUP</a>
+
+The `ROLLUP` subclause is an aggregation feature that extends the functionality of the `GROUP BY` clause.
+It returns extra _super-aggregate_ items in the query results, giving subtotals and a grand total for the aggregate
+functions in the query.
+To illustrate, first consider the following query.
+
+##### Example
+
+(Q3.R1) List the number of orders, grouped by customer region and city.
+
+    SELECT customer_region AS Region,
+           customer_city AS City,
+           COUNT(o.orderno) AS `Order Count`
+    FROM customers AS c LEFT OUTER JOIN orders AS o ON c.custid = o.custid
+    LET address_line = SPLIT(c.address.city, ","),
+        customer_city = TRIM(address_line[0]),
+        customer_region = TRIM(address_line[1])
+    GROUP BY customer_region, customer_city
+    ORDER BY customer_region ASC, customer_city ASC, `Order Count` DESC;
+
+Result:
+
+    [
+      {
+        "Region": "Italy",
+        "City": "Rome",
+        "Order Count": 0
+      },
+      {
+        "Region": "MA",
+        "City": "Boston",
+        "Order Count": 2
+      },
+      {
+        "Region": "MA",
+        "City": "Hanover",
+        "Order Count": 0
+      },
+      {
+        "Region": "MO",
+        "City": "St. Louis",
+        "Order Count": 7
+      }
+    ]
+
+This query uses string functions to split each customer's address into city and region.
+The query then counts the total number of orders placed by each customer, and groups the results first by customer
+region, then by customer city.
+The aggregate results (labeled `Order Count`) are only shown by city, and there are no subtotals or grand total.
+We can add these using the `ROLLUP` subclause, as in the following example.
+
+##### Example
+
+(Q3.R2) List the number of orders by customer region and city, including subtotals and a grand total.
+
+    SELECT customer_region AS Region,
+           customer_city AS City,
+           COUNT(o.orderno) AS `Order Count`
+    FROM customers AS c LEFT OUTER JOIN orders AS o ON c.custid = o.custid
+    LET address_line = SPLIT(c.address.city, ","),
+        customer_city = TRIM(address_line[0]),
+        customer_region = TRIM(address_line[1])
+    GROUP BY ROLLUP(customer_region, customer_city)
+    ORDER BY customer_region ASC, customer_city ASC, `Order Count` DESC;
+
+Result:
+
+    [
+      {
+        "Region": null,
+        "City": null,
+        "Order Count": 9
+      },
+      {
+        "Region": "Italy",
+        "City": null,
+        "Order Count": 0
+      },
+      {
+        "Region": "Italy",
+        "City": "Rome",
+        "Order Count": 0
+      },
+      {
+        "Region": "MA",
+        "City": null,
+        "Order Count": 2
+      },
+      {
+        "Region": "MA",
+        "City": "Boston",
+        "Order Count": 2
+      },
+      {
+        "Region": "MA",
+        "City": "Hanover",
+        "Order Count": 0
+      },
+      {
+        "Region": "MO",
+        "City": null,
+        "Order Count": 7
+      },
+      {
+        "Region": "MO",
+        "City": "St. Louis",
+        "Order Count": 7
+      }
+    ]
+
+With the addition of the `ROLLUP` subclause, the results now include an extra item at the start of each region,
+giving the subtotal for that region.
+There is also another extra item at the very start of the results, giving the grand total for all regions.
+
+The order of the fields specified by the `ROLLUP` subclause determines the hierarchy of the super-aggregate items.
+The customer region is specified first, followed by the customer city; so the results are aggregated by region first,
+and then by city within each region.
+
+The grand total returns `null` as a value for the city and the region, and the subtotals return `null` as the
+value for the city, which may make the results hard to understand at first glance.
+A workaround for this is given in the next example.
+
+##### Example
+
+(Q3.R3) List the number of orders by customer region and city, with meaningful subtotals and grand total.
+
+    SELECT IFNULL(customer_region, "All regions") AS Region,
+           IFNULL(customer_city, "All cities") AS City,
+           COUNT(o.orderno) AS `Order Count`
+    FROM customers AS c LEFT OUTER JOIN orders AS o ON c.custid = o.custid
+    LET address_line = SPLIT(c.address.city, ","),
+        customer_city = TRIM(address_line[0]),
+        customer_region = TRIM(address_line[1])
+    GROUP BY ROLLUP(customer_region, customer_city)
+    ORDER BY customer_region ASC, customer_city ASC, `Order Count` DESC;
+
+Result:
+
+    [
+      {
+        "Region": "All regions",
+        "City": "All cities",
+        "Order Count": 9
+      },
+      {
+        "Region": "Italy",
+        "City": "All cities",
+        "Order Count": 0
+      },
+      {
+        "Region": "Italy",
+        "City": "Rome",
+        "Order Count": 0
+      },
+      {
+        "Region": "MA",
+        "City": "All cities",
+        "Order Count": 2
+      },
+      {
+        "Region": "MA",
+        "City": "Boston",
+        "Order Count": 2
+      },
+      {
+        "Region": "MA",
+        "City": "Hanover",
+        "Order Count": 0
+      },
+      {
+        "Region": "MO",
+        "City": "All cities",
+        "Order Count": 7
+      },
+      {
+        "Region": "MO",
+        "City": "St. Louis",
+        "Order Count": 7
+      }
+    ]
+
+This query uses the `IFNULL` function to populate the region and city fields with meaningful values for the
+super-aggregate items.
+This makes the results clearer and more readable.
+
+#### <a id="Cube">CUBE</a>
+
+The `CUBE` subclause is similar to the `ROLLUP` subclause, in that it returns extra super-aggregate items in the query
+results, giving subtotals and a grand total for the aggregate functions.
+Whereas `ROLLUP` returns a grand total and a hierarchy of subtotals based on the specified fields,
+the `CUBE` subclause returns a grand total and subtotals for every possible combination of the specified fields.
+
+The following example is a modification of Q3.R3 which illustrates the `CUBE` subclause.
+
+##### Example
+
+(Q3.C) List the number of orders by customer region and order date, with all possible subtotals and a grand total.
+
+    SELECT IFNULL(customer_region, "All regions") AS Region,
+           IFNULL(order_month, "All months") AS Month,
+           COUNT(o.orderno) AS `Order Count`
+    FROM customers AS c INNER JOIN orders AS o ON c.custid = o.custid
+    LET address_line = SPLIT(c.address.city, ","),
+        customer_region = TRIM(address_line[1]),
+        order_month = get_month(date(o.order_date))
+    GROUP BY CUBE(customer_region, order_month)
+    ORDER BY customer_region ASC, order_month ASC;
+
+Result:
+
+    [
+      {
+        "Region": "All regions",
+        "Order Count": 9,
+        "Month": "All months"
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 1,
+        "Month": 4
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 1,
+        "Month": 5
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 1,
+        "Month": 6
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 1,
+        "Month": 7
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 1,
+        "Month": 8
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 2,
+        "Month": 9
+      },
+      {
+        "Region": "All regions",
+        "Order Count": 2,
+        "Month": 10
+      },
+      {
+        "Region": "MA",
+        "Order Count": 2,
+        "Month": "All months"
+      },
+      {
+        "Region": "MA",
+        "Order Count": 1,
+        "Month": 7
+      },
+      {
+        "Region": "MA",
+        "Order Count": 1,
+        "Month": 8
+      },
+      {
+        "Region": "MO",
+        "Order Count": 7,
+        "Month": "All months"
+      },
+      {
+        "Region": "MO",
+        "Order Count": 1,
+        "Month": 4
+      },
+      {
+        "Region": "MO",
+        "Order Count": 1,
+        "Month": 5
+      },
+      {
+        "Region": "MO",
+        "Order Count": 1,
+        "Month": 6
+      },
+      {
+        "Region": "MO",
+        "Order Count": 2,
+        "Month": 9
+      },
+      {
+        "Region": "MO",
+        "Order Count": 2,
+        "Month": 10
+      }
+    ]
+
+To simplify the results, this query uses an inner join, so that customers who have not placed an order are not included
+in the totals.
+The query uses string functions to extract the region from each customer's address,
+and a temporal function to extract the year from the order date.
+
+The query uses the `CUBE` subclause with customer region and order month.
+This means that there are four possible aggregates to calculate:
+
+* All regions, all months
+* All regions, each month
+* Each region, all months
+* Each region, each month
+
+The results start with the grand total, showing the total number of orders across all regions for all months.
+This is followed by date subtotals, showing the number of orders across all regions for each month.
+Following that are the regional subtotals, showing the total number of orders for all months in each region;
+and the result items, giving the number of orders for each month in each region.
+
+The query also uses the `IFNULL` function to populate the region and date fields with meaningful values for the
+super-aggregate items.
+This makes the results clearer and more readable.
+
 ### <a id="Having_clauses">HAVING Clause</a>
 
 ##### HavingClause