You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@age.apache.org by jo...@apache.org on 2021/10/21 23:04:48 UTC

[incubator-age-website] branch master updated: Updated Aggregation documentation

This is an automated email from the ASF dual-hosted git repository.

joshinnis pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-age-website.git


The following commit(s) were added to refs/heads/master by this push:
     new bf32017  Updated Aggregation documentation
bf32017 is described below

commit bf32017e0dfc2804fa9da06deaf36481ed84aadb
Author: Josh Innis <Jo...@gmail.com>
AuthorDate: Thu Oct 21 16:03:27 2021 -0700

    Updated Aggregation documentation
---
 docs/conf.py                          |   2 +-
 docs/functions/aggregate_functions.md |  23 ++-
 docs/intro/aggregation.md             | 308 +++++++++++++++++++++++++++++++++-
 3 files changed, 325 insertions(+), 8 deletions(-)

diff --git a/docs/conf.py b/docs/conf.py
index b5d4e69..684e72a 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -57,7 +57,7 @@ html_theme = 'sphinx_rtd_theme'
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static', 'css']
+html_static_path = ['_static', 'css', 'scripts']
 
 html_logo = 'images/logo.png'
 html_favicon = 'images/favicon.ico'
diff --git a/docs/functions/aggregate_functions.md b/docs/functions/aggregate_functions.md
index d08f0cf..2c6bdd0 100644
--- a/docs/functions/aggregate_functions.md
+++ b/docs/functions/aggregate_functions.md
@@ -1,6 +1,27 @@
 # Aggregation Functions
 
-<a href="../scripts/functions/aggregation_setup.sql">Data Setup Script</a>
+Functions that activate [auto aggregation](../intro/aggregation.html#).
+
+## Data Setup
+```
+LOAD 'age';
+SET search_path TO ag_catalog;
+
+SELECT create_graph('graph_name');
+
+SELECT * FROM cypher('graph_name', $$
+	CREATE (a:Person {name: 'A', age: 13}),
+	(b:Person {name: 'B', age: 33, eyes: "blue"}),
+	(c:Person {name: 'C', age: 44, eyes: "blue"}),
+	(d1:Person {name: 'D', eyes: "brown"}),
+	(d2:Person {name: 'D'}),
+	(a)-[:KNOWS]->(b),
+	(a)-[:KNOWS]->(c),
+	(a)-[:KNOWS]->(d1),
+	(b)-[:KNOWS]->(d2),
+	(c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
 
 ## min()
 
diff --git a/docs/intro/aggregation.md b/docs/intro/aggregation.md
index 2c2d80d..d49c8f1 100644
--- a/docs/intro/aggregation.md
+++ b/docs/intro/aggregation.md
@@ -1,13 +1,309 @@
 # Aggregation
 
-Generally an aggregation aggr(expr) processes all matching rows for each aggregation key found in an incoming record (keys are compared using equivalence).
 
-For a fixed aggregation key and each matching record, expr is evaluated to a value. This yields a listof  candidate  values.  Generally  the  order  of  candidate  values  is  unspecified.  If  the  aggregation happens  in  a  projection  with  an  associated  ORDER  BY  subclause,  the  list  of  candidate  values  isordered  in  the  same  way  as  the  underlying  records  and  as  specified  by  the  associated  ORDER  BY subclause.
+## Introduction 
+
+Generally an aggregation aggr(expr) processes all matching rows for each aggregation key found in an incoming record (keys are compared using [equivalence](./comparability.html#)).
+
+In a regular aggregation (i.e. of the form aggr(expr)), the list of aggregated values is the list of candidate values with all null values removed from it.
+
+## Data Setup
+
+```
+SELECT * FROM cypher('graph_name', $$
+	CREATE (a:Person {name: 'A', age: 13}),
+	(b:Person {name: 'B', age: 33, eyes: "blue"}),
+	(c:Person {name: 'C', age: 44, eyes: "blue"}),
+	(d1:Person {name: 'D', eyes: "brown"}),
+	(d2:Person {name: 'D'}),
+	(a)-[:KNOWS]->(b),
+	(a)-[:KNOWS]->(c),
+	(a)-[:KNOWS]->(d1),
+	(b)-[:KNOWS]->(d2),
+	(c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
+
+## Auto Group By
+To calculate aggregated data, Cypher offers aggregation, analogous to SQL’s GROUP BY.
+
+Aggregating functions take a  set of values and calculate An aggregated value over them. Examples are [avg()](../functions/aggregate_functions.html#avg) that calculates the average of multiple numeric values, or [min()](../functions/aggregate_functions.html#min) that finds the smallest numeric or string value in a set of values. When we say below that an aggregating function operates on a set of values, we mean these to be the result of the application of the inner expression(such as n.a [...]
+
+Aggregation can be computed over all the matching subgraphs, or it can be further divided by introducing grouping keys. These are non-aggregate expressions, that are used to group the valuesgoing into the aggregate functions.
+
+Assume we have the following return statement:
+```
+SELECT * FROM cypher('graph_name', $$
+	MATCH (v:Person)
+	RETURN v.name, count(*)
+$$) as (grouping_key agtype, count agtype);
+```
+
+<table>
+  <tr>
+   <td>count</td>
+   <td>key</td>
+  </tr>
+  <tr>
+   <td>"A"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"B"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"C"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"D"</td>
+   <td>2</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+
+We have two return expressions: grouping_key, and count(*). The first, grouping_key, is not an aggregate function, and so it will  be  the  grouping  key. The latter, count(*) is an aggregate expression. The matching subgraphs will be divided into different  buckets, depending on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket. 
+
+## Sorting on aggregate functions
+
+To use aggregations to sort the result set, the aggregation must be included in the RETURN to be used in the ORDER BY.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+	MATCH (me:Person)-[]->(friend:Person)
+	RETURN count(friend), me
+	ORDER BY count(friend)
+$$) as (friends agtype, me agtype);
+```
+
+## Distinct aggregation
+In a distinct aggregation (i.e. of the form aggr(DISTINCT expr)), the list of aggregated values is the list of candidate values with all null values  removed from it. Furthermore, in a distinct aggregation, only one of all equivalent candidate values is included in the list of aggregated values, i.e. duplicates under equivalence are  removed. 
+
+
+The DISTINCT operator works in conjunction with aggregation. It is used to make all values unique before running them  through an aggregate function.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+	MATCH (v:Person)
+	RETURN count(DISTINCT v.eyes), count(v.eyes)
+$$) as (distinct_eyes agtype, eyes agtype);
+```
+
+<table>
+  <tr>
+   <td>distinct_eyes</td>
+   <td>eyes</td>
+  </tr>
+  <tr>
+   <td>2</td>
+   <td>3</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+## Ambiguous Grouping Statements
+
+This feature of not requiring the user to specifiy their grouping keys for a query allows for ambiguity on what Cypher should qualify as their grouping keys. For more details [click here.](https://opencypher.org/articles/2017/07/27/ocig1-aggregations-article/)
+
+Data Setup 
+```
+SELECT * FROM cypher('graph_name', $$
+CREATE (:L {a: 1, b: 2, c: 3}),
+       (:L {a: 2, b: 3, c: 1}),
+       (:L {a: 3, b: 1, c: 2})
+$$) as (a agtype);
+```
+
+### Invalid Query in AGE
+AGE's solution to this problem is to not allow a WITH or RETURN column to combine aggregate functions with variables that are not explicitly listed in another column of the same WITH or RETURN clause.
+
+
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+	MATCH (x:L)
+	RETURN x.a + count(*) + x.b + count(*) + x.c
+$$) as (a agtype);
+```
+
+Result:
+```
+ERROR:  "x" must be either part of an explicitly listed key or used inside an aggregate function
+LINE 3: RETURN x.a + count(*) + x.b + count(*) + x.c
+```
+
+
+### Valid Query in AGE
+Columns that do not include an aggregate function in AGE are considered to be the grouping keys for that WITH or RETURN clause. 
+
+For the above query, the user could rewrite the query is several ways that will return results
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+	MATCH (x:L)
+	RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c
+$$) as (count agtype, key agtype);
+```
+
+x.a + x.b + x.c is the grouping key. Grouping keys created like this must include parenthesis.
+
+Results
+<table>
+  <tr>
+   <td>count</td>
+   <td>key</td>
+  </tr>
+  <tr>
+   <td>12</td>
+   <td>6</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+
+
+Query
+```
+SELECT * FROM cypher('graph_name', $$
+	MATCH (x:L)
+	RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c
+$$) as (count agtype, a agtype, b agtype, c agtype);
+```
+
+x.a, x.b, and x.c will be considered different grouping keys
+
+Results:
+
+<table>
+  <thead>
+  <tr>
+   <td>count</td>
+   </td>a<td>
+   </td>b<td>
+   </td>c<td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+   <td>3</td>
+   <td>1</td>
+   <td>2</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>2</td>
+   <td>3</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>1</td>
+   <td>2</td>
+   <td>3</td>
+  </tr>
+  <tr>
+   <td colspan="4">3 rows</td>
+  </tr>
+</table>
+
+### Vertices and edges in ambiguous grouping
+
+Alternatively, the grouping key can be a vertex or edge, and then any properties of the vertex or edge can be specified without being explicitly stated in a WITH or RETURN column.
+
+```
+SELECT * FROM cypher('graph_name', $$
+	MATCH (x:L)
+	RETURN count(*) + count(*) + x.a + x.b + x.c, x
+$$) as (count agtype, key agtype);
+```
+
+Results will be grouped on x, because it is safe to assume that properties be considered unecessary for grouping to be unambiguous.
+
+Results
+<table>
+  <thead>
+  <tr>
+   <td>count</td>
+   </td>key<td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553283, "label": "L", "properties": {"a": 3, "b": 1, "c": 2}}::vertex</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553281, "label": "L", "properties": {"a": 1, "b": 2, "c": 3}}::vertex</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553282, "label": "L", "properties": {"a": 2, "b": 3, "c": 1}}::vertex</td>
+  </tr>
+  <tr>
+   <td colspan="4">3 rows</td>
+  </tr>
+</table>
+
+
+### Hiding unwanted grouping keys
+
+If the grouping key is considered unecessary for the query output, the aggregation can be done in a WITH clause then passing information to the RETURN clause.
+
+SELECT * FROM cypher('graph_name', $$
+	MATCH (x:L)
+	WITH count(*) + count(*) + x.a + x.b + x.c as column, x
+	RETURN column
+$$) as (a agtype);
+
+Results
+<table>
+  <thead>
+  <tr>
+   <td>a</td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td colspan="1">3 rows</td>
+  </tr>
+</table>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
-In  a  regular  aggregation  (i.e.  of  the  form  aggr(expr)),  the  list  of  aggregated  values  is  the  list  of candidate values with all null values removed from it.In a distinct aggregation (i.e. of the form aggr(DISTINCT expr)), the list of aggregated values is the listof  candidate  values  with  all  null  values  removed  from  it.  Furthermore,  in  a  distinct  aggregation,only  one  of  all  equivalent  candidate  values  is  included  in  the  list  of  aggregated  values [...]
 
-Finally, the remaining aggregated values are processed by the actual aggregation function. If the list of aggregated  values  is  empty,  the  aggregation  function  returns  a  default  value  (null  unlessspecified  otherwise  below).  Aggregating  values  of  different  types  (like  summing  a  number  and  astring) may lead to runtime errors.
 
-See 
 
-aggregation functions for more details.