You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@unomi.apache.org by sh...@apache.org on 2019/10/11 13:10:38 UTC
[unomi] 01/01: UNOMI-252 Add documentation for query counts, query metrics and aggregations - Added the suggested documentation

This is an automated email from the ASF dual-hosted git repository.

shuber pushed a commit to branch UNOMI-252-aggregation-doc
in repository https://gitbox.apache.org/repos/asf/unomi.git

commit c8cf4392ff76683e77ce79b370930629d79e472a
Author: Serge Huber <sh...@apache.org>
AuthorDate: Fri Oct 11 15:10:31 2019 +0200

    UNOMI-252 Add documentation for query counts, query metrics and aggregations
    - Added the suggested documentation
    
    Signed-off-by: Serge Huber <sh...@apache.org>
---
 manual/src/main/asciidoc/index.adoc                |   4 +
 .../main/asciidoc/queries-and-aggregations.adoc    | 344 +++++++++++++++++++++
 2 files changed, 348 insertions(+)

diff --git a/manual/src/main/asciidoc/index.adoc b/manual/src/main/asciidoc/index.adoc
index 79b2a6d..4f1f2b2 100644
--- a/manual/src/main/asciidoc/index.adoc
+++ b/manual/src/main/asciidoc/index.adoc
@@ -48,6 +48,10 @@ include::useful-unomi-urls.adoc[]
 
 include::how-profile-tracking-works.adoc[]
 
+== Queries and aggregations
+
+include::queries-and-aggregations.adoc[]
+
 == Profile import & export
 
 include::profile-import-export.adoc[]
diff --git a/manual/src/main/asciidoc/queries-and-aggregations.adoc b/manual/src/main/asciidoc/queries-and-aggregations.adoc
new file mode 100644
index 0000000..269ef1d
--- /dev/null
+++ b/manual/src/main/asciidoc/queries-and-aggregations.adoc
@@ -0,0 +1,344 @@
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+Apache Unomi contains a `query` endpoint that is quite powerful. It provides ways to perform queries that can quickly
+get result counts, apply metrics such as sum/min/max/avg or even use powerful aggregations.
+
+In this section we will show examples of requests that may be built using this API.
+
+=== Query counts
+
+Query counts are highly optimized queries that will count the number of objects that match a certain condition without
+retrieving the results. This can be used for example to quickly figure out how many objects will match a given condition
+before actually retrieving the results. It uses ElasticSearch/Lucene optimizations to avoid the cost of loading all the
+resulting objects.
+
+Here's an example of a query:
+
+[source,bash]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/count \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "parameterValues": {
+    "subConditions": [
+      {
+        "type": "profilePropertyCondition",
+        "parameterValues": {
+          "propertyName": "systemProperties.isAnonymousProfile",
+          "comparisonOperator": "missing"
+        }
+      },
+      {
+        "type": "profilePropertyCondition",
+        "parameterValues": {
+          "propertyName": "properties.nbOfVisits",
+          "comparisonOperator": "equals",
+          "propertyValueInteger": 1
+        }
+      }
+    ],
+    "operator": "and"
+  },
+  "type": "booleanCondition"
+}
+EOF
+----
+
+The above result will return the profile count of all the profiles
+
+Result will be something like this:
+
+    2084
+
+=== Metrics
+
+Metric queries make it possible to apply functions to the resulting property. The supported metrics are:
+
+- sum
+- avg
+- min
+- max
+
+It is also possible to request more than one metric in a single request by concatenating them with a "/" in the URL.
+Here's an example request that uses the `sum` and `avg` metrics:
+
+[source]
+----
+curl -X POST http://localhost:8181/cxs/query/session/profile.properties.nbOfVisits/sum/avg \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "parameterValues": {
+    "subConditions": [
+      {
+        "type": "sessionPropertyCondition",
+        "parameterValues": {
+          "comparisonOperator": "equals",
+          "propertyName": "scope",
+          "propertyValue": "digitall"
+        }
+      }
+    ],
+    "operator": "and"
+  },
+  "type": "booleanCondition"
+}
+EOF
+----
+
+The result will look something like this:
+
+[source,json]
+----
+{
+   "_avg":1.0,
+   "_sum":9.0
+}
+----
+
+
+=== Aggregations
+
+Aggregations are a very powerful way to build queries in Apache Unomi that will collect and aggregate data by filtering
+on certain conditions.
+
+Aggregations are composed of :
+- an object type and a property on which to aggregate
+- an aggregation setup (how data will be aggregated, by date, by numeric range, date range or ip range)
+- a condition (used to filter the data set that will be aggregated)
+
+==== Aggregation types
+
+Aggregations may be of different types. They are listed here below.
+
+===== Date
+
+Date aggregations make it possible to automatically generate "buckets" by time periods. For more information about the
+format, it is directly inherited from ElasticSearch and you may find it here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations-bucket-datehistogram-aggregation.html
+
+Here's an example of a request to retrieve a histogram of by day of all the session that have been create by newcomers (nbOfVisits=1)
+
+[source]
+----
+curl -X POST http://localhost:8181/cxs/query/session/timeStamp \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "type": "date",
+    "parameters": {
+      "interval": "1d",
+      "format": "yyyy-MM-dd"
+    }
+  },
+  "condition": {
+    "type": "booleanCondition",
+    "parameterValues": {
+      "operator": "and",
+      "subConditions": [
+        {
+          "type": "sessionPropertyCondition",
+          "parameterValues": {
+            "propertyName": "scope",
+            "comparisonOperator": "equals",
+            "propertyValue": "acme"
+          }
+        },
+        {
+          "type": "sessionPropertyCondition",
+          "parameterValues": {
+            "propertyName": "profile.properties.nbOfVisits",
+            "comparisonOperator": "equals",
+            "propertyValueInteger": 1
+          }
+        }
+      ]
+    }
+  }
+}
+EOF
+----
+
+The above request will produce a similar that looks like this:
+
+[source,json]
+----
+{
+  "_all": 8062,
+  "_filtered": 4085,
+  "2018-10-02": 3,
+  "2018-10-03": 17,
+  "2018-10-04": 18,
+  "2018-10-05": 19,
+  "2018-10-06": 23,
+  "2018-10-07": 18,
+  "2018-10-08": 20
+}
+----
+
+You can see that we retrieve the count of newcomers aggregated by day.
+
+===== Date range
+
+Date ranges make it possible to "bucket" dates, for example to regroup profiles by their birth date as in the example
+below:
+
+[source,shell script]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/properties.birthDate \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "property": "properties.birthDate",
+    "type": "dateRange",
+    "dateRanges": [
+      {
+        "key": "After 2009",
+        "from": "now-10y/y",
+        "to": null
+      },
+      {
+        "key": "Between 1999 and 2009",
+        "from": "now-20y/y",
+        "to": "now-10y/y"
+      },
+      {
+        "key": "Between 1989 and 1999",
+        "from": "now-30y/y",
+        "to": "now-20y/y"
+      },
+      {
+        "key": "Between 1979 and 1989",
+        "from": "now-40y/y",
+        "to": "now-30y/y"
+      },
+      {
+        "key": "Between 1969 and 1979",
+        "from": "now-50y/y",
+        "to": "now-40y/y"
+      },
+      {
+        "key": "Before 1969",
+        "from": null,
+        "to": "now-50y/y"
+      }
+    ]
+  },
+  "condition": {
+    "type": "matchAllCondition",
+    "parameterValues": {}
+  }
+}
+EOF
+----
+
+The resulting JSON response will look something like this:
+
+[source,json]
+----
+{
+    "_all":4095,
+    "_filtered":4095,
+    "Before 1969":2517,
+    "Between 1969 and 1979":353,
+    "Between 1979 and 1989":336,
+    "Between 1989 and 1999":337,
+    "Between 1999 and 2009":35,
+    "After 2009":0,
+    "_missing":517
+}
+----
+
+You can find more information about the date range formats here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations-bucket-daterange-aggregation.html
+
+
+===== Numeric range
+
+Numeric ranges make it possible to use "buckets" for the various ranges you want to classify.
+
+Here's an example of a using numeric range to regroup profiles by number of visits:
+
+[source,shell script]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/properties.nbOfVisits \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "property": "properties.nbOfVisits",
+    "type": "numericRange",
+    "numericRanges": [
+      {
+        "key": "Less than 5",
+        "from": null,
+        "to": 5
+      },
+      {
+        "key": "Between 5 and 10",
+        "from": 5,
+        "to": 10
+      },
+      {
+        "key": "Between 10 and 20",
+        "from": 10,
+        "to": 20
+      },
+      {
+        "key": "Between 20 and 40",
+        "from": 20,
+        "to": 40
+      },
+      {
+        "key": "Between 40 and 80",
+        "from": 40,
+        "to": 80
+      },
+      {
+        "key": "Greater than 100",
+        "from": 100,
+        "to": null
+      }
+    ]
+  },
+  "condition": {
+    "type": "matchAllCondition",
+    "parameterValues": {}
+  }
+}
+EOF
+----
+
+This will produce an output that looks like this:
+
+[source,json]
+----
+{
+    "_all":4095,
+    "_filtered":4095,
+    "Less than 5":3855,
+    "Between 5 and 10":233,
+    "Between 10 and 20":7,
+    "Between 20 and 40":0,
+    "Between 40 and 80":0,
+    "Greater than 100":0
+}
+----
+