You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@unomi.apache.org by sh...@apache.org on 2019/10/11 13:10:37 UTC

[unomi] branch UNOMI-252-aggregation-doc created (now c8cf439)

This is an automated email from the ASF dual-hosted git repository.

shuber pushed a change to branch UNOMI-252-aggregation-doc
in repository https://gitbox.apache.org/repos/asf/unomi.git.


      at c8cf439  UNOMI-252 Add documentation for query counts, query metrics and aggregations - Added the suggested documentation

This branch includes the following new commits:

     new c8cf439  UNOMI-252 Add documentation for query counts, query metrics and aggregations - Added the suggested documentation

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

[unomi] 01/01: UNOMI-252 Add documentation for query counts, query metrics and aggregations - Added the suggested documentation

Posted by sh...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

shuber pushed a commit to branch UNOMI-252-aggregation-doc
in repository https://gitbox.apache.org/repos/asf/unomi.git

commit c8cf4392ff76683e77ce79b370930629d79e472a
Author: Serge Huber <sh...@apache.org>
AuthorDate: Fri Oct 11 15:10:31 2019 +0200

    UNOMI-252 Add documentation for query counts, query metrics and aggregations
    - Added the suggested documentation
    
    Signed-off-by: Serge Huber <sh...@apache.org>
---
 manual/src/main/asciidoc/index.adoc                |   4 +
 .../main/asciidoc/queries-and-aggregations.adoc    | 344 +++++++++++++++++++++
 2 files changed, 348 insertions(+)

diff --git a/manual/src/main/asciidoc/index.adoc b/manual/src/main/asciidoc/index.adoc
index 79b2a6d..4f1f2b2 100644
--- a/manual/src/main/asciidoc/index.adoc
+++ b/manual/src/main/asciidoc/index.adoc
@@ -48,6 +48,10 @@ include::useful-unomi-urls.adoc[]
 
 include::how-profile-tracking-works.adoc[]
 
+== Queries and aggregations
+
+include::queries-and-aggregations.adoc[]
+
 == Profile import & export
 
 include::profile-import-export.adoc[]
diff --git a/manual/src/main/asciidoc/queries-and-aggregations.adoc b/manual/src/main/asciidoc/queries-and-aggregations.adoc
new file mode 100644
index 0000000..269ef1d
--- /dev/null
+++ b/manual/src/main/asciidoc/queries-and-aggregations.adoc
@@ -0,0 +1,344 @@
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+Apache Unomi contains a `query` endpoint that is quite powerful. It provides ways to perform queries that can quickly
+get result counts, apply metrics such as sum/min/max/avg or even use powerful aggregations.
+
+In this section we will show examples of requests that may be built using this API.
+
+=== Query counts
+
+Query counts are highly optimized queries that will count the number of objects that match a certain condition without
+retrieving the results. This can be used for example to quickly figure out how many objects will match a given condition
+before actually retrieving the results. It uses ElasticSearch/Lucene optimizations to avoid the cost of loading all the
+resulting objects.
+
+Here's an example of a query:
+
+[source,bash]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/count \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "parameterValues": {
+    "subConditions": [
+      {
+        "type": "profilePropertyCondition",
+        "parameterValues": {
+          "propertyName": "systemProperties.isAnonymousProfile",
+          "comparisonOperator": "missing"
+        }
+      },
+      {
+        "type": "profilePropertyCondition",
+        "parameterValues": {
+          "propertyName": "properties.nbOfVisits",
+          "comparisonOperator": "equals",
+          "propertyValueInteger": 1
+        }
+      }
+    ],
+    "operator": "and"
+  },
+  "type": "booleanCondition"
+}
+EOF
+----
+
+The above result will return the profile count of all the profiles
+
+Result will be something like this:
+
+    2084
+
+=== Metrics
+
+Metric queries make it possible to apply functions to the resulting property. The supported metrics are:
+
+- sum
+- avg
+- min
+- max
+
+It is also possible to request more than one metric in a single request by concatenating them with a "/" in the URL.
+Here's an example request that uses the `sum` and `avg` metrics:
+
+[source]
+----
+curl -X POST http://localhost:8181/cxs/query/session/profile.properties.nbOfVisits/sum/avg \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "parameterValues": {
+    "subConditions": [
+      {
+        "type": "sessionPropertyCondition",
+        "parameterValues": {
+          "comparisonOperator": "equals",
+          "propertyName": "scope",
+          "propertyValue": "digitall"
+        }
+      }
+    ],
+    "operator": "and"
+  },
+  "type": "booleanCondition"
+}
+EOF
+----
+
+The result will look something like this:
+
+[source,json]
+----
+{
+   "_avg":1.0,
+   "_sum":9.0
+}
+----
+
+
+=== Aggregations
+
+Aggregations are a very powerful way to build queries in Apache Unomi that will collect and aggregate data by filtering
+on certain conditions.
+
+Aggregations are composed of :
+- an object type and a property on which to aggregate
+- an aggregation setup (how data will be aggregated, by date, by numeric range, date range or ip range)
+- a condition (used to filter the data set that will be aggregated)
+
+==== Aggregation types
+
+Aggregations may be of different types. They are listed here below.
+
+===== Date
+
+Date aggregations make it possible to automatically generate "buckets" by time periods. For more information about the
+format, it is directly inherited from ElasticSearch and you may find it here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations-bucket-datehistogram-aggregation.html
+
+Here's an example of a request to retrieve a histogram of by day of all the session that have been create by newcomers (nbOfVisits=1)
+
+[source]
+----
+curl -X POST http://localhost:8181/cxs/query/session/timeStamp \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "type": "date",
+    "parameters": {
+      "interval": "1d",
+      "format": "yyyy-MM-dd"
+    }
+  },
+  "condition": {
+    "type": "booleanCondition",
+    "parameterValues": {
+      "operator": "and",
+      "subConditions": [
+        {
+          "type": "sessionPropertyCondition",
+          "parameterValues": {
+            "propertyName": "scope",
+            "comparisonOperator": "equals",
+            "propertyValue": "acme"
+          }
+        },
+        {
+          "type": "sessionPropertyCondition",
+          "parameterValues": {
+            "propertyName": "profile.properties.nbOfVisits",
+            "comparisonOperator": "equals",
+            "propertyValueInteger": 1
+          }
+        }
+      ]
+    }
+  }
+}
+EOF
+----
+
+The above request will produce a similar that looks like this:
+
+[source,json]
+----
+{
+  "_all": 8062,
+  "_filtered": 4085,
+  "2018-10-02": 3,
+  "2018-10-03": 17,
+  "2018-10-04": 18,
+  "2018-10-05": 19,
+  "2018-10-06": 23,
+  "2018-10-07": 18,
+  "2018-10-08": 20
+}
+----
+
+You can see that we retrieve the count of newcomers aggregated by day.
+
+===== Date range
+
+Date ranges make it possible to "bucket" dates, for example to regroup profiles by their birth date as in the example
+below:
+
+[source,shell script]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/properties.birthDate \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "property": "properties.birthDate",
+    "type": "dateRange",
+    "dateRanges": [
+      {
+        "key": "After 2009",
+        "from": "now-10y/y",
+        "to": null
+      },
+      {
+        "key": "Between 1999 and 2009",
+        "from": "now-20y/y",
+        "to": "now-10y/y"
+      },
+      {
+        "key": "Between 1989 and 1999",
+        "from": "now-30y/y",
+        "to": "now-20y/y"
+      },
+      {
+        "key": "Between 1979 and 1989",
+        "from": "now-40y/y",
+        "to": "now-30y/y"
+      },
+      {
+        "key": "Between 1969 and 1979",
+        "from": "now-50y/y",
+        "to": "now-40y/y"
+      },
+      {
+        "key": "Before 1969",
+        "from": null,
+        "to": "now-50y/y"
+      }
+    ]
+  },
+  "condition": {
+    "type": "matchAllCondition",
+    "parameterValues": {}
+  }
+}
+EOF
+----
+
+The resulting JSON response will look something like this:
+
+[source,json]
+----
+{
+    "_all":4095,
+    "_filtered":4095,
+    "Before 1969":2517,
+    "Between 1969 and 1979":353,
+    "Between 1979 and 1989":336,
+    "Between 1989 and 1999":337,
+    "Between 1999 and 2009":35,
+    "After 2009":0,
+    "_missing":517
+}
+----
+
+You can find more information about the date range formats here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-aggregations-bucket-daterange-aggregation.html
+
+
+===== Numeric range
+
+Numeric ranges make it possible to use "buckets" for the various ranges you want to classify.
+
+Here's an example of a using numeric range to regroup profiles by number of visits:
+
+[source,shell script]
+----
+curl -X POST http://localhost:8181/cxs/query/profile/properties.nbOfVisits \
+--user karaf:karaf \
+-H "Content-Type: application/json" \
+-d @- <<'EOF'
+{
+  "aggregate": {
+    "property": "properties.nbOfVisits",
+    "type": "numericRange",
+    "numericRanges": [
+      {
+        "key": "Less than 5",
+        "from": null,
+        "to": 5
+      },
+      {
+        "key": "Between 5 and 10",
+        "from": 5,
+        "to": 10
+      },
+      {
+        "key": "Between 10 and 20",
+        "from": 10,
+        "to": 20
+      },
+      {
+        "key": "Between 20 and 40",
+        "from": 20,
+        "to": 40
+      },
+      {
+        "key": "Between 40 and 80",
+        "from": 40,
+        "to": 80
+      },
+      {
+        "key": "Greater than 100",
+        "from": 100,
+        "to": null
+      }
+    ]
+  },
+  "condition": {
+    "type": "matchAllCondition",
+    "parameterValues": {}
+  }
+}
+EOF
+----
+
+This will produce an output that looks like this:
+
+[source,json]
+----
+{
+    "_all":4095,
+    "_filtered":4095,
+    "Less than 5":3855,
+    "Between 5 and 10":233,
+    "Between 10 and 20":7,
+    "Between 20 and 40":0,
+    "Between 40 and 80":0,
+    "Greater than 100":0
+}
+----
+