Posted to commits@lucene.apache.org by jb...@apache.org on 2019/06/25 16:26:02 UTC

[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Start statistics vis

This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new 4d0e648  SOLR-13105: Start statistics vis
4d0e648 is described below

commit 4d0e648c91325813c2c900c13d782944ffe5e7ee
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Tue Jun 25 12:25:51 2019 -0400

    SOLR-13105: Start statistics vis
---
 .../src/images/math-expressions/cumProb.png        | Bin 0 -> 146987 bytes
 .../src/images/math-expressions/describe.png       | Bin 0 -> 137702 bytes
 .../src/images/math-expressions/hist.png           | Bin 0 -> 139842 bytes
 .../src/images/math-expressions/histtable.png      | Bin 0 -> 218867 bytes
 solr/solr-ref-guide/src/statistics.adoc            | 157 +++++++--------------
 5 files changed, 50 insertions(+), 107 deletions(-)

diff --git a/solr/solr-ref-guide/src/images/math-expressions/cumProb.png b/solr/solr-ref-guide/src/images/math-expressions/cumProb.png
new file mode 100644
index 0000000..a37dda6
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/cumProb.png differ
diff --git a/solr/solr-ref-guide/src/images/math-expressions/describe.png b/solr/solr-ref-guide/src/images/math-expressions/describe.png
new file mode 100644
index 0000000..a3bd482
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/describe.png differ
diff --git a/solr/solr-ref-guide/src/images/math-expressions/hist.png b/solr/solr-ref-guide/src/images/math-expressions/hist.png
new file mode 100644
index 0000000..ec222ee
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/hist.png differ
diff --git a/solr/solr-ref-guide/src/images/math-expressions/histtable.png b/solr/solr-ref-guide/src/images/math-expressions/histtable.png
new file mode 100644
index 0000000..1595441
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/histtable.png differ
diff --git a/solr/solr-ref-guide/src/statistics.adoc b/solr/solr-ref-guide/src/statistics.adoc
index 48b81ed..044bfe5 100644
--- a/solr/solr-ref-guide/src/statistics.adoc
+++ b/solr/solr-ref-guide/src/statistics.adoc
@@ -69,6 +69,11 @@ When this expression is sent to the `/stream` handler it responds with:
 }
 ----
 
+This describe function can be visualized in a table with Zeppelin-Solr:
+
+image::images/math-expressions/describe.png[]
+
+
 == Histograms and Frequency Tables
 
 Histograms and frequency tables are are tools for understanding the distribution
@@ -80,14 +85,14 @@ The `hist` function creates a histogram designed for usage with continuous data.
 === histograms
 
 Below is an example that selects a random sample, creates a vector from the
-result set and uses the `hist` function to return a histogram with 5 bins.
+result set and uses the `hist` function to return a histogram with 15 bins.
 The `hist` function returns a list of tuples with summary statistics for each bin.
 
 [source,text]
 ----
-let(a=random(collection1, q="*:*", rows="15000", fl="price_f"),
-    b=col(a, price_f),
-    c=hist(b, 5))
+let(a=random(testapp, q="*:*", rows="15000", fl="response_d"),
+    b=col(a, response_d),
+    c=hist(b, 15))
 ----
 
 When this expression is sent to the `/stream` handler it responds with:
@@ -98,117 +103,55 @@ When this expression is sent to the `/stream` handler it responds with:
   "result-set": {
     "docs": [
       {
-        "c": [
-          {
-            "prob": 0.2057939717603699,
-            "min": 0.000010371208,
-            "max": 0.19996578,
-            "mean": 0.10010319358402578,
-            "var": 0.003366805016271609,
-            "cumProb": 0.10293732468049072,
-            "sum": 309.0185585938884,
-            "stdev": 0.058024176136086666,
-            "N": 3087
-          },
-          {
-            "prob": 0.19381868629885585,
-            "min": 0.20007741,
-            "max": 0.3999073,
-            "mean": 0.2993590803885827,
-            "var": 0.003401644034068929,
-            "cumProb": 0.3025295802728267,
-            "sum": 870.5362057700005,
-            "stdev": 0.0583236147205309,
-            "N": 2908
-          },
-          {
-            "prob": 0.20565789836690007,
-            "min": 0.39995712,
-            "max": 0.5999038,
-            "mean": 0.4993620963792545,
-            "var": 0.0033158364923609046,
-            "cumProb": 0.5023006239697967,
-            "sum": 1540.5320673300018,
-            "stdev": 0.05758330046429177,
-            "N": 3085
-          },
-          {
-            "prob": 0.19437108496008693,
-            "min": 0.6000449,
-            "max": 0.79973197,
-            "mean": 0.7001752711861512,
-            "var": 0.0033895105082360185,
-            "cumProb": 0.7026537198687285,
-            "sum": 2042.4112660500066,
-            "stdev": 0.058219502816805456,
-            "N": 2917
-          },
-          {
-            "prob": 0.20019582213899467,
-            "min": 0.7999126,
-            "max": 0.99987316,
-            "mean": 0.8985428275824184,
-            "var": 0.003312360017780078,
-            "cumProb": 0.899450457219298,
-            "sum": 2698.3241112299997,
-            "stdev": 0.05755310606544253,
-            "N": 3003
-          }
-        ]
+        "prob": 0.00021598266688541547,
+        "min": 675.8131195271407,
+        "max": 690.4491626920295,
+        "mean": 683.1150404530058,
+        "var": 62.68733114037831,
+        "cumProb": 0.00010781401860771812,
+        "sum": 2732.460161812023,
+        "stdev": 7.917533147412666,
+        "N": 4
       },
       {
-        "EOF": true,
-        "RESPONSE_TIME": 322
-      }
-    ]
-  }
-}
+        "prob": 0.0008119830346328834,
+        "min": 703.2132289932538,
+        "max": 721.1545076964856,
+        "mean": 712.8045685730215,
+        "var": 41.16456697234485,
+        "cumProb": 0.0007397051651814922,
+        "sum": 9266.459391449276,
+        "stdev": 6.415961889876283,
+        "N": 13
+      },
+      {
+        "prob": 0.005621625404424438,
+        "min": 722.3966535859041,
+        "max": 743.8768517321993,
+        "mean": 735.0570449093976,
+        "var": 34.32748804550742,
+        "cumProb": 0.004137705733961876,
+        "sum": 62479.84881729879,
+        "stdev": 5.8589664656411395,
+        "N": 85
+      },
+      ...
 ----
 
-The `col` function can be used to *vectorize* a column of data from the list of tuples
-returned by the `hist` function.
+With Zeppelin-Solr the histogram can be first visualized in a table:
 
-In the example below, the *N* field,
-which is the number of observations in the each bin, is returned as a vector.
+image::images/math-expressions/histtable.png[]
 
-[source,text]
-----
-let(a=random(collection1, q="*:*", rows="15000", fl="price_f"),
-     b=col(a, price_f),
-     c=hist(b, 11),
-     d=col(c, N))
-----
+Then the histogram can be visualized with a bar chart by plotting the *mean* of
+the bins on the *x-axis* and the *prob* (probability) on the *y-axis*:
+
+image::images/math-expressions/hist.png[]
+
+The cumulative probability can be plotted by switching the *y-axis* to the *cumProb* column:
+
+image::images/math-expressions/cumProb.png[]
 
-When this expression is sent to the `/stream` handler it responds with:
 
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "d": [
-          1387,
-          1396,
-          1391,
-          1357,
-          1384,
-          1360,
-          1367,
-          1375,
-          1307,
-          1310,
-          1366
-        ]
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 307
-      }
-    ]
-  }
-}
-----
 
 === Frequency Tables