Posted to commits@lucene.apache.org by jb...@apache.org on 2019/06/25 16:42:16 UTC
[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Start statistics vis 1
This is an automated email from the ASF dual-hosted git repository.
jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
new c9cf9c4 SOLR-13105: Start statistics vis 1
c9cf9c4 is described below
commit c9cf9c43217317014bbe586d163363305a4dcf78
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Tue Jun 25 12:42:08 2019 -0400
SOLR-13105: Start statistics vis 1
---
.../src/images/math-expressions/cumPct.png | Bin 0 -> 156142 bytes
.../src/images/math-expressions/freqTable.png | Bin 0 -> 181164 bytes
solr/solr-ref-guide/src/statistics.adoc | 105 +++++++++------------
3 files changed, 46 insertions(+), 59 deletions(-)
diff --git a/solr/solr-ref-guide/src/images/math-expressions/cumPct.png b/solr/solr-ref-guide/src/images/math-expressions/cumPct.png
new file mode 100644
index 0000000..173c7a4
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/cumPct.png differ
diff --git a/solr/solr-ref-guide/src/images/math-expressions/freqTable.png b/solr/solr-ref-guide/src/images/math-expressions/freqTable.png
new file mode 100644
index 0000000..02eaf4f
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/freqTable.png differ
diff --git a/solr/solr-ref-guide/src/statistics.adoc b/solr/solr-ref-guide/src/statistics.adoc
index 044bfe5..95777cd 100644
--- a/solr/solr-ref-guide/src/statistics.adoc
+++ b/solr/solr-ref-guide/src/statistics.adoc
@@ -152,21 +152,19 @@ The cumulative probability can be plotted by switching the *y-axis* to the *cumP
image::images/math-expressions/cumProb.png[]
-
=== Frequency Tables
The `freqTable` function returns a frequency distribution for a discrete data set.
The `freqTable` function doesn't create bins like the histogram. Instead it counts
the occurrence of each discrete data value and returns a list of tuples with the
-frequency statistics for each value. Fields from a frequency table can be vectorized using
-using the `col` function in the same manner as a histogram.
+frequency statistics for each value.
Below is a simple example of a frequency table built from a random sample of
a discrete variable.
[source,text]
----
-let(a=random(collection1, q="*:*", rows="15000", fl="day_i"),
+let(a=random(testapp, q="*:*", rows="15000", fl="day_i"),
b=col(a, day_i),
c=freqTable(b))
----
@@ -175,63 +173,52 @@ When this expression is sent to the `/stream` handler it responds with:
[source,json]
----
- "result-set": {
- "docs": [
- {
- "c": [
- {
- "pct": 0.0318,
- "count": 477,
- "cumFreq": 477,
- "cumPct": 0.0318,
- "value": 0
- },
- {
- "pct": 0.033133333333333334,
- "count": 497,
- "cumFreq": 974,
- "cumPct": 0.06493333333333333,
- "value": 1
- },
- {
- "pct": 0.03426666666666667,
- "count": 514,
- "cumFreq": 1488,
- "cumPct": 0.0992,
- "value": 2
- },
- {
- "pct": 0.0346,
- "count": 519,
- "cumFreq": 2007,
- "cumPct": 0.1338,
- "value": 3
- },
- {
- "pct": 0.03133333333333333,
- "count": 470,
- "cumFreq": 2477,
- "cumPct": 0.16513333333333333,
- "value": 4
- },
- {
- "pct": 0.03333333333333333,
- "count": 500,
- "cumFreq": 2977,
- "cumPct": 0.19846666666666668,
- "value": 5
- }
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 281
- }
- ]
- }
-}
+ {
+ "result-set": {
+ "docs": [
+ {
+ "pct": 0.0362,
+ "count": 543,
+ "cumFreq": 543,
+ "cumPct": 0.0362,
+ "value": 0
+ },
+ {
+ "pct": 0.03186666666666667,
+ "count": 478,
+ "cumFreq": 1021,
+ "cumPct": 0.06806666666666666,
+ "value": 1
+ },
+ {
+ "pct": 0.0338,
+ "count": 507,
+ "cumFreq": 1528,
+ "cumPct": 0.10186666666666666,
+ "value": 2
+ },
+ {
+ "pct": 0.03546666666666667,
+ "count": 532,
+ "cumFreq": 2060,
+ "cumPct": 0.13733333333333334,
+ "value": 3
+ },
+ ...
----
+With Zeppelin-Solr the frequency table can first be visualized in a table:
+
+image::images/math-expressions/freqTable.png[]
+
+The frequency table can then be plotted by switching to a bar chart and selecting
+the *value* column for the *x-axis*. Any of the other columns can be visualized
+on the *y-axis*. The example below visualizes the *cumPct* column, which is the
+cumulative percent at each value.
+
+image::images/math-expressions/cumPct.png[]
+
+
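As an illustration of how the fields in the frequency table response relate to one another, the sketch below rebuilds the same columns in plain Python. It is not Solr's implementation. The function name `freq_table` and the sample data are hypothetical, and the relationships (`pct` as `count` divided by the sample size, `cumFreq` as a running total of `count`, `cumPct` as `cumFreq` divided by the sample size) are inferred from the sample output in the diff, where for example 543 / 15000 = 0.0362.

```python
from collections import Counter


def freq_table(values):
    """Build frequency-table rows for a list of discrete values.

    Hypothetical helper: the field names mirror the Solr response shown
    in the diff, but the logic is an assumption inferred from that output.
    """
    counts = Counter(values)   # occurrences of each distinct value
    total = len(values)        # sample size used for the percentages
    rows = []
    cum_freq = 0
    for value in sorted(counts):          # rows ordered by value
        count = counts[value]
        cum_freq += count                 # running total of counts
        rows.append({
            "value": value,
            "count": count,
            "pct": count / total,
            "cumFreq": cum_freq,
            "cumPct": cum_freq / total,
        })
    return rows


# Small made-up sample of a discrete variable.
sample = [0, 0, 1, 2, 2, 2, 3, 3]
for row in freq_table(sample):
    print(row)
```

No bins are created: each distinct value gets its own row, which is the difference from a histogram described in the text.
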
== Percentiles
The `percentile` function returns the estimated value for a specific percentile in