Posted to commits@lucene.apache.org by jb...@apache.org on 2019/12/13 14:03:02 UTC

[lucene-solr] branch visual-guide updated: Visual Guide: Improve quantile plot docs

This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch visual-guide
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/visual-guide by this push:
     new 3a8e24a  Visual Guide: Improve quantile plot docs
3a8e24a is described below

commit 3a8e24a27adba5b54f4d46b780d033797b7cb786
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Fri Dec 13 09:02:38 2019 -0500

    Visual Guide: Improve quantile plot docs
---
 .../src/images/math-expressions/quantile-plot.png  | Bin 194672 -> 189926 bytes
 solr/solr-ref-guide/src/statistics.adoc            |  45 ++++++++++-----------
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/solr/solr-ref-guide/src/images/math-expressions/quantile-plot.png b/solr/solr-ref-guide/src/images/math-expressions/quantile-plot.png
index 90ffdf1..c02d40d 100644
Binary files a/solr/solr-ref-guide/src/images/math-expressions/quantile-plot.png and b/solr/solr-ref-guide/src/images/math-expressions/quantile-plot.png differ
diff --git a/solr/solr-ref-guide/src/statistics.adoc b/solr/solr-ref-guide/src/statistics.adoc
index 787a618..f77208b 100644
--- a/solr/solr-ref-guide/src/statistics.adoc
+++ b/solr/solr-ref-guide/src/statistics.adoc
@@ -354,35 +354,34 @@ When this expression is sent to the `/stream` handler it responds with:
 
 === Quantile Plots
 
-A quantile plot, or QQ Plot, plots the percentiles from two distributions on the
-the same scatter plot for comparision.
-
-The example below uses the sampling capability
-described in the <<probability-distributions.adoc#probability-distributions,Probability>> section of the user guide. The same
-technique can be used with random samples drawn with the `random` function on empirical data
-stored in Solr Cloud collections. But its very useful to be able to
-use the probability distribution framework to sample from different distributions
-to learn how to read QQ plots.
-
-In the example 50000 samples from two normal distributions are drawn. Both distributions
-have a mean of 500 but have different standard deviations. A sequence is then created
-with 98 integers starting from 1 with a stride 1. This sequence will be used
-to specify the
-percentiles to calculate and also serve as the *x-axis* in the plot.
-Then the percentile function is used to calculate the percentiles for
-both distributions.
+Quantile plots, or QQ plots, are powerful tools for visually comparing two or more distributions.
+
+A quantile plot plots the percentiles from two or more distributions in the same visualization. This allows
+for visual comparison of the distributions at each percentile. A simple example will help illustrate the power
+of quantile plots.
+
+In this example the distributions of daily stock price changes for two stock tickers, *goog* and
+*amzn*, are visualized with a quantile plot.
+
+The example first creates an array of values representing the percentiles that will be calculated and assigns this array
+to the variable *p*. Then random samples of the *change_d* field are drawn for the tickers *amzn* and *goog*. The *change_d* field
+represents the change in stock price for one day. Then the *change_d* field is vectorized for both samples and placed
+in the variables *amzn* and *goog*. The `percentile` function is then used to calculate the percentiles for both vectors. Notice that
+the variable *p* is used to specify the list of percentiles that are calculated.
 
 Finally `zplot` is used to plot the percentiles sequence on the *x-axis* and the calculated
-percentile values for both distributions on the *y axis*. A scatter plot is used
+percentile values for both distributions on the *y-axis*. A line plot is used
 to visualize the QQ plot.
 
 image::images/math-expressions/quantile-plot.png[]
 
-Notice there are two scatter plots that intersect at 500 which is the mean
-of both distributions. But the red scatter plot, which has the
-higher standard deviation,
-has a steeper slope. The higher standard deviation creates a steeper slope
-because the percentile values are dispersed farther from the mean.
+This quantile plot provides a clear picture of the distributions of daily price changes for *amzn*
+and *goog*. In the plot the *x-axis* shows the percentiles and the *y-axis* shows the calculated percentile values.
+
+Notice that the *goog* percentile values start lower and end higher than the *amzn* values, and that the *goog* line
+has a steeper slope. This shows the greater variability in the *goog* price change distribution. The plot gives a clear picture
+of the difference
+in the distributions across the full range of percentiles.
 
 
 == Correlation and Covariance
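
The computation behind the quantile plot described in the added documentation can be sketched outside Solr. The block below is not the Solr streaming expression from the guide and is not part of the commit; it is a minimal Python illustration that uses only the standard library. The two samples are synthetic normal draws standing in for the *amzn* and *goog* `change_d` samples, and the percentile list 1 to 99, the sample size, the standard deviations, and the helper name `percentile_curve` are all assumptions made for illustration.

```python
import random
import statistics


def percentile_curve(sample, percentiles):
    """Return the sample value at each requested percentile (integers 1..99)."""
    # With n=100, statistics.quantiles returns 99 cut points,
    # one for each percentile from 1 through 99.
    cuts = statistics.quantiles(sample, n=100, method="inclusive")
    return [cuts[p - 1] for p in percentiles]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic stand-ins (assumption): two samples with the same mean
# but different spread, playing the roles of the two tickers.
narrow = [rng.gauss(0.0, 1.0) for _ in range(5000)]
wide = [rng.gauss(0.0, 2.0) for _ in range(5000)]

# The percentile list plays the role of the variable p and of the x-axis.
p = list(range(1, 100))

narrow_curve = percentile_curve(narrow, p)
wide_curve = percentile_curve(wide, p)

# Each row is one x position with the two y values that would be plotted.
for pct in (1, 25, 50, 75, 99):
    print(pct, round(narrow_curve[pct - 1], 3), round(wide_curve[pct - 1], 3))
```

Plotting `narrow_curve` and `wide_curve` against `p` as two lines gives the same kind of picture the documentation describes: the sample with greater variability starts lower, ends higher, and has the steeper slope.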