Posted to commits@lucene.apache.org by dw...@apache.org on 2021/03/10 09:59:35 UTC
[lucene] 22/50: Solr Ref Guide: update 7.1 statistical function docs
This is an automated email from the ASF dual-hosted git repository.
dweiss pushed a commit to branch branch_7_1
in repository https://gitbox.apache.org/repos/asf/lucene.git
commit 59d402cd11822171fe50c0ce20040b34a163edf1
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Tue Oct 17 21:25:13 2017 -0400
Solr Ref Guide: update 7.1 statistical function docs
---
.../src/statistical-programming.adoc | 77 +++++++++++++++++++++-
.../src/stream-evaluator-reference.adoc | 57 +++++++++++++---
2 files changed, 122 insertions(+), 12 deletions(-)
diff --git a/solr/solr-ref-guide/src/statistical-programming.adoc b/solr/solr-ref-guide/src/statistical-programming.adoc
index 0d461be..07fba23 100644
--- a/solr/solr-ref-guide/src/statistical-programming.adoc
+++ b/solr/solr-ref-guide/src/statistical-programming.adoc
@@ -455,10 +455,81 @@ Returns the following response:
== Setting Variables with let
-The `let` function sets variables and runs a streaming expression that references the variables. The `let` function can be used to
-write small statistical programs.
+The `let` function sets variables and returns the last variable. The output of any statistical function can be set to a variable.
-A variable can be set to the output of any streaming expression. Here is a very simple example:
+Below is a simple example setting three variables `a`, `b` and `correlation`.
+
+[source,text]
+----
+let(a=array(1,2,3),
+ b=array(10, 20, 30),
+ correlation=corr(a, b))
+----
+
+Here is the output:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "correlation": 1
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+All variables can be output by setting the `echo` variable to `true`.
+
+[source,text]
+----
+let(echo=true,
+ a=array(1,2,3),
+ b=array(10, 20, 30),
+ correlation=corr(a, b))
+----
+
+Here is the output:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "a": [
+ 1,
+ 2,
+ 3
+ ],
+ "b": [
+ 10,
+ 20,
+ 30
+ ],
+ "correlation": 1
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+Streaming expressions can also be used inside of a `let` expression in the following ways:
+
+* A variable can be set to the output of any streaming expression.
+* A streaming expression can be executed after all variables have been set. The variables can then be referenced by the streaming expression that is executed. The `let` expression will stream the tuples that are emitted by the final streaming expression.
+
+Here is a very simple example:
[source,text]
----
diff --git a/solr/solr-ref-guide/src/stream-evaluator-reference.adoc b/solr/solr-ref-guide/src/stream-evaluator-reference.adoc
index 392144c..691ac2f 100644
--- a/solr/solr-ref-guide/src/stream-evaluator-reference.adoc
+++ b/solr/solr-ref-guide/src/stream-evaluator-reference.adoc
@@ -659,8 +659,14 @@ ebeSubtract(numericArray, numericArray)
== empiricalDistribution
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
The `empiricalDistribution` function returns https://en.wikipedia.org/wiki/Empirical_distribution_function[empirical distribution function], a continuous probability distribution function based
on an actual data set. This function is part of the probability distribution framework and is designed to work with the `<<sample>>`, `<<kolmogorovSmirnov>>` and `<<cumulativeProbability>>` functions.
+=======
+The `empiricalDistribution` function returns a continuous probability distribution function based
+on an actual data set (https://en.wikipedia.org/wiki/Empirical_distribution_function). This function is part of the probability distribution framework and is
+designed to work with the `sample`, `kolmogorovSmirnov` and `cumulativeProbability` functions.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
This function is designed to work with continuous data. To build a distribution from
a discrete data set use the `<<enumeratedDistribution>>`.
@@ -1049,7 +1055,11 @@ The supported distribution functions are: `<<empiricalDistribution>>`, `<<normal
=== kolmogorovSmirnov Returns
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
A result tuple: A tuple containing the p-value and d-statistic for the test result.
+=======
+result tuple : A tuple containing the p-value and d-statistic for the test result.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
=== kolmogorovSmirnov Syntax
@@ -1158,8 +1168,13 @@ if(gt(fieldA,fieldB),mod(fieldA,fieldB),mod(fieldB,fieldA)) // if fieldA > field
== monteCarlo
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
The `monteCarlo` function performs a https://en.wikipedia.org/wiki/Monte_Carlo_method[Monte Carlo simulation]
based on its parameters. The `monteCarlo` function runs another function a specified number of times and returns the results.
+=======
+The `monteCarlo` function performs a Monte Carlo simulation (https://en.wikipedia.org/wiki/Monte_Carlo_method)
+based on its parameters. The monteCarlo function runs another function a specified number of times and returns the results.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
The function being run typically has one or more variables that are drawn from probability
distributions on each run. The `<<sample>>` function is used in the function to draw the samples.
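
Note on the `monteCarlo` hunk above: the description (run another function a specified number of times, drawing its variables from probability distributions on each run) can be sketched in plain Python. This is only an illustration of the idea, not Solr's implementation; `monte_carlo`, `one_trial`, and the chosen distributions are hypothetical names and values.

```python
import random


def monte_carlo(trial, runs):
    """Run `trial` the given number of times and collect every result."""
    return [trial() for _ in range(runs)]


def one_trial(rng):
    # Hypothetical model: each run draws two fresh samples, one per
    # distribution, and combines them into a single outcome.
    return rng.gauss(50, 5) + rng.gauss(10, 1)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
results = monte_carlo(lambda: one_trial(rng), 1000)
mean_result = sum(results) / len(results)
```

The list `results` holds one outcome per run, which is the shape the text describes as "returns the results".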
@@ -1325,9 +1340,15 @@ or(fieldA,fieldB,fieldC,and(fieldD,fieldE),fieldF)
== poissonDistribution
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
The `poissonDistribution` function returns a https://en.wikipedia.org/wiki/Poisson_distribution[poisson probability distribution]
based on its parameter. This function is part of the probability distribution framework and is designed to
work with the `<<sample>>`, `<<probability>>` and `<<cumulativeProbability>>` functions.
+=======
+The `poissonDistribution` function returns a poisson probability distribution (https://en.wikipedia.org/wiki/Poisson_distribution)
+based on its parameter. This function is part of the probability distribution framework and is designed to
+work with the `sample`, `probability` and `cumulativeProbability` functions.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
=== poissonDistribution Parameters
@@ -1348,9 +1369,15 @@ The `polyFit` function performs https://en.wikipedia.org/wiki/Curve_fitting#Fitt
=== polyFit Parameters
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
* `numeric array`: (Optional) x values. If omitted a sequence will be created for the x values.
* `numeric array`: y values
* `integer`: (Optional) polynomial degree. Defaults to 3.
+=======
+* `numeric array` : (Optional) x values. If omitted a sequence will be created for the x values.
+* `numeric array` : y values
+* `integer` : (Optional) polynomial degree. Defaults to 3.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
=== polyFit Returns
@@ -1359,7 +1386,8 @@ A numeric array: curve that was fit to the data points.
=== polyFit Syntax
[source,text]
-polyFit(yValues) // This creates the xValues automatically and fits a curve through the data points using a the default 3 degree polynomial.
+polyFit(yValues) // This creates the xValues automatically and fits a curve through the data points using the default 3 degree polynomial.
+polyFit(yValues, 5) // This creates the xValues automatically and fits a curve through the data points using a 5 degree polynomial.
polyFit(xValues, yValues, 5) // This will fit a curve through the data points using a 5 degree polynomial.
== polyfitDerivative
@@ -1368,9 +1396,15 @@ The `polyfitDerivative` function returns the derivative of the curve created by
=== polyfitDerivative Parameters
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
* `numeric array`: (Optional) x values. If omitted a sequence will be created for the x values.
* `numeric array`: y values
* `integer`: (Optional) polynomial degree. Defaults to 3.
+=======
+* `numeric array` : (Optional) x values. If omitted a sequence will be created for the x values.
+* `numeric array` : y values
+* `integer` : (Optional) polynomial degree. Defaults to 3.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
=== polyfitDerivative Returns
@@ -1380,6 +1414,7 @@ A numeric array: The curve for the derivative created by the polynomial curve fi
[source,text]
polyfitDerivative(yValues) // This creates the xValues automatically and returns the polyfit derivative
+polyfitDerivative(yValues, 5) // This creates the xValues automatically and fits a curve through the data points using a 5 degree polynomial and returns the polyfit derivative.
polyfitDerivative(xValues, yValues, 5) // This will fit a curve through the data points using a 5 degree polynomial and returns the polyfit derivative.
== pow
@@ -1439,13 +1474,17 @@ primes(100, 2000) // returns 100 primes starting from 2000
== probability
-The `probability` function returns the probability of encountering a random variable within a discrete
-probability distribution.
+The `probability` function returns the probability of a random variable within a discrete probability distribution.
=== probability Parameters
+<<<<<<< 3196557fc49995bb3d083f25e13e09b3477a765c
* `discrete probability distribution`: poissonDistribution | binomialDistribution | uniformDistribution | enumeratedDistribution
* `integer`: Value of the random variable to compute the probability for.
+=======
+* `discrete probability distribution` : poissonDistribution | binomialDistribution | uniformDistribution | enumeratedDistribution
+* `integer` : Value of the random variable to compute the probability for.
+>>>>>>> Solr Ref Guide: update 7.1 statistical function docs
=== probability Returns
@@ -1454,7 +1493,7 @@ A double: the probability.
=== probability Syntax
[source,text]
-probability(poissonDistribution(10), 7) // Returns the probability of encountering a random sample if 7 in a poisson distribution with a mean of 10.
+probability(poissonDistribution(10), 7) // Returns the probability of a random sample of 7 in a poisson distribution with a mean of 10.
== rank
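
Note on the `probability` hunk above: the syntax example `probability(poissonDistribution(10), 7)` asks for the probability of the value 7 under a Poisson distribution with a mean of 10. A minimal Python sketch of that calculation, using the standard Poisson probability mass function, is shown here. `poisson_pmf` is a hypothetical helper and not part of Solr.

```python
import math


def poisson_pmf(mean, k):
    """Probability of observing exactly k events when the mean is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Same inputs as the syntax example: mean of 10, value of 7.
p = poisson_pmf(10, 7)
```

With these inputs `p` comes out at roughly 0.09.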
@@ -1493,7 +1532,7 @@ eq(raw(fieldA), fieldA) // true if the value of fieldA equals the string "fieldA
== regress
-The `regress` function performs a simple regression on two numeric arrays.
+The `regress` function performs a simple regression of two numeric arrays.
The result of this expression is also used by the `<<predict>>` and `<<residuals>>` functions.
@@ -1512,8 +1551,8 @@ regress(numericArray1, numericArray2)
The `residuals` function takes three parameters: a simple regression model, an array of predictor values
and an array of actual values. The residuals function applies the simple regression model to the
-array of predictor values and computes a predictions array. The actual values array is then
-subtracted from the predictions array to compute the residuals array.
+array of predictor values and computes a predictions array. The predicted values array is then
+subtracted from the actual value array to compute the residuals array.
=== residuals Parameters
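
Note on the `residuals` hunk above: the corrected wording says the predicted values are subtracted from the actual values. The sketch below shows that order of subtraction in plain Python. It assumes the simple regression model can be represented as a `(slope, intercept)` pair, which is an illustration only and not how Solr represents the output of `regress`.

```python
def residuals(model, predictors, actuals):
    """Return actual minus predicted for each predictor value."""
    slope, intercept = model
    # Apply the model to the predictor values to get the predictions array.
    predictions = [slope * x + intercept for x in predictors]
    # Subtract each prediction from the matching actual value.
    return [a - p for a, p in zip(actuals, predictions)]


model = (2.0, 1.0)  # hypothetical fitted line: y = 2x + 1
result = residuals(model, [1.0, 2.0, 3.0], [3.5, 5.0, 6.0])
```

A positive residual means the actual value lies above the fitted line, and a negative one means it lies below.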
@@ -1576,8 +1615,8 @@ Either a single numeric random sample, or a numeric array depending on the sampl
=== sample Syntax
[source,text]
-sample(normalDistribution(50, 5)) // Return a single random sample from a normalDistribution with mean of 50 and standard deviation of 5.
-sample(poissonDistribution(5), 1000) // Return 1000 random samples from poissonDistribution with a mean of 5.
+sample(poissonDistribution(5)) // Returns a single random sample from a poissonDistribution with mean of 5.
+sample(poissonDistribution(5), 1000) // Returns 1000 random samples from poissonDistribution with a mean of 5.
== scale