Posted to commits@lucene.apache.org by jb...@apache.org on 2018/08/12 17:02:38 UTC

[05/13] lucene-solr:branch_7x: SOLR-11947: Fix ref guide jenkins errors

SOLR-11947: Fix ref guide jenkins errors


Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/e1666fc6
Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/e1666fc6
Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/e1666fc6

Branch: refs/heads/branch_7x
Commit: e1666fc65a2a40f3dd602a8885a9d0b8ed196323
Parents: a8a36ef
Author: Joel Bernstein <jb...@apache.org>
Authored: Mon Mar 26 18:00:44 2018 -0400
Committer: Joel Bernstein <jb...@apache.org>
Committed: Sun Aug 12 12:38:52 2018 -0400

----------------------------------------------------------------------
 solr/solr-ref-guide/src/machine-learning.adoc   |   8 +-
 solr/solr-ref-guide/src/math-expressions.adoc   |  30 +-
 solr/solr-ref-guide/src/montecarlo.adoc         | 213 ----------
 .../src/parallel-sql-interface.adoc             |   2 +-
 .../src/probability-distributions.adoc          | 415 +++++++++++++++++++
 solr/solr-ref-guide/src/probability.adoc        | 415 -------------------
 solr/solr-ref-guide/src/simulations.adoc        | 213 ++++++++++
 .../src/statistical-programming.adoc            |   2 +-
 solr/solr-ref-guide/src/term-vectors.adoc       |   2 +-
 9 files changed, 650 insertions(+), 650 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/machine-learning.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index cbb3e05..577220a 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -124,7 +124,7 @@ This expression returns the following response:
 }
 ----
 
-=== Unitize
+=== Unit Vectors
 
 The `unitize` function scales vectors to a magnitude of 1. A vector with a
 magnitude of 1 is known as a unit vector.  Unit vectors are
@@ -171,7 +171,7 @@ This expression returns the following response:
 }
 ----
 
-== Distance
+== Distance Measures
 
 The `distance` function computes a distance measure for two
 numeric arrays or a *distance matrix* for the columns of a matrix.
@@ -267,7 +267,7 @@ Once the clustering has been completed there are a number of useful functions av
 for examining the *clusters* and *centroids*.
 
 The examples below are clustering *term vectors*.
-The chapter on link:term-vectors.adoc[Text Analysis and Term Vectors] should be
+The chapter on link:term-vectors.adoc#term-vectors[Text Analysis and Term Vectors] should be
 consulted for a full explanation of these features.
 
 === Centroid Features
@@ -603,7 +603,7 @@ This expression returns the following response:
 }
 ----
 
-== K-nearest Neighbor
+== K-nearest Neighbor (knn)
 
 The `knn` function searches the rows of a matrix for the
 K-nearest neighbors of a search vector. The `knn` function

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/math-expressions.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
index e2ed438..77d563b 100644
--- a/solr/solr-ref-guide/src/math-expressions.adoc
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -1,5 +1,5 @@
 = Math Expressions
-:page-children: scalar-math, vector-math, variables, matrix-math, vectorization, term-vectors, statistics, probability, montecarlo, time-series, regression, numerical-analysis, curve-fitting, machine-learning
+:page-children: scalar-math, vector-math, variables, matrix-math, vectorization, term-vectors, statistics, probability-distributions, simulations, time-series, regression, numerical-analysis, curve-fitting, machine-learning
 
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
@@ -30,30 +30,30 @@ and data structures and techniques for combining Solr's
 powerful streams with mathematical functions to make every
 record in your Solr Cloud cluster computable.
 
-== link:scalar-math.adoc[Scalar Math]
+== link:scalar-math.adoc#scalar-math[Scalar Math]
 
-== link:vector-math.adoc[Vector Math]
+== link:vector-math.adoc#vector-math[Vector Math]
 
-== link:variables.adoc[Variables]
+== link:variables.adoc#variables[Variables]
 
-== link:matrix-math.adoc[Matrix Math]
+== link:matrix-math.adoc#matrix-math[Matrix Math]
 
-== link:vectorization.adoc[Streams and Vectorization]
+== link:vectorization.adoc#vectorization[Streams and Vectorization]
 
-== link:term-vectors.adoc[Text Analysis and Term Vectors]
+== link:term-vectors.adoc#term-vectors[Text Analysis and Term Vectors]
 
-== link:statistics.adoc[Statistics]
+== link:statistics.adoc#statistics[Statistics]
 
-== link:probability.adoc[Probability]
+== link:probability-distributions.adoc#probability-distributions[Probability]
 
-== link:montecarlo.adoc[Monte Carlo Simulations]
+== link:simulations.adoc#simulations[Monte Carlo Simulations]
 
-== link:time-series.adoc[Time Series]
+== link:time-series.adoc#time-series[Time Series]
 
-== link:regression.adoc[Linear Regression]
+== link:regression.adoc#regression[Linear Regression]
 
-== link:numerical-analysis.adoc[Interpolation, Derivatives and Integrals]
+== link:numerical-analysis.adoc#numerical-analysis[Interpolation, Derivatives and Integrals]
 
-== link:curve-fitting.adoc[Curve Fitting]
+== link:curve-fitting.adoc#curve-fitting[Curve Fitting]
 
-== link:machine-learning.adoc[Machine Learning]
+== link:machine-learning.adoc#machine-learning[Machine Learning]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/montecarlo.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/montecarlo.adoc b/solr/solr-ref-guide/src/montecarlo.adoc
deleted file mode 100644
index 814110f..0000000
--- a/solr/solr-ref-guide/src/montecarlo.adoc
+++ /dev/null
@@ -1,213 +0,0 @@
-= Monte Carlo Simulations
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-
-Monte Carlo simulations are commonly used to model the behavior of
-stochastic systems. This section of the user guide describes
-how to perform both *uncorrelated* and *correlated* Monte Carlo simulations
-using the *sampling* capabilities of the probability distribution framework.
-
-=== Uncorrelated Simulations
-
-Uncorrelated Monte Carlo simulations model stochastic systems with the assumption
- that the underlying random variables move independently of each other.
- A simple example of a Monte Carlo simulation using two independently changing random variables
- is described below.
-
-In this example a Monte Carlo simulation is used to determine the probability that a simple hinge assembly will
-fall within a required length specification.
-
-The hinge has two components *A* and *B*. The combined length of the two components must be less than 5 centimeters
-to fall within specification.
-
-A random sampling of lengths for component *A* has shown that its length conforms to a
-normal distribution with a mean of 2.2 centimeters and a standard deviation of .0195
-centimeters.
-
-A random sampling of lengths for component *B* has shown that its length conforms
-to a normal distribution with a mean of 2.71 centimeters and a standard deviation of .0198 centimeters.
-
-The Monte Carlo simulation below performs the following steps:
-
-* A normal distribution with a mean of 2.2 and a standard deviation of .0195 is created to model the length of componentA.
-* A normal distribution with a mean of 2.71 and a standard deviation of .0198 is created to model the length of componentB.
-* The `monteCarlo` function is used to simulate component pairs. The `monteCarlo` function
-  calls the *add(sample(componentA), sample(componentB))* function 100000 times and collects the results in an array. Each
-  time the function is called a random sample is drawn from the componentA
-  and componentB length distributions. The `add` function adds the two samples to calculate the combined length.
-  The result of each function run is collected in an array and assigned to the *simresults* variable.
-* An `empiricalDistribution` function is then created from the *simresults* array to model the distribution of the
-  simulation results.
-* Finally, the `cumulativeProbability` function is called on the *simmodel* to determine the cumulative probability
-  that the combined length of the components is 5 or less.
-* Based on the simulation there is a .9994371944629039 probability that the combined length of a component pair will
-be 5 or less.
-
-[source,text]
-----
-let(componentA=normalDistribution(2.2,  .0195),
-    componentB=normalDistribution(2.71, .0198),
-    simresults=monteCarlo(add(sample(componentA), sample(componentB)), 100000),
-    simmodel=empiricalDistribution(simresults),
-    prob=cumulativeProbability(simmodel,  5))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "prob": 0.9994371944629039
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 660
-      }
-    ]
-  }
-}
-----
-
-=== Correlated Simulations
-
-The simulation above assumes that the lengths of *componentA* and *componentB* vary independently.
-What would happen to the probability model if there was a correlation between the lengths of
-*componentA* and *componentB*?
-
-In the example below a database containing assembled pairs of components is used to determine
-if there is a correlation between the lengths of the components, and how the correlation affects the model.
-
-Before performing a simulation of the effects of correlation on the probability model it's
-useful to understand what the correlation is between the lengths of *componentA* and *componentB*.
-
-In the example below 5000 random samples are selected from a collection
-of assembled hinges. Each sample contains
-lengths of the components in the fields *componentA_d* and *componentB_d*.
-
-Both fields are then vectorized. The *componentA_d* vector is stored in
-variable *b* and the *componentB_d* vector is stored in variable *c*.
-
-Then the correlation of the two vectors is calculated using the `corr` function. Note that the outcome
-from `corr` is 0.9996931313216989. This means that *componentA_d* and *componentB_d* are almost
-perfectly correlated.
-
-[source,text]
-----
-let(a=random(collection5, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
-    b=col(a, componentA_d),
-    c=col(a, componentB_d),
-    d=corr(b, c))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "d": 0.9996931313216989
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 309
-      }
-    ]
-  }
-}
-----
-
-How does correlation affect the probability model?
-
-The example below explores how to use a *multivariate normal distribution* function
-to model how correlation affects the probability of hinge defects.
-
-In this example 5000 random samples are selected from a collection
-containing length data for assembled hinges. Each sample contains
-the fields *componentA_d* and *componentB_d*.
-
-Both fields are then vectorized. The *componentA_d* vector is stored in
-variable *b* and the *componentB_d* vector is stored in variable *c*.
-
-An array is created that contains the *means* of the two vectorized fields.
-
-Then both vectors are added to a matrix which is transposed. This creates
-an *observation* matrix where each row contains one observation of
-*componentA_d* and *componentB_d*. A covariance matrix is then created from the columns of
-the observation matrix with the
-`cov` function. The covariance matrix describes the covariance between
-*componentA_d* and *componentB_d*.
-
-The `multiVariateNormalDistribution` function is then called with the
-array of means for the two fields and the covariance matrix. The model
-for the multivariate normal distribution is stored in variable *g*.
-
-The `monteCarlo` function then calls the function *add(sample(g))* 50000 times
-and collects the results in a vector. Each time the function is called a single sample
-is drawn from the multivariate normal distribution. Each sample is a vector containing
-one *componentA* and *componentB* pair. The `add` function adds the values in the vector to
-calculate the length of the pair. Over the long term the samples drawn from the
-multivariate normal distribution will conform to the covariance matrix used to construct it.
-
-Just as in the non-correlated example an empirical distribution is used to model probabilities
-of the simulation vector and the `cumulativeProbability` function is used to compute the cumulative
-probability that the combined component length will be 5 centimeters or less.
-
-Notice that the probability of a hinge meeting specification has dropped to 0.9889517439980468.
-This is because the strong correlation
-between the lengths of components means that their lengths rise together causing more hinges to
-fall out of the 5 centimeter specification.
-
-[source,text]
-----
-let(a=random(hinges, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
-    b=col(a, componentA_d),
-    c=col(a, componentB_d),
-    cor=corr(b,c),
-    d=array(mean(b), mean(c)),
-    e=transpose(matrix(b, c)),
-    f=cov(e),
-    g=multiVariateNormalDistribution(d, f),
-    h=monteCarlo(add(sample(g)), 50000),
-    i=empiricalDistribution(h),
-    j=cumulativeProbability(i, 5))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "j": 0.9889517439980468
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 599
-      }
-    ]
-  }
-}
-----
-
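
The two probabilities reported in the removed montecarlo.adoc examples can be cross-checked outside Solr. The sketch below is not part of this commit and does not call Solr's `monteCarlo`, `sample`, or `empiricalDistribution` functions. It is a standard-library Python approximation under stated assumptions: the component means and standard deviations are taken from the text, the correlated case assumes the measured correlation of roughly 0.9997 and reuses the same standard deviations, and the empirical cumulative probability is approximated by the fraction of simulated lengths at or below 5. The function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Parameters taken from the hinge example in the text.
MEAN_A, SD_A = 2.2, 0.0195
MEAN_B, SD_B = 2.71, 0.0198
SPEC = 5.0  # maximum combined length in centimeters


def simulate(n, rho, seed=42):
    """Estimate P(A + B <= SPEC) from n simulated component pairs.

    rho is the assumed correlation between the two lengths;
    rho=0.0 reproduces the uncorrelated simulation.
    """
    rng = random.Random(seed)
    within = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = MEAN_A + SD_A * z1
        # Mix z1 and z2 so that corr(a, b) equals rho.
        b = MEAN_B + SD_B * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        if a + b <= SPEC:
            within += 1
    return within / n


def analytic(rho):
    """Closed-form P(A + B <= SPEC) for jointly normal A and B."""
    sd = math.sqrt(SD_A ** 2 + SD_B ** 2 + 2.0 * rho * SD_A * SD_B)
    return NormalDist(MEAN_A + MEAN_B, sd).cdf(SPEC)


uncorrelated = simulate(100000, 0.0)
correlated = simulate(50000, 0.9997)
print(uncorrelated, analytic(0.0))
print(correlated, analytic(0.9997))
```

Under these assumptions the closed-form values come out near 0.9994 and 0.989, in line with the responses quoted in the examples. Correlation widens the spread of the combined length, which is why more hinges fall outside the specification.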

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/parallel-sql-interface.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/parallel-sql-interface.adoc b/solr/solr-ref-guide/src/parallel-sql-interface.adoc
index e7b04da..01442ee 100644
--- a/solr/solr-ref-guide/src/parallel-sql-interface.adoc
+++ b/solr/solr-ref-guide/src/parallel-sql-interface.adoc
@@ -280,7 +280,7 @@ The `aggregationMode` parameter is available in the both the JDBC driver and HTT
 SELECT distinct fieldA as fa, fieldB as fb FROM tableA ORDER BY fa desc, fb desc
 ----
 
-=== Statistics
+=== Statistical Functions
 
 The SQL interface supports simple statistics calculated on numeric fields. The supported functions are `count(*)`, `min`, `max`, `sum`, and `avg`.
 

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/probability-distributions.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/probability-distributions.adoc b/solr/solr-ref-guide/src/probability-distributions.adoc
new file mode 100644
index 0000000..5ae2248
--- /dev/null
+++ b/solr/solr-ref-guide/src/probability-distributions.adoc
@@ -0,0 +1,415 @@
+= Probability Distributions
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This section of the user guide covers the
+*probability distribution
+framework* included in the math expressions library.
+
+== Probability Distribution Framework
+
+The probability distribution framework includes
+many commonly used *real* and *discrete* probability
+distributions, including support for *empirical* and
+*enumerated* distributions that model real world data.
+
+The probability distribution framework also includes a set
+of functions that use the probability distributions
+to support probability calculations and sampling.
+
+=== Real Distributions
+
+The probability distribution framework has the following functions
+which support well known real probability distributions:
+
+* `normalDistribution`: Creates a normal distribution function.
+
+* `logNormalDistribution`: Creates a log normal distribution function.
+
+* `gammaDistribution`: Creates a gamma distribution function.
+
+* `betaDistribution`: Creates a beta distribution function.
+
+* `uniformDistribution`: Creates a uniform real distribution function.
+
+* `weibullDistribution`: Creates a Weibull distribution function.
+
+* `triangularDistribution`: Creates a triangular distribution function.
+
+* `constantDistribution`: Creates a constant real distribution function.
+
+=== Empirical Distribution
+
+The `empiricalDistribution` function creates a real probability
+distribution from actual data. An empirical distribution
+can be used interchangeably with any of the theoretical
+real distributions.
+
+=== Discrete
+
+The probability distribution framework has the following functions
+which support well known discrete probability distributions:
+
+* `poissonDistribution`: Creates a Poisson distribution function.
+
+* `binomialDistribution`: Creates a binomial distribution function.
+
+* `uniformIntegerDistribution`: Creates a uniform integer distribution function.
+
+* `geometricDistribution`: Creates a geometric distribution function.
+
+* `zipFDistribution`: Creates a Zipf distribution function.
+
+=== Enumerated Distributions
+
+The `enumeratedDistribution` function creates a discrete
+distribution function from a data set of discrete values,
+or from an enumerated list of values and probabilities.
+
+Enumerated distribution functions can be used interchangeably
+with any of the theoretical discrete distributions.
+
+=== Cumulative Probability
+
+The `cumulativeProbability` function can be used with all
+probability distributions to calculate the
+cumulative probability of encountering a specific
+random variable within a specific distribution.
+
+Below is an example of calculating the cumulative probability
+of a random variable within a normal distribution.
+
+In the example a normal distribution function is created
+with a mean of 10 and a standard deviation of 5. Then
+the cumulative probability of the value 12 is calculated for this
+specific distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+    b=cumulativeProbability(a, 12))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "b": 0.6554217416103242
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 0
+      }
+    ]
+  }
+}
+----
+
+Below is an example of a cumulative probability calculation
+using an empirical distribution.
+
+In the example an empirical distribution is created from a random
+sample taken from the *price_f* field.
+
+The cumulative probability of the value .75 is then calculated.
+The *price_f* field in this example was generated using a
+uniform real distribution between 0 and 1, so the output of the
+ `cumulativeProbability` function is very close to .75.
+
+[source,text]
+----
+let(a=random(collection1, q="*:*", rows="30000", fl="price_f"),
+    b=col(a, price_f),
+    c=empiricalDistribution(b),
+    d=cumulativeProbability(c, .75))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "d": 0.7554217416103242
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 0
+      }
+    ]
+  }
+}
+----
+
+=== Discrete Probability
+
+The `probability` function can be used with any discrete
+distribution function to compute the probability of a
+discrete value.
+
+Below is an example which calculates the probability
+of a discrete value within a Poisson distribution.
+
+In the example a Poisson distribution function is created
+with a mean of 100. Then the
+probability of encountering a sample of the discrete value 101 is calculated for this
+specific distribution.
+
+[source,text]
+----
+let(a=poissonDistribution(100),
+    b=probability(a, 101))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "b": 0.039466333474403106
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 0
+      }
+    ]
+  }
+}
+----
+
+Below is an example of a probability calculation
+using an enumerated distribution.
+
+In the example an enumerated distribution is created from a random
+sample taken from the *day_i* field, which was created
+using a uniform integer distribution between 0 and 30.
+
+The probability of the discrete value 10 is then calculated.
+
+[source,text]
+----
+let(a=random(collection1, q="*:*", rows="30000", fl="day_i"),
+    b=col(a, day_i),
+    c=enumeratedDistribution(b),
+    d=probability(c, 10))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "d": 0.03356666666666666
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 488
+      }
+    ]
+  }
+}
+----
+
+=== Sampling
+
+All probability distributions support sampling. The `sample`
+function returns 1 or more random samples from a probability
+distribution.
+
+Below is an example drawing a single sample from
+a normal distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+    b=sample(a))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "b": 11.24578055004963
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 0
+      }
+    ]
+  }
+}
+----
+
+Below is an example drawing 10 samples from a normal
+distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+    b=sample(a, 10))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "b": [
+          10.18444709339441,
+          9.466947971749377,
+          1.2420697166234458,
+          11.074501226984806,
+          7.659629052136225,
+          0.4440887839190708,
+          13.710925254778786,
+          2.089566359480239,
+          0.7907293097654424,
+          2.8184587681006734
+        ]
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 3
+      }
+    ]
+  }
+}
+----
+
+=== Multivariate Normal Distribution
+
+The multivariate normal distribution is a generalization of the
+univariate normal distribution to higher dimensions.
+
+The multivariate normal distribution models two or more random
+variables that are normally distributed. The relationship between
+the variables is defined by a covariance matrix.
+
+==== Sampling
+
+The `sample` function can be used to draw samples
+from a multivariate normal distribution in much the same
+way as a univariate normal distribution.
+The difference is that each sample will be an array containing a sample
+drawn from each of the underlying normal distributions.
+If multiple samples are drawn, the `sample` function returns a matrix with a
+sample in each row. Over the long term the columns of the sample
+matrix will conform to the covariance matrix used to parametrize the
+multivariate normal distribution.
+
+The example below demonstrates how to initialize and draw samples
+from a multivariate normal distribution.
+
+In this example 5000 random samples are selected from a collection
+of log records. Each sample contains
+the fields *filesize_d* and *response_d*. The values of both fields conform
+to a normal distribution.
+
+Both fields are then vectorized. The *filesize_d* vector is stored in
+variable *b* and the *response_d* vector is stored in variable *c*.
+
+An array is created that contains the *means* of the two vectorized fields.
+
+Then both vectors are added to a matrix which is transposed. This creates
+an *observation* matrix where each row contains one observation of
+*filesize_d* and *response_d*. A covariance matrix is then created from the columns of
+the observation matrix with the
+`cov` function. The covariance matrix describes the covariance between
+*filesize_d* and *response_d*.
+
+The `multiVariateNormalDistribution` function is then called with the
+array of means for the two fields and the covariance matrix. The model for the
+multivariate normal distribution is assigned to variable *g*.
+
+Finally five samples are drawn from the multivariate normal distribution. The samples
+are returned as a matrix, with each row representing one sample. There are two
+columns in the matrix. The first column contains samples for *filesize_d* and the second
+column contains samples for *response_d*. Over the long term the covariance between
+the columns will conform to the covariance matrix used to instantiate the
+multivariate normal distribution.
+
+[source,text]
+----
+let(a=random(collection2, q="*:*", rows="5000", fl="filesize_d, response_d"),
+    b=col(a, filesize_d),
+    c=col(a, response_d),
+    d=array(mean(b), mean(c)),
+    e=transpose(matrix(b, c)),
+    f=cov(e),
+    g=multiVariateNormalDistribution(d, f),
+    h=sample(g, 5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "h": [
+          [
+            41974.85669321393,
+            779.4097049705296
+          ],
+          [
+            42869.19876441414,
+            834.2599296790783
+          ],
+          [
+            38556.30444839889,
+            720.3683470060988
+          ],
+          [
+            37689.31290928216,
+            686.5549428100018
+          ],
+          [
+            40564.74398214547,
+            769.9328090774
+          ]
+        ]
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 162
+      }
+    ]
+  }
+}
+----
+
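
The closed-form results quoted in the probability-distributions.adoc examples can be reproduced independently. The sketch below is not part of this commit and does not use Solr; it relies only on the Python standard library, and `poisson_pmf` is a hypothetical helper written for this check.

```python
import math
from statistics import NormalDist


def poisson_pmf(mean, k):
    """P(X = k) for a Poisson distribution with the given mean.

    Computed in log space to avoid overflow in mean**k and k!.
    """
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


# cumulativeProbability(normalDistribution(10, 5), 12)
normal_cdf_12 = NormalDist(mu=10, sigma=5).cdf(12)

# probability(poissonDistribution(100), 101)
poisson_101 = poisson_pmf(100, 101)

print(normal_cdf_12)
print(poisson_101)
```

Both values agree closely with the `b` fields in the corresponding responses (0.6554217416103242 and 0.039466333474403106). The empirical and enumerated examples depend on indexed data, so they cannot be reproduced this way.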

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/probability.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/probability.adoc b/solr/solr-ref-guide/src/probability.adoc
deleted file mode 100644
index 9c46d08..0000000
--- a/solr/solr-ref-guide/src/probability.adoc
+++ /dev/null
@@ -1,415 +0,0 @@
-= Probability Distributions
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-This section of the user guide covers the
-*probability distribution
-framework* included in the math expressions library.
-
-== Probability Distributions
-
-The probability distribution framework includes
-many commonly used *real* and *discrete* probability
-distributions, including support for *empirical* and
-*enumerated* distributions that model real world data.
-
-The probability distribution framework also includes a set
-of functions that use the probability distributions
-to support probability calculations and sampling.
-
-=== Real Distributions
-
-The probability distribution framework has the following functions
-which support well known real probability distributions:
-
-* `normalDistribution`: Creates a normal distribution function.
-
-* `logNormalDistribution`: Creates a log normal distribution function.
-
-* `gammaDistribution`: Creates a gamma distribution function.
-
-* `betaDistribution`: Creates a beta distribution function.
-
-* `uniformDistribution`: Creates a uniform real distribution function.
-
-* `weibullDistribution`: Creates a Weibull distribution function.
-
-* `triangularDistribution`: Creates a triangular distribution function.
-
-* `constantDistribution`: Creates a constant real distribution function.
-
-=== Empirical Distribution
-
-The `empiricalDistribution` function creates a real probability
-distribution from actual data. An empirical distribution
-can be used interchangeably with any of the theoretical
-real distributions.
-
-=== Discrete
-
-The probability distribution framework has the following functions
-which support well known discrete probability distributions:
-
-* `poissonDistribution`: Creates a Poisson distribution function.
-
-* `binomialDistribution`: Creates a binomial distribution function.
-
-* `uniformIntegerDistribution`: Creates a uniform integer distribution function.
-
-* `geometricDistribution`: Creates a geometric distribution function.
-
-* `zipFDistribution`: Creates a Zipf distribution function.
-
-=== Enumerated Distributions
-
-The `enumeratedDistribution` function creates a discrete
-distribution function from a data set of discrete values,
-or from an enumerated list of values and probabilities.
-
-Enumerated distribution functions can be used interchangeably
-with any of the theoretical discrete distributions.
-
-=== Cumulative Probability
-
-The `cumulativeProbability` function can be used with all
-probability distributions to calculate the
-cumulative probability of encountering a specific
-random variable within a specific distribution.
-
-Below is an example of calculating the cumulative probability
-of a random variable within a normal distribution.
-
-In the example a normal distribution function is created
-with a mean of 10 and a standard deviation of 5. Then
-the cumulative probability of the value 12 is calculated for this
-specific distribution.
-
-[source,text]
-----
-let(a=normalDistribution(10, 5),
-    b=cumulativeProbability(a, 12))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "b": 0.6554217416103242
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 0
-      }
-    ]
-  }
-}
-----
-
-Below is an example of a cumulative probability calculation
-using an empirical distribution.
-
-In the example an empirical distribution is created from a random
-sample taken from the *price_f* field.
-
-The cumulative probability of the value .75 is then calculated.
-The *price_f* field in this example was generated using a
-uniform real distribution between 0 and 1, so the output of the
- `cumulativeProbability` function is very close to .75.
-
-[source,text]
-----
-let(a=random(collection1, q="*:*", rows="30000", fl="price_f"),
-    b=col(a, price_f),
-    c=empiricalDistribution(b),
-    d=cumulativeProbability(c, .75))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "d": 0.7554217416103242
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 0
-      }
-    ]
-  }
-}
-----
-
-=== Probability
-
-The `probability` function can be used with any discrete
-distribution function to compute the probability of a
-discrete value.
-
-Below is an example which calculates the probability
-of a discrete value within a Poisson distribution.
-
-In the example a Poisson distribution function is created
-with a mean of 100. Then the
-probability of encountering a sample of the discrete value 101 is calculated for this
-specific distribution.
-
-[source,text]
-----
-let(a=poissonDistribution(100),
-    b=probability(a, 101))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "b": 0.039466333474403106
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 0
-      }
-    ]
-  }
-}
-----
-
-Below is an example of a probability calculation
-using an enumerated distribution.
-
-In the example an enumerated distribution is created from a random
-sample taken from the *day_i* field, which was created
-using a uniform integer distribution between 0 and 30.
-
-The probability of the discrete value 10 is then calculated.
-
-[source,text]
-----
-let(a=random(collection1, q="*:*", rows="30000", fl="day_i"),
-    b=col(a, day_i),
-    c=enumeratedDistribution(b),
-    d=probability(c, 10))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "d": 0.03356666666666666
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 488
-      }
-    ]
-  }
-}
-----
-
-=== Sampling
-
-All probability distributions support sampling. The `sample`
-function returns 1 or more random samples from a probability
-distribution.
-
-Below is an example drawing a single sample from
-a normal distribution.
-
-[source,text]
-----
-let(a=normalDistribution(10, 5),
-    b=sample(a))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "b": 11.24578055004963
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 0
-      }
-    ]
-  }
-}
-----
-
-Below is an example drawing 10 samples from a normal
-distribution.
-
-[source,text]
-----
-let(a=normalDistribution(10, 5),
-    b=sample(a, 10))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "b": [
-          10.18444709339441,
-          9.466947971749377,
-          1.2420697166234458,
-          11.074501226984806,
-          7.659629052136225,
-          0.4440887839190708,
-          13.710925254778786,
-          2.089566359480239,
-          0.7907293097654424,
-          2.8184587681006734
-        ]
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 3
-      }
-    ]
-  }
-}
-----
-
-=== Multivariate Normal Distribution
-
-The multivariate normal distribution is a generalization of the
-univariate normal distribution to higher dimensions.
-
-The multivariate normal distribution models two or more random
-variables that are normally distributed. The relationship between
-the variables is defined by a covariance matrix.
-
-==== Sampling
-
-The `sample` function can be used to draw samples
-from a multivariate normal distribution in much the same
-way as a univariate normal distribution.
-The difference is that each sample will be an array containing a sample
-drawn from each of the underlying normal distributions.
-If multiple samples are drawn, the `sample` function returns a matrix with a
-sample in each row. Over the long term the columns of the sample
-matrix will conform to the covariance matrix used to parametrize the
-multivariate normal distribution.
-
-The example below demonstrates how to initialize and draw samples
-from a multivariate normal distribution.
-
-In this example 5000 random samples are selected from a collection
-of log records. Each sample contains
-the fields *filesize_d* and *response_d*. The values of both fields conform
-to a normal distribution.
-
-Both fields are then vectorized. The *filesize_d* vector is stored in
-variable *b* and the *response_d* vector is stored in variable *c*.
-
-An array is created that contains the *means* of the two vectorized fields.
-
-Then both vectors are added to a matrix which is transposed. This creates
-an *observation* matrix where each row contains one observation of
-*filesize_d* and *response_d*. A covariance matrix is then created from the columns of
-the observation matrix with the
-`cov` function. The covariance matrix describes the covariance between
-*filesize_d* and *response_d*.
-
-The `multivariateNormalDistribution` function is then called with the
-array of means for the two fields and the covariance matrix. The model for the
-multivariate normal distribution is assigned to variable *g*.
-
-Finally five samples are drawn from the multivariate normal distribution. The samples
-are returned as a matrix, with each row representing one sample. There are two
-columns in the matrix. The first column contains samples for *filesize_d* and the second
-column contains samples for *response_d*. Over the long term the covariance between
-the columns will conform to the covariance matrix used to instantiate the
-multivariate normal distribution.
-
-[source,text]
-----
-let(a=random(collection2, q="*:*", rows="5000", fl="filesize_d, response_d"),
-    b=col(a, filesize_d),
-    c=col(a, response_d),
-    d=array(mean(b), mean(c)),
-    e=transpose(matrix(b, c)),
-    f=cov(e),
-    g=multiVariateNormalDistribution(d, f),
-    h=sample(g, 5))
-----
-
-When this expression is sent to the /stream handler it responds with:
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "h": [
-          [
-            41974.85669321393,
-            779.4097049705296
-          ],
-          [
-            42869.19876441414,
-            834.2599296790783
-          ],
-          [
-            38556.30444839889,
-            720.3683470060988
-          ],
-          [
-            37689.31290928216,
-            686.5549428100018
-          ],
-          [
-            40564.74398214547,
-            769.9328090774
-          ]
-        ]
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 162
-      }
-    ]
-  }
-}
-----
-
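
Note on the removed probability.adoc examples above: the two closed-form results (the normal cumulative probability and the Poisson probability) can be cross-checked outside Solr. The following is a minimal sketch using only the Python standard library. The helper names `normal_cumulative_probability` and `poisson_probability` are hypothetical and are not part of Solr or of this commit.

```python
import math
from statistics import NormalDist


def normal_cumulative_probability(mean, sd, x):
    """P(X <= x) for X ~ Normal(mean, sd); mirrors cumulativeProbability(normalDistribution(mean, sd), x)."""
    return NormalDist(mean, sd).cdf(x)


def poisson_probability(mean, k):
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow in mean**k and k!."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


if __name__ == "__main__":
    # Same parameters as the normalDistribution(10, 5) and poissonDistribution(100) examples.
    print(normal_cumulative_probability(10, 5, 12))
    print(poisson_probability(100, 101))
```

Both values should agree with the JSON responses shown in the diff to within floating-point rounding. The empirical and enumerated examples depend on the contents of a Solr collection and cannot be reproduced this way.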

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/simulations.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/simulations.adoc b/solr/solr-ref-guide/src/simulations.adoc
new file mode 100644
index 0000000..814110f
--- /dev/null
+++ b/solr/solr-ref-guide/src/simulations.adoc
@@ -0,0 +1,213 @@
+= Monte Carlo Simulations
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+Monte Carlo simulations are commonly used to model the behavior of
+stochastic systems. This section of the user guide describes
+how to perform both *uncorrelated* and *correlated* Monte Carlo simulations
+using the *sampling* capabilities of the probability distribution framework.
+
+=== Uncorrelated Simulations
+
+Uncorrelated Monte Carlo simulations model stochastic systems with the assumption
+that the underlying random variables move independently of each other.
+A simple example of a Monte Carlo simulation using two independently changing random variables
+is described below.
+
+In this example a Monte Carlo simulation is used to determine the probability that a simple hinge assembly will
+fall within a required length specification.
+
+The hinge has two components, *A* and *B*. The combined length of the two components must be less than 5 centimeters
+to fall within specification.
+
+A random sampling of lengths for component *A* has shown that its length conforms to a
+normal distribution with a mean of 2.2 centimeters and a standard deviation of .0195
+centimeters.
+
+A random sampling of lengths for component *B* has shown that its length conforms
+to a normal distribution with a mean of 2.71 centimeters and a standard deviation of .0198 centimeters.
+
+The Monte Carlo simulation below performs the following steps:
+
+* A normal distribution with a mean of 2.2 and a standard deviation of .0195 is created to model the length of componentA.
+* A normal distribution with a mean of 2.71 and a standard deviation of .0198 is created to model the length of componentB.
+* The `monteCarlo` function is used to simulate component pairs. The `monteCarlo` function
+  calls the *add(sample(componentA), sample(componentB))* function 100000 times and collects the results in an array. Each
+  time the function is called a random sample is drawn from the componentA
+  and componentB length distributions. The `add` function adds the two samples to calculate the combined length.
+  The result of each function run is collected in an array and assigned to the *simresults* variable.
+* An `empiricalDistribution` function is then created from the *simresults* array to model the distribution of the
+  simulation results.
+* Finally, the `cumulativeProbability` function is called on the *simmodel* to determine the cumulative probability
+  that the combined length of the components is 5 or less.
+* Based on the simulation there is a 0.9994371944629039 probability that the combined length of a component pair will
+be 5 or less.
+
+[source,text]
+----
+let(componentA=normalDistribution(2.2,  .0195),
+    componentB=normalDistribution(2.71, .0198),
+    simresults=monteCarlo(add(sample(componentA), sample(componentB)), 100000),
+    simmodel=empiricalDistribution(simresults),
+    prob=cumulativeProbability(simmodel,  5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "prob": 0.9994371944629039
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 660
+      }
+    ]
+  }
+}
+----
+
+=== Correlated Simulations
+
+The simulation above assumes that the lengths of *componentA* and *componentB* vary independently.
+What would happen to the probability model if there were a correlation between the lengths of
+*componentA* and *componentB*?
+
+In the example below a database containing assembled pairs of components is used to determine
+if there is a correlation between the lengths of the components, and how the correlation affects the model.
+
+Before performing a simulation of the effects of correlation on the probability model, it's
+useful to understand what the correlation is between the lengths of *componentA* and *componentB*.
+
+In the example below 5000 random samples are selected from a collection
+of assembled hinges. Each sample contains
+lengths of the components in the fields *componentA_d* and *componentB_d*.
+
+Both fields are then vectorized. The *componentA_d* vector is stored in
+variable *b* and the *componentB_d* vector is stored in variable *c*.
+
+Then the correlation of the two vectors is calculated using the `corr` function. Note that the outcome
+from `corr` is 0.9996931313216989. This means that *componentA_d* and *componentB_d* are almost
+perfectly correlated.
+
+[source,text]
+----
+let(a=random(collection5, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
+    b=col(a, componentA_d),
+    c=col(a, componentB_d),
+    d=corr(b, c))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "d": 0.9996931313216989
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 309
+      }
+    ]
+  }
+}
+----
+
+How does correlation affect the probability model?
+
+The example below explores how to use a *multivariate normal distribution* function
+to model how correlation affects the probability of hinge defects.
+
+In this example 5000 random samples are selected from a collection
+containing length data for assembled hinges. Each sample contains
+the fields *componentA_d* and *componentB_d*.
+
+Both fields are then vectorized. The *componentA_d* vector is stored in
+variable *b* and the *componentB_d* vector is stored in variable *c*.
+
+An array is created that contains the *means* of the two vectorized fields.
+
+Then both vectors are added to a matrix which is transposed. This creates
+an *observation* matrix where each row contains one observation of
+*componentA_d* and *componentB_d*. A covariance matrix is then created from the columns of
+the observation matrix with the
+`cov` function. The covariance matrix describes the covariance between
+*componentA_d* and *componentB_d*.
+
+The `multiVariateNormalDistribution` function is then called with the
+array of means for the two fields and the covariance matrix. The model
+for the multivariate normal distribution is stored in variable *g*.
+
+The `monteCarlo` function then calls the function *add(sample(g))* 50000 times
+and collects the results in a vector. Each time the function is called a single sample
+is drawn from the multivariate normal distribution. Each sample is a vector containing
+one *componentA* and *componentB* pair. The `add` function adds the values in the vector to
+calculate the length of the pair. Over the long term the samples drawn from the
+multivariate normal distribution will conform to the covariance matrix used to construct it.
+
+Just as in the non-correlated example an empirical distribution is used to model probabilities
+of the simulation vector and the `cumulativeProbability` function is used to compute the cumulative
+probability that the combined component length will be 5 centimeters or less.
+
+Notice that the probability of a hinge meeting specification has dropped to 0.9889517439980468.
+This is because the strong correlation
+between the lengths of components means that their lengths rise together causing more hinges to
+fall out of the 5 centimeter specification.
+
+[source,text]
+----
+let(a=random(hinges, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
+    b=col(a, componentA_d),
+    c=col(a, componentB_d),
+    cor=corr(b,c),
+    d=array(mean(b), mean(c)),
+    e=transpose(matrix(b, c)),
+    f=cov(e),
+    g=multiVariateNormalDistribution(d, f),
+    h=monteCarlo(add(sample(g)), 50000),
+    i=empiricalDistribution(h),
+    j=cumulativeProbability(i, 5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "j": 0.9889517439980468
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 599
+      }
+    ]
+  }
+}
+----
+
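
Note on the simulations.adoc examples above: both simulations can be approximated outside Solr as a sanity check. The sketch below uses only the Python standard library. It is an illustration under stated assumptions, not the Solr implementation: the function names are hypothetical, and the correlated case assumes the hinge data has the same means and standard deviations as the uncorrelated example, with a correlation of 0.9997 (rounded from the `corr` output). Correlated pairs are generated with a 2x2 Cholesky factor rather than Solr's `multiVariateNormalDistribution`.

```python
import math
import random
from statistics import NormalDist

# Parameters taken from the uncorrelated example in the diff above.
MEAN_A, SD_A = 2.2, 0.0195
MEAN_B, SD_B = 2.71, 0.0198
SPEC_LIMIT = 5.0  # maximum combined length in centimeters


def simulate_uncorrelated(n, rng):
    """Fraction of n simulated hinges whose combined length is <= SPEC_LIMIT."""
    within = 0
    for _ in range(n):
        a = rng.gauss(MEAN_A, SD_A)
        b = rng.gauss(MEAN_B, SD_B)
        if a + b <= SPEC_LIMIT:
            within += 1
    return within / n


def simulate_correlated(n, rho, rng):
    """Same as above, but A and B are drawn with correlation rho.

    Assumption: the marginal means and standard deviations match the
    uncorrelated example; the real values live in the Solr collection.
    """
    within = 0
    ortho = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor of the covariance matrix applied to (z1, z2).
        a = MEAN_A + SD_A * z1
        b = MEAN_B + SD_B * (rho * z1 + ortho * z2)
        if a + b <= SPEC_LIMIT:
            within += 1
    return within / n


def analytic_probability(rho):
    """Closed-form P(A + B <= SPEC_LIMIT); the sum of jointly normal variables is normal."""
    variance = SD_A ** 2 + SD_B ** 2 + 2.0 * rho * SD_A * SD_B
    return NormalDist(MEAN_A + MEAN_B, math.sqrt(variance)).cdf(SPEC_LIMIT)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    print("uncorrelated:", simulate_uncorrelated(100000, rng), analytic_probability(0.0))
    print("correlated:  ", simulate_correlated(50000, 0.9997, rng), analytic_probability(0.9997))
```

Under these assumptions the closed form gives roughly 0.999 for the uncorrelated case and roughly 0.989 for the correlated case, in line with the responses shown in the diff. The drop follows from the variance of the sum: a positive covariance term is added, which widens the distribution of the combined length.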

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/statistical-programming.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/statistical-programming.adoc b/solr/solr-ref-guide/src/statistical-programming.adoc
index 08693b2..25715c9 100644
--- a/solr/solr-ref-guide/src/statistical-programming.adoc
+++ b/solr/solr-ref-guide/src/statistical-programming.adoc
@@ -161,7 +161,7 @@ Returns the following response:
 
 Several types of data can be manipulated with the statistical programming syntax. The following sections explore <<Arrays,arrays>>, <<Tuples,tuples>>, and <<Lists,lists>>.
 
-=== Arrays
+=== Creating Arrays
 
 The first data structure we'll explore is the array.
 

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/e1666fc6/solr/solr-ref-guide/src/term-vectors.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/term-vectors.adoc b/solr/solr-ref-guide/src/term-vectors.adoc
index cbd21a0..32c4bfa 100644
--- a/solr/solr-ref-guide/src/term-vectors.adoc
+++ b/solr/solr-ref-guide/src/term-vectors.adoc
@@ -109,7 +109,7 @@ responds with:
 }
 ----
 
-== Term Vectors
+== TF-IDF Term Vectors
 
 The `termVectors` function can be used to build *TF-IDF*
 term vectors from the terms generated by the `analyze` function.