Posted to commits@lucene.apache.org by jb...@apache.org on 2018/03/26 19:15:00 UTC
[2/3] lucene-solr:master: SOLR-11947: Squashed commit of the
following ref guide changes:
SOLR-11947: Squashed commit of the following ref guide changes:
commit 61053f2fe373bff0b451f549e063550f08ecdac1
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 26 12:44:12 2018 -0400
SOLR-11947: Fix orphaned files
commit 42302073bf61fde134caeff71b6db3978e113b4d
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 26 12:27:26 2018 -0400
SOLR-11947: small change
commit b16b1453c2e7d5083f588b4b874c918d521e9fe5
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 26 12:23:17 2018 -0400
SOLR-11947: proofing
commit 57265ce4659a427c179e206b79d8fe05b01a5f93
Author: Joel Bernstein <jb...@apache.org>
Date: Sat Mar 24 14:41:48 2018 -0400
SOLR-11947: monte carlo WIP
commit 04e8381f6b5b329c5fa17c1f31c2d848fe9cec2a
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 23 16:24:10 2018 -0400
SOLR-11947: probabiity WIP
commit 4298a6d514e7e431e322a4f62c22c336430a89f1
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 23 13:07:05 2018 -0400
SOLR-11947: time series WIP
commit 1a7654f9225948cd4adb3056bc2192cc0d24b3ee
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 23 11:32:53 2018 -0400
SOLR-11947: machine learning WIP
commit fae0c3aa46e6f26fecb59077207982b2f584ec86
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 22 22:14:15 2018 -0400
SOLR-11947: machine learning WIP
commit fb6a96b2bdc4bbc4c2b5b62b6e69cd561ef9e31b
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 22 14:36:08 2018 -0400
SOLR-11947: numerical analysis WIP
commit a648ba939c90caf5db2a5b88023bd580d4d1e8af
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 22 12:27:33 2018 -0400
SOLR-11947: numerical analysis WIP
commit ce8f1b710d414d8e3ff3c8676f64fc3017316a15
Author: Joel Bernstein <jb...@apache.org>
Date: Wed Mar 21 19:56:10 2018 -0400
SOLR-11947: numerical analysis WIP
commit 5e25a4884341cdd84988e13250f255eb23d7fd50
Author: Joel Bernstein <jb...@apache.org>
Date: Tue Mar 20 22:01:59 2018 -0400
SOLR-11947: Curve fitting WIP
commit f381414dc44ecfa781988c5ca75bfb1c80de6674
Author: Joel Bernstein <jb...@apache.org>
Date: Tue Mar 20 21:49:39 2018 -0400
SOLR-11947: Curve fitting WIP
commit 4be725132215ed44cc84587bb0d11be216360b74
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 19 19:55:10 2018 -0400
SOLR-11947: Monte Carlo WIP
commit d330b412e46be0ebf8d75e99295e3fe9f978c02c
Author: Joel Bernstein <jb...@apache.org>
Date: Sun Mar 18 22:00:55 2018 -0400
SOLR-11947: Probability WIP
commit e3d6160c1fa650e054b9694c57d34b3950c80175
Author: Joel Bernstein <jb...@apache.org>
Date: Sat Mar 17 21:18:43 2018 -0400
SOLR-11947: More WIP
commit 8484b0283f79825dee8eaee82604120d04511de4
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 16 15:03:06 2018 -0400
SOLR-11947: machine learning WIP
commit 77ecfdc71d79ca8eded0355669310c6025c70d96
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 15 21:33:09 2018 -0400
SOLR-11947: machine learning WIP
commit 7488caf5e54436a0e5fe85c0dda4ea31d8357600
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 15 19:08:50 2018 -0400
SOLR-11947: machine learning WIP
commit 102ee2e1857e7d7f45d7f3195a0a4e91eacb766d
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 15 15:18:31 2018 -0400
SOLR-11947: machine learning WIP
commit 0d5cd2b4a4fd012fe6d640a86733280702cf8673
Author: Joel Bernstein <jb...@apache.org>
Date: Wed Mar 14 21:49:15 2018 -0400
SOLR-11947: numerical analysis WIP
commit 31eec30576479a9023c7b0e6ccb2d9f685e128a1
Author: Joel Bernstein <jb...@apache.org>
Date: Wed Mar 14 14:41:06 2018 -0400
SOLR-11947: numerical analysis WIP
commit c6e324ac56ca6e9f229d6acb39fdcf60c3356230
Author: Joel Bernstein <jb...@apache.org>
Date: Tue Mar 13 15:16:26 2018 -0400
SOLR-11947: term vectors WIP
commit 8c843999eabdb82665641caa9c21f07e95b70a86
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 12 18:03:53 2018 -0400
SOLR-11947: Add curve fitting to TOC
commit 09be026f6ad400d965fd373403d7a2eb2fae0c90
Author: Joel Bernstein <jb...@apache.org>
Date: Mon Mar 12 15:36:05 2018 -0400
SOLR-11947: Text analysis WIP
commit e48b4d69abadb603a90c052aa1e36dd60ae7fd33
Author: Joel Bernstein <jb...@apache.org>
Date: Sun Mar 11 18:29:20 2018 -0400
SOLR-11947: TOC changes
commit f71ebc079713e16492ba45cedafc3b9512f6bae2
Author: Joel Bernstein <jb...@apache.org>
Date: Sat Mar 10 17:54:04 2018 -0500
SOLR-11947: WIP term vectors
commit ebc6b3943a27454adaf1a2309b6720bb2ba63c8c
Author: Joel Bernstein <jb...@apache.org>
Date: Sat Mar 10 13:34:19 2018 -0500
SOLR-11947: WIP regression
commit 44752b2d34f46bc7f5693839e42ab3cef9edc47c
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 9 22:40:40 2018 -0500
SOLR-11947: WIP for vectorization.adoc
commit 43254fcb05386264a6d591b1fa2c2573dcc2d2a3
Author: Joel Bernstein <jb...@apache.org>
Date: Fri Mar 9 19:42:26 2018 -0500
SOLR-11947: Test local links
commit b60df2000978f70720eb0a36543752fd3bf07d2c
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 8 21:41:17 2018 -0500
SOLR-11947: Update math-expressions TOC
commit de068c3af8557d60de37cb29f3ed7da3f5442772
Author: Joel Bernstein <jb...@apache.org>
Date: Thu Mar 8 21:24:46 2018 -0500
SOLR-11947: Continued work on math expressions documentation.
commit fe445f2c997ea825d1ae9b9912406521249befc0
Author: Joel Bernstein <jb...@apache.org>
Date: Sun Mar 4 20:22:33 2018 -0500
SOLR-12054: ebeAdd and ebeSubtract should support matrix operations
commit 1f3ae745cc26453a34a64a4327ceac7cc91d23f5
Author: Joel Bernstein <jb...@apache.org>
Date: Sun Mar 4 13:24:54 2018 -0500
SOLR-11947: Initial commit for new math expression docs WIP
Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/1ed4e226
Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/1ed4e226
Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/1ed4e226
Branch: refs/heads/master
Commit: 1ed4e226ac66078a775c869a375c8c816220edec
Parents: dc2ad70
Author: Joel Bernstein <jb...@apache.org>
Authored: Mon Mar 26 12:48:33 2018 -0400
Committer: Joel Bernstein <jb...@apache.org>
Committed: Mon Mar 26 15:05:06 2018 -0400
----------------------------------------------------------------------
solr/solr-ref-guide/src/curve-fitting.adoc | 182 +++++
solr/solr-ref-guide/src/machine-learning.adoc | 680 +++++++++++++++++++
solr/solr-ref-guide/src/math-expressions.adoc | 59 ++
solr/solr-ref-guide/src/matrix-math.adoc | 443 ++++++++++++
solr/solr-ref-guide/src/montecarlo.adoc | 213 ++++++
solr/solr-ref-guide/src/numerical-analysis.adoc | 430 ++++++++++++
solr/solr-ref-guide/src/probability.adoc | 415 +++++++++++
solr/solr-ref-guide/src/regression.adoc | 439 ++++++++++++
solr/solr-ref-guide/src/scalar-math.adoc | 137 ++++
solr/solr-ref-guide/src/statistics.adoc | 575 ++++++++++++++++
.../src/streaming-expressions.adoc | 2 +-
solr/solr-ref-guide/src/term-vectors.adoc | 237 +++++++
solr/solr-ref-guide/src/time-series.adoc | 431 ++++++++++++
solr/solr-ref-guide/src/variables.adoc | 147 ++++
solr/solr-ref-guide/src/vector-math.adoc | 343 ++++++++++
solr/solr-ref-guide/src/vectorization.adoc | 243 +++++++
.../solrj/io/eval/FieldValueEvaluator.java | 12 +-
17 files changed, 4982 insertions(+), 6 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/curve-fitting.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/curve-fitting.adoc b/solr/solr-ref-guide/src/curve-fitting.adoc
new file mode 100644
index 0000000..057cc23
--- /dev/null
+++ b/solr/solr-ref-guide/src/curve-fitting.adoc
@@ -0,0 +1,182 @@
+= Curve Fitting
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+== Polynomial Curve Fitting
+
+
+The `polyfit` function is a general purpose curve fitter used to model
+the *non-linear* relationship between two random variables.
+
+The `polyfit` function is passed *x* and *y* axes and fits a smooth curve to the data.
+If only a single array is provided it is treated as the *y* axis and a sequence is generated
+for the *x* axis.
+
+The `polyfit` function also has a parameter that specifies the degree of the polynomial. The higher
+the degree, the more complex the curves that can be modeled.
+
+The example below uses the `polyfit` function to fit a curve to an array using
+a third degree polynomial. The fitted curve is then subtracted from the original curve. The output
+shows the error between the fitted curve and the original curve, known as the residuals.
+The output also includes the sum-of-squares of the residuals, which provides a measure
+of how large the error is.
+
+[source,text]
+----
+let(echo="residuals, sumSqError",
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=polyfit(y, 3),
+ residuals=ebeSubtract(y, curve),
+ sumSqError=sumSq(residuals))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "residuals": [
+ 0.5886274509803899,
+ -0.0746078431372561,
+ -0.49492135315664765,
+ -0.6689571213100631,
+ -0.5933591898297781,
+ 0.4352283990519288,
+ 0.32016160310277897,
+ 1.1647963800904968,
+ 0.272488687782805,
+ -0.3534055160525744,
+ 0.2904697263520779,
+ -0.7925296272355089,
+ -0.5990476190476182,
+ -0.12572829131652274,
+ 0.6307843137254909
+ ],
+ "sumSqError": 4.7294282482223595
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+In the next example the curve is fit using a fifth degree polynomial. Notice that the
+fit is closer, shown by the smaller residuals and the lower value for the sum-of-squares
+of the residuals, because the higher degree polynomial can model more complex curves.
+
+[source,text]
+----
+let(echo="residuals, sumSqError",
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=polyfit(y, 5),
+ residuals=ebeSubtract(y, curve),
+ sumSqError=sumSq(residuals))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "residuals": [
+ -0.12337461300309674,
+ 0.22708978328173413,
+ 0.12266015718028167,
+ -0.16502738747320755,
+ -0.41142804563857105,
+ 0.2603044014808713,
+ -0.12128970101106162,
+ 0.6234168308471704,
+ -0.1754692675745293,
+ -0.5379689969473249,
+ 0.4651616185671843,
+ -0.288175756132409,
+ 0.027970945463215102,
+ 0.18699690402476687,
+ -0.09086687306501587
+ ],
+ "sumSqError": 1.413089480179252
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+
+== Prediction, Derivatives and Integrals
+
+The `polyfit` function returns an array which contains the *y* value data points
+of the fitted curve.
+
+In order to predict values along the curve an interpolation function must be created
+for the curve. Once an interpolation function has been created the `predict`,
+`derivative` and `integral` functions can be applied to the curve.
+
+In the example below the x axis is included for clarity.
+The `polyfit` function returns an array with the fitted curve.
+A linear interpolation function is then created for the curve with the `lerp` function.
+The `predict` function is then used to predict a value along the curve, in this
+case the prediction is made for the *x* value of .5.
+
+[source,text]
+----
+let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=polyfit(x, y, 5),
+ interp=lerp(x, curve),
+ p=predict(interp, .5))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "p": 0.4481424148606813
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
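+
+Once the interpolation function has been created, the `derivative` and
+`integral` functions can be applied in the same manner as `predict`. The
+example below is a sketch which assumes the `derivative` function accepts
+the interpolation model directly and returns the derivative of the fitted
+curve:
+
+[source,text]
+----
+let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
+    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
+    curve=polyfit(x, y, 5),
+    interp=lerp(x, curve),
+    d=derivative(interp))
+----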
+
+
+
+
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/machine-learning.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
new file mode 100644
index 0000000..cbb3e05
--- /dev/null
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -0,0 +1,680 @@
+= Machine Learning
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+This section of the math expressions user guide covers machine learning
+functions.
+
+== Feature Scaling
+
+Before performing machine learning operations it's often necessary to
+scale the feature vectors so they can be compared at the same scale.
+
+All the scaling functions operate on vectors and matrices.
+When operating on a matrix the *rows* of the matrix are scaled.
+
+=== Min/Max Scaling
+
+The `minMaxScale` function scales a vector or matrix between a min and
+max value. By default it will scale between 0 and 1 if min/max values
+are not provided.
+
+Below is a simple example of min/max scaling between 0 and 1.
+Notice that once brought into the same scale the vectors are the same.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=minMaxScale(c))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ 0,
+ 0.3333333333333333,
+ 0.6666666666666666,
+ 1
+ ],
+ [
+ 0,
+ 0.3333333333333333,
+ 0.6666666666666666,
+ 1
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
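+
+A min/max range other than 0 to 1 can also be specified. The example below is
+a sketch which assumes the target min and max values are passed as additional
+parameters to `minMaxScale`, scaling the rows between 0 and 10:
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+    b=array(200, 300, 400, 500),
+    c=matrix(a, b),
+    d=minMaxScale(c, 0, 10))
+----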
+
+=== Standardization
+
+The `standardize` function scales a vector so that it has a
+mean of 0 and a standard deviation of 1. Standardization can be
+used with machine learning algorithms, such as SVM, that
+perform better when the data has a normal distribution.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=standardize(c))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ -1.161895003862225,
+ -0.3872983346207417,
+ 0.3872983346207417,
+ 1.161895003862225
+ ],
+ [
+ -1.1618950038622249,
+ -0.38729833462074165,
+ 0.38729833462074165,
+ 1.1618950038622249
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 17
+ }
+ ]
+ }
+}
+----
+
+=== Unitize
+
+The `unitize` function scales vectors to a magnitude of 1. A vector with a
+magnitude of 1 is known as a unit vector. Unit vectors are
+preferred when the vector math deals
+with vector direction rather than magnitude.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=unitize(c))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ 0.2721655269759087,
+ 0.40824829046386296,
+ 0.5443310539518174,
+ 0.6804138174397716
+ ],
+ [
+ 0.2721655269759087,
+ 0.4082482904638631,
+ 0.5443310539518174,
+ 0.6804138174397717
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 6
+ }
+ ]
+ }
+}
+----
+
+== Distance
+
+The `distance` function computes a distance measure for two
+numeric arrays or a *distance matrix* for the columns of a matrix.
+
+There are four distance measures currently supported:
+
+* euclidean (default)
+* manhattan
+* canberra
+* earthMovers
+
+Below is an example for computing euclidean distance for
+two numeric arrays:
+
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(21, 29, 41, 49),
+ c=distance(a, b))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "c": 2
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
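+
+A distance measure other than the default euclidean can also be selected.
+The example below is a sketch which assumes the measure is chosen with a
+*distance* named parameter, following the convention described for the
+`knn` function later in this guide:
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+    b=array(21, 29, 41, 49),
+    c=distance(a, b, distance=manhattan))
+----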
+
+Below is an example for computing a distance matrix for columns
+of a matrix:
+
+[source,text]
+----
+let(a=array(20, 30, 40),
+    b=array(21, 29, 41),
+    c=array(31, 40, 50),
+    d=matrix(a, b, c),
+    e=distance(d))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "e": [
+ [
+ 0,
+ 15.652475842498529,
+ 34.07345007480164
+ ],
+ [
+ 15.652475842498529,
+ 0,
+ 18.547236990991408
+ ],
+ [
+ 34.07345007480164,
+ 18.547236990991408,
+ 0
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 24
+ }
+ ]
+ }
+}
+----
+
+== K-means Clustering
+
+The `kmeans` function performs k-means clustering of the rows of a matrix.
+Once the clustering has been completed there are a number of useful functions available
+for examining the *clusters* and *centroids*.
+
+The examples below are clustering *term vectors*.
+The chapter on link:term-vectors.adoc[Text Analysis and Term Vectors] should be
+consulted for a full explanation of these features.
+
+=== Centroid Features
+
+In the example below the `kmeans` function is used to cluster a result set from the Enron email data-set
+and then the top features are extracted from the cluster centroids.
+
+Let's look at what data is assigned to each variable:
+
+* *a*: The `random` function returns a sample of 500 documents from the *enron*
+collection that match the query *body:oil*. The `select` function selects the *id*
+and annotates each tuple with the analyzed bigram terms from the body field.
+
+* *b*: The `termVectors` function creates a TF-IDF term vector matrix from the
+tuples stored in variable *a*. Each row in the matrix represents a document. The columns of the matrix
+are the bigram terms that were attached to each tuple.
+* *c*: The `kmeans` function clusters the rows of the matrix into 5 clusters. The k-means clustering is performed using the
+*Euclidean distance* measure.
+* *d*: The `getCentroids` function returns a matrix of cluster centroids. Each row in the matrix is a centroid
+from one of the 5 clusters. The columns of the matrix are the same bigram terms as the term vector matrix.
+* *e*: The `topFeatures` function returns the column labels for the top 5 features of each centroid in the matrix.
+This returns the top 5 bigram terms for each centroid.
+
+[source,text]
+----
+let(a=select(random(enron, q="body:oil", rows="500", fl="id, body"),
+ id,
+ analyze(body, body_bigram) as terms),
+ b=termVectors(a, maxDocFreq=.10, minDocFreq=.05, minTermLength=14, exclude="_,copyright"),
+ c=kmeans(b, 5),
+ d=getCentroids(c),
+ e=topFeatures(d, 5))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "e": [
+ [
+ "enron enronxgate",
+ "north american",
+ "energy services",
+ "conference call",
+ "power generation"
+ ],
+ [
+ "financial times",
+ "chief financial",
+ "financial officer",
+ "exchange commission",
+ "houston chronicle"
+ ],
+ [
+ "southern california",
+ "california edison",
+ "public utilities",
+ "utilities commission",
+ "rate increases"
+ ],
+ [
+ "rolling blackouts",
+ "public utilities",
+ "electricity prices",
+ "federal energy",
+ "price controls"
+ ],
+ [
+ "california edison",
+ "regulatory commission",
+ "southern california",
+ "federal energy",
+ "power generators"
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 982
+ }
+ ]
+ }
+}
+----
+
+=== Cluster Features
+
+The example below examines the top features of a specific cluster. This example uses the same techniques
+as the centroids example, but the top features are extracted from a cluster rather than from the centroids.
+
+The `getCluster` function returns a cluster by its index. Each cluster is a matrix containing term vectors
+that have been clustered together based on their features.
+
+In the example below the `topFeatures` function is used to extract the top 4 features from each term vector
+in the cluster.
+
+[source,text]
+----
+let(a=select(random(collection3, q="body:oil", rows="500", fl="id, body"),
+ id,
+ analyze(body, body_bigram) as terms),
+ b=termVectors(a, maxDocFreq=.09, minDocFreq=.03, minTermLength=14, exclude="_,copyright"),
+ c=kmeans(b, 25),
+ d=getCluster(c, 0),
+ e=topFeatures(d, 4))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "e": [
+ [
+ "electricity board",
+ "maharashtra state",
+ "power purchase",
+ "state electricity",
+ "reserved enron"
+ ],
+ [
+ "electricity board",
+ "maharashtra state",
+ "state electricity",
+ "purchase agreement",
+ "independent power"
+ ],
+ [
+ "maharashtra state",
+ "reserved enron",
+ "federal government",
+ "state government",
+ "dabhol project"
+ ],
+ [
+ "purchase agreement",
+ "power purchase",
+ "electricity board",
+ "maharashtra state",
+ "state government"
+ ],
+ [
+ "investment grade",
+ "portland general",
+ "general electric",
+ "holding company",
+ "transmission lines"
+ ],
+ [
+ "state government",
+ "state electricity",
+ "purchase agreement",
+ "electricity board",
+ "maharashtra state"
+ ],
+ [
+ "electricity board",
+ "state electricity",
+ "energy management",
+ "maharashtra state",
+ "energy markets"
+ ],
+ [
+ "electricity board",
+ "maharashtra state",
+ "state electricity",
+ "state government",
+ "second quarter"
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 978
+ }
+ ]
+ }
+}
+----
+
+== Multi K-means Clustering
+
+K-means clustering will produce different results depending on
+the initial placement of the centroids. K-means is fast enough
+that multiple trials can be performed and the best outcome selected.
+The `multiKmeans` function runs the K-means
+clustering algorithm for a given number of trials and selects the
+best result based on which trial produces the lowest intra-cluster
+variance.
+
+The example below is identical to the centroids example except that
+it uses `multiKmeans` with 100 trials, rather than a single
+trial of the `kmeans` function.
+
+[source,text]
+----
+let(a=select(random(collection3, q="body:oil", rows="500", fl="id, body"),
+ id,
+ analyze(body, body_bigram) as terms),
+ b=termVectors(a, maxDocFreq=.09, minDocFreq=.03, minTermLength=14, exclude="_,copyright"),
+ c=multiKmeans(b, 5, 100),
+ d=getCentroids(c),
+ e=topFeatures(d, 5))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "e": [
+ [
+ "enron enronxgate",
+ "energy trading",
+ "energy markets",
+ "energy services",
+ "unleaded gasoline"
+ ],
+ [
+ "maharashtra state",
+ "electricity board",
+ "state electricity",
+ "energy trading",
+ "chief financial"
+ ],
+ [
+ "price controls",
+ "electricity prices",
+ "francisco chronicle",
+ "wholesale electricity",
+ "power generators"
+ ],
+ [
+ "southern california",
+ "california edison",
+ "public utilities",
+ "francisco chronicle",
+ "utilities commission"
+ ],
+ [
+ "california edison",
+ "power purchases",
+ "system operator",
+ "term contracts",
+ "independent system"
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 1182
+ }
+ ]
+ }
+}
+----
+
+== Fuzzy K-means Clustering
+
+The `fuzzyKmeans` function is a soft clustering algorithm which
+allows vectors to be assigned to more than one cluster. The *fuzziness* parameter
+is a value between 1 and 2 that determines how fuzzy to make the cluster assignment.
+
+After the clustering has been performed the `getMembershipMatrix` function can be called
+on the clustering result to return a matrix describing which clusters each vector belongs to.
+There is a row in the matrix for each vector that was clustered. There is a column in the matrix
+for each cluster. The values in the columns are the probability that the vector belongs to the specific
+cluster.
+
+A simple example will make this more clear. In the example below 300 documents are analyzed and
+then turned into a term vector matrix. Then the `fuzzyKmeans` function clusters the
+term vectors into 12 clusters with a fuzziness factor of 1.25.
+
+The `getMembershipMatrix` function is used to return the membership matrix and the first row
+of membership matrix is retrieved with the `rowAt` function. The `precision` function is then applied to the first row
+of the matrix to make it easier to read.
+
+The output shows a single vector representing the cluster membership probabilities for the first
+term vector. Notice that the term vector has the highest association with the 12th cluster,
+but also has significant associations with the 3rd, 5th, 6th and 7th clusters.
+
+[source,text]
+----
+let(a=select(random(collection3, q="body:oil", rows="300", fl="id, body"),
+ id,
+ analyze(body, body_bigram) as terms),
+ b=termVectors(a, maxDocFreq=.09, minDocFreq=.03, minTermLength=14, exclude="_,copyright"),
+ c=fuzzyKmeans(b, 12, fuzziness=1.25),
+ d=getMembershipMatrix(c),
+ e=rowAt(d, 0),
+ f=precision(e, 5))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "f": [
+ 0,
+ 0,
+ 0.178,
+ 0,
+ 0.17707,
+ 0.17775,
+ 0.16214,
+ 0,
+ 0,
+ 0,
+ 0,
+ 0.30504
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 2157
+ }
+ ]
+ }
+}
+----
+
+== K-nearest Neighbor
+
+The `knn` function searches the rows of a matrix for the
+K-nearest neighbors of a search vector. The `knn` function
+returns a *matrix* of the K-nearest neighbors. The `knn` function
+has a *named parameter* called *distance* which specifies the distance measure.
+There are four distance measures currently supported:
+
+* euclidean (default)
+* manhattan
+* canberra
+* earthMovers
+
+The example below builds on the clustering examples to demonstrate
+the `knn` function.
+
+In the example, the centroids matrix is set to variable *d*. The first
+centroid vector is selected from the matrix with the `rowAt` function.
+Then the `knn` function is used to find the 3 nearest neighbors
+to the centroid vector in the term vector matrix (variable b).
+
+The `knn` function returns a matrix with the 3 nearest neighbors based on the
+default distance measure which is euclidean. Finally, the top 4 features
+of the term vectors in the nearest neighbor matrix are returned.
+
+[source,text]
+----
+let(a=select(random(collection3, q="body:oil", rows="500", fl="id, body"),
+ id,
+ analyze(body, body_bigram) as terms),
+ b=termVectors(a, maxDocFreq=.09, minDocFreq=.03, minTermLength=14, exclude="_,copyright"),
+ c=multiKmeans(b, 5, 100),
+ d=getCentroids(c),
+ e=rowAt(d, 0),
+ g=knn(b, e, 3),
+ h=topFeatures(g, 4))
+----
+
+This expression returns the following response:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "h": [
+ [
+ "california power",
+ "electricity supply",
+ "concerned about",
+ "companies like"
+ ],
+ [
+ "maharashtra state",
+ "california power",
+ "electricity board",
+ "alternative energy"
+ ],
+ [
+ "electricity board",
+ "maharashtra state",
+ "state electricity",
+ "houston chronicle"
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 1243
+ }
+ ]
+ }
+}
+----
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/math-expressions.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
new file mode 100644
index 0000000..e2ed438
--- /dev/null
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -0,0 +1,59 @@
+= Math Expressions
+:page-children: scalar-math, vector-math, variables, matrix-math, vectorization, term-vectors, statistics, probability, montecarlo, time-series, regression, numerical-analysis, curve-fitting, machine-learning
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+The Streaming Expression library includes a powerful
+mathematical programming syntax with many of the features of a
+functional programming language. The syntax includes variables,
+data structures and a growing set of mathematical functions.
+
+This user guide provides an overview of the different areas of
+mathematical coverage starting with basic scalar math and
+ending with machine learning. Along the way the guide covers variables,
+data structures, and techniques for combining Solr's
+powerful streams with mathematical functions to make every
+record in your SolrCloud cluster computable.
+
+== link:scalar-math.adoc[Scalar Math]
+
+== link:vector-math.adoc[Vector Math]
+
+== link:variables.adoc[Variables]
+
+== link:matrix-math.adoc[Matrix Math]
+
+== link:vectorization.adoc[Streams and Vectorization]
+
+== link:term-vectors.adoc[Text Analysis and Term Vectors]
+
+== link:statistics.adoc[Statistics]
+
+== link:probability.adoc[Probability]
+
+== link:montecarlo.adoc[Monte Carlo Simulations]
+
+== link:time-series.adoc[Time Series]
+
+== link:regression.adoc[Linear Regression]
+
+== link:numerical-analysis.adoc[Interpolation, Derivatives and Integrals]
+
+== link:curve-fitting.adoc[Curve Fitting]
+
+== link:machine-learning.adoc[Machine Learning]
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/matrix-math.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/matrix-math.adoc b/solr/solr-ref-guide/src/matrix-math.adoc
new file mode 100644
index 0000000..ba45cca
--- /dev/null
+++ b/solr/solr-ref-guide/src/matrix-math.adoc
@@ -0,0 +1,443 @@
+= Matrices and Matrix Math
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This section of the user guide covers the
+basics of matrix creation, manipulation and matrix math. Other sections
+of the user guide demonstrate how matrices are used by the statistics,
+probability and machine learning functions.
+
+== Matrix Creation
+
+A matrix can be created with the `matrix` function.
+The `matrix` function is passed a list of `arrays`, with
+each array representing a *row* in the matrix.
+
+The example below creates a two-by-two matrix.
+
+[source,text]
+----
+matrix(array(1, 2),
+ array(4, 5))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "return-value": [
+ [
+ 1,
+ 2
+ ],
+ [
+ 4,
+ 5
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Accessing Rows and Columns
+
+The rows and columns of a matrix can be accessed using the `rowAt`
+and `colAt` functions.
+
+The example below creates a two-by-two matrix and returns the second column of the matrix.
+Notice that in this example the matrix is created from variables rather than
+directly from a list of arrays.
+
+[source,text]
+----
+let(a=array(1, 2),
+ b=array(4, 5),
+ c=matrix(a, b),
+ d=colAt(c, 1))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ 2,
+ 5
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Row and Column Labels
+
+A matrix can have row and column labels. The functions
+`setRowLabels`, `setColumnLabels`, `getRowLabels` and `getColumnLabels`
+can be used to set and get the labels. The label values
+are set using string arrays.
+
+Below is a simple example of setting and getting row and column labels
+on a matrix. In other sections of the user guide examples are shown
+where functions return matrices with the labels already set.
+
+[source,text]
+----
+let(echo="d, e",
+ a=matrix(array(1, 2),
+ array(4, 5)),
+ b=setRowLabels(a, array("row0", "row1")),
+ c=setColumnLabels(b, array("col0", "col1")),
+ d=getRowLabels(c),
+ e=getColumnLabels(c))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ "row0",
+ "row1"
+ ],
+ "e": [
+ "col0",
+ "col1"
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Matrix Attributes
+
+A matrix can also have an arbitrary set of named attributes associated
+with it. Certain functions, such as the `termVectors` function,
+return matrices with attributes that describe the data in the matrix.
+
+Attributes can be retrieved by name using the `getAttribute` function and
+the entire attribute map can be returned using the `getAttributes`
+function.
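+The pattern, sketched with placeholders (*a* is assumed to be a matrix
+returned by a function that sets attributes, and "attributeName" is a
+hypothetical name, since the actual attribute names depend on the
+function that produced the matrix), looks like this:
+
+[source,text]
+----
+getAttribute(a, "attributeName")
+getAttributes(a)
+----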
+
+== Matrix Dimensions
+
+The dimensions of a matrix can be determined using the
+`rowCount` and `columnCount` functions.
+
+The example below retrieves the dimensions of a matrix.
+
+[source,text]
+----
+let(echo="b,c",
+ a=matrix(array(1, 2, 3),
+ array(4, 5, 6)),
+ b=rowCount(a),
+ c=columnCount(a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": 2,
+ "c": 3
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Matrix Transposition
+
+A matrix can be https://en.wikipedia.org/wiki/Transpose[transposed]
+using the `transpose` function.
+
+An example of matrix transposition is shown below:
+
+[source,text]
+----
+let(a=matrix(array(1, 2),
+ array(4, 5)),
+ b=transpose(a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": [
+ [
+ 1,
+ 4
+ ],
+ [
+ 2,
+ 5
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 24
+ }
+ ]
+ }
+}
+----
+
+== Matrix Summations
+
+The rows and columns of a matrix can be summed with the `sumRows` and `sumColumns` functions.
+Below is an example of the `sumRows` function, which returns an
+array with the sum of each row.
+
+[source,text]
+----
+let(a=matrix(array(1, 2, 3),
+ array(4, 5, 6)),
+ b=sumRows(a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": [
+ 6,
+ 15
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 2
+ }
+ ]
+ }
+}
+----
+
+The `grandSum` function returns the sum of all values in the matrix.
+Below is an example of the `grandSum` function:
+
+[source,text]
+----
+let(a=matrix(array(1, 2, 3),
+ array(4, 5, 6)),
+ b=grandSum(a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": 21
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Scalar Matrix Math
+
+The same scalar math functions that apply to vectors can also be applied to matrices: `scalarAdd`, `scalarSubtract`,
+`scalarMultiply` and `scalarDivide`. Below is an example of the `scalarAdd` function,
+which adds a scalar value to each element in a matrix.
+
+[source,text]
+----
+let(a=matrix(array(1, 2),
+ array(4, 5)),
+ b=scalarAdd(10, a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": [
+ [
+ 11,
+ 12
+ ],
+ [
+ 14,
+ 15
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Matrix Addition and Subtraction
+
+Two matrices can be added and subtracted using the `ebeAdd` and `ebeSubtract` functions,
+which perform element-by-element addition
+and subtraction of matrices.
+
+Below is a simple example of an element-by-element addition of a matrix to itself:
+
+[source,text]
+----
+let(a=matrix(array(1, 2),
+ array(4, 5)),
+ b=ebeAdd(a, a))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": [
+ [
+ 2,
+ 4
+ ],
+ [
+ 8,
+ 10
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Matrix Multiplication
+
+Matrix multiplication can be accomplished using the `matrixMult` function. Below is a simple
+example of matrix multiplication:
+
+[source,text]
+----
+let(a=matrix(array(1, 2),
+ array(4, 5)),
+ b=matrix(array(11, 12),
+ array(14, 15)),
+ c=matrixMult(a, b))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "c": [
+ [
+ 39,
+ 42
+ ],
+ [
+ 114,
+ 123
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
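As a cross-check of the arithmetic above, the same multiplication can be computed in a few lines of plain Python (illustrative only, not part of the Solr API):

```python
# Plain-Python matrix multiplication matching the matrixMult example above.
A = [[1, 2],
     [4, 5]]
B = [[11, 12],
     [14, 15]]

# C[i][j] = sum over k of A[i][k] * B[k][j]
C = [[sum(A[i][k] * B[k][j] for k in range(len(B)))
      for j in range(len(B[0]))]
     for i in range(len(A))]

print(C)  # → [[39, 42], [114, 123]]
```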
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/montecarlo.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/montecarlo.adoc b/solr/solr-ref-guide/src/montecarlo.adoc
new file mode 100644
index 0000000..814110f
--- /dev/null
+++ b/solr/solr-ref-guide/src/montecarlo.adoc
@@ -0,0 +1,213 @@
+= Monte Carlo Simulations
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+Monte Carlo simulations are commonly used to model the behavior of
+stochastic systems. This section of the user guide describes
+how to perform both *uncorrelated* and *correlated* Monte Carlo simulations
+using the *sampling* capabilities of the probability distribution framework.
+
+== Uncorrelated Simulations
+
+Uncorrelated Monte Carlo simulations model stochastic systems with the assumption
+ that the underlying random variables move independently of each other.
+ A simple example of a Monte Carlo simulation using two independently changing random variables
+ is described below.
+
+In this example a Monte Carlo simulation is used to determine the probability that a simple hinge assembly will
+fall within a required length specification.
+
+The hinge has two components *A* and *B*. The combined length of the two components must be less than 5 centimeters
+to fall within specification.
+
+A random sampling of lengths for component *A* has shown that its length conforms to a
+normal distribution with a mean of 2.2 centimeters and a standard deviation of .0195
+centimeters.
+
+A random sampling of lengths for component *B* has shown that its length conforms
+to a normal distribution with a mean of 2.71 centimeters and a standard deviation of .0198 centimeters.
+
+The Monte Carlo simulation below performs the following steps:
+
+* A normal distribution with a mean of 2.2 and a standard deviation of .0195 is created to model the length of componentA.
+* A normal distribution with a mean of 2.71 and a standard deviation of .0198 is created to model the length of componentB.
+* The `monteCarlo` function is used to simulate component pairs. The `monteCarlo` function
+ calls the *add(sample(componentA), sample(componentB))* function 100000 times. Each
+ time the function is called a random sample is drawn from the componentA
+ and componentB length distributions, and the `add` function adds the two samples to calculate the combined length.
+ The results of the runs are collected in an array and assigned to the *simresults* variable.
+* An `empiricalDistribution` function is then created from the *simresults* array to model the distribution of the
+ simulation results.
+* Finally, the `cumulativeProbability` function is called on the *simmodel* to determine the cumulative probability
+ that the combined length of the components is 5 or less.
+* Based on the simulation there is a .9994371944629039 probability that the combined length of a component pair will
+be 5 or less.
+
+[source,text]
+----
+let(componentA=normalDistribution(2.2, .0195),
+ componentB=normalDistribution(2.71, .0198),
+ simresults=monteCarlo(add(sample(componentA), sample(componentB)), 100000),
+ simmodel=empiricalDistribution(simresults),
+ prob=cumulativeProbability(simmodel, 5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "prob": 0.9994371944629039
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 660
+ }
+ ]
+ }
+}
+----
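The same uncorrelated simulation can be sketched outside Solr with Python's standard library (an illustrative approximation of the steps above, not the Solr implementation; the seed is arbitrary):

```python
import random

random.seed(42)

# Draw independent samples for each component length and sum them, mirroring
# monteCarlo(add(sample(componentA), sample(componentB)), 100000).
totals = [random.gauss(2.2, 0.0195) + random.gauss(2.71, 0.0198)
          for _ in range(100000)]

# Empirical cumulative probability that a pair is 5 cm or less.
prob = sum(t <= 5 for t in totals) / len(totals)
print(round(prob, 4))  # close to the .9994 reported above
```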
+
+== Correlated Simulations
+
+The simulation above assumes that the lengths of *componentA* and *componentB* vary independently.
+What would happen to the probability model if there were a correlation between the lengths of
+*componentA* and *componentB*?
+
+In the example below a database containing assembled pairs of components is used to determine
+if there is a correlation between the lengths of the components, and how the correlation affects the model.
+
+Before performing a simulation of the effects of correlation on the probability model it's
+useful to understand what the correlation is between the lengths of *componentA* and *componentB*.
+
+In the example below 5000 random samples are selected from a collection
+of assembled hinges. Each sample contains
+lengths of the components in the fields *componentA_d* and *componentB_d*.
+
+Both fields are then vectorized. The *componentA_d* vector is stored in
+variable *b* and the *componentB_d* vector is stored in variable *c*.
+
+Then the correlation of the two vectors is calculated using the `corr` function. Note that the outcome
+from `corr` is 0.9996931313216989. This means that *componentA_d* and *componentB_d* are almost
+perfectly correlated.
+
+[source,text]
+----
+let(a=random(collection5, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
+    b=col(a, componentA_d),
+    c=col(a, componentB_d),
+    d=corr(b, c))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": 0.9996931313216989
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 309
+ }
+ ]
+ }
+}
+----
+
+How does correlation affect the probability model?
+
+The example below explores how to use a *multivariate normal distribution* function
+to model how correlation affects the probability of hinge defects.
+
+In this example 5000 random samples are selected from a collection
+containing length data for assembled hinges. Each sample contains
+the fields *componentA_d* and *componentB_d*.
+
+Both fields are then vectorized. The *componentA_d* vector is stored in
+variable *b* and the *componentB_d* vector is stored in variable *c*.
+
+An array is created that contains the *means* of the two vectorized fields.
+
+Then both vectors are added to a matrix which is transposed. This creates
+an *observation* matrix where each row contains one observation of
+*componentA_d* and *componentB_d*. A covariance matrix is then created from the columns of
+the observation matrix with the
+`cov` function. The covariance matrix describes the covariance between
+*componentA_d* and *componentB_d*.
+
+The `multiVariateNormalDistribution` function is then called with the
+array of means for the two fields and the covariance matrix. The model
+for the multivariate normal distribution is stored in variable *g*.
+
+The `monteCarlo` function then calls the function *add(sample(g))* 50000 times
+and collects the results in a vector. Each time the function is called a single sample
+is drawn from the multivariate normal distribution. Each sample is a vector containing
+one *componentA* and *componentB* pair. The `add` function adds the values in the vector to
+calculate the length of the pair. Over the long term the samples drawn from the
+multivariate normal distribution will conform to the covariance matrix used to construct it.
+
+Just as in the non-correlated example an empirical distribution is used to model probabilities
+of the simulation vector and the `cumulativeProbability` function is used to compute the cumulative
+probability that the combined component length will be 5 centimeters or less.
+
+Notice that the probability of a hinge meeting the specification has dropped to 0.9889517439980468.
+This is because the strong correlation
+between the lengths of the components means that their lengths rise and fall together, causing more hinges to
+fall outside of the 5 centimeter specification.
+
+[source,text]
+----
+let(a=random(hinges, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
+ b=col(a, componentA_d),
+ c=col(a, componentB_d),
+ cor=corr(b,c),
+ d=array(mean(b), mean(c)),
+ e=transpose(matrix(b, c)),
+ f=cov(e),
+ g=multiVariateNormalDistribution(d, f),
+ h=monteCarlo(add(sample(g)), 50000),
+ i=empiricalDistribution(h),
+ j=cumulativeProbability(i, 5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "j": 0.9889517439980468
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 599
+ }
+ ]
+ }
+}
+----
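The effect of correlation can also be sketched in plain Python by driving both component lengths from a single shared random factor (an illustrative, perfectly correlated approximation rather than the full covariance-based model above; the seed is arbitrary):

```python
import random

random.seed(7)

totals = []
for _ in range(50000):
    z = random.gauss(0, 1)           # one latent factor drives both lengths
    component_a = 2.2 + 0.0195 * z
    component_b = 2.71 + 0.0198 * z
    totals.append(component_a + component_b)

# With perfectly correlated lengths the combined standard deviation is
# .0195 + .0198 = .0393, so fewer pairs fall at or below 5 cm.
prob = sum(t <= 5 for t in totals) / len(totals)
print(round(prob, 3))  # close to the 0.9890 reported above
```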
+
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/numerical-analysis.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/numerical-analysis.adoc b/solr/solr-ref-guide/src/numerical-analysis.adoc
new file mode 100644
index 0000000..cb2bc2e
--- /dev/null
+++ b/solr/solr-ref-guide/src/numerical-analysis.adoc
@@ -0,0 +1,430 @@
+= Interpolation, Derivatives and Integrals
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This section of the math expression user guide covers *interpolation*, *derivatives* and *integrals*.
+These three interrelated topics are part of the field of mathematics called *numerical analysis*.
+
+== Interpolation
+
+Interpolation is used to construct new data points between a set of known control points.
+The ability to *predict* new data points allows for *sampling* along the curve defined by the
+control points.
+
+The interpolation functions described below all return an *interpolation model*
+that can be passed to other functions which make use of the sampling capability.
+
+If an interpolation model is returned directly it outputs an array containing predictions for each of the
+control points. This is useful in the case of `loess` interpolation, which first smooths the control points
+and then interpolates the smoothed points. All other interpolation functions simply return the original
+control points, because interpolation predicts a curve that passes through the original control points.
+
+There are different algorithms for interpolation that will result in different predictions
+along the curve. The math expressions library currently supports the following
+interpolation functions:
+
+* `lerp`: Linear interpolation predicts points that pass through each control point and
+ form straight lines between control points.
+* `spline`: Spline interpolation predicts points that pass through each control point
+and form a smooth curve between control points.
+* `akima`: Akima spline interpolation is similar to spline interpolation but is less sensitive to outliers.
+* `loess`: Loess interpolation first performs a non-linear local regression to smooth the original
+control points. Then a spline is used to interpolate the smoothed control points.
+
+=== Upsampling
+
+Interpolation can be used to increase the sampling rate along a curve. One example
+of this would be to take a time series with samples every minute and create a data set with
+samples every second. In order to do this the data points between the minutes must be created.
+
+The `predict` function can be used to predict values anywhere within the bounds of the interpolation
+range. The example below shows a simple case of upsampling.
+
+In the example linear interpolation is performed on the arrays in variables *x* and *y*. The *x* variable,
+which is the x axis, is a sequence from 0 to 20 with a stride of 2. The *y* variable defines the curve
+along the x axis.
+
+The `lerp` function performs the interpolation and returns the interpolation model.
+
+The *u* variable is an array from 0 to 20 with a stride of 1. This fills in the gaps of the original x axis.
+The `predict` function then uses the interpolation model in variable *l* to predict values for
+every point in the array assigned to variable *u*.
+
+The variable *p* is the array of predictions, which is the upsampled set of y values.
+
+[source,text]
+----
+let(x=array(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
+ y=array(5, 10, 60, 190, 100, 130, 100, 20, 30, 10, 5),
+ l=lerp(x, y),
+ u=array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
+ p=predict(l, u))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+      "p": [
+ 5,
+ 7.5,
+ 10,
+ 35,
+ 60,
+ 125,
+ 190,
+ 145,
+ 100,
+ 115,
+ 130,
+ 115,
+ 100,
+ 60,
+ 20,
+ 25,
+ 30,
+ 20,
+ 10,
+ 7.5,
+ 5
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
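Linear interpolation itself is simple enough to sketch in a few lines of Python (illustrative only; `lerp_predict` is a hypothetical helper, not a Solr function):

```python
def lerp_predict(xs, ys, u):
    """Piecewise-linear prediction within the bounds of xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= u <= xs[i + 1]:
            t = (u - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("u is outside the interpolation range")

x = list(range(0, 21, 2))                       # 0, 2, 4, ... 20
y = [5, 10, 60, 190, 100, 130, 100, 20, 30, 10, 5]

# Upsample to a stride of 1, as in the lerp/predict example above.
p = [lerp_predict(x, y, u) for u in range(21)]
print(p)  # → [5, 7.5, 10, 35, 60, 125, 190, 145, 100, 115, 130, 115, 100, 60, 20, 25, 30, 20, 10, 7.5, 5]
```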
+
+=== Smoothing Interpolation
+
+The `loess` function is a smoothing interpolator which means it doesn't derive
+a function that passes through the original control points. Instead the `loess` function
+returns a function that smooths the original control points.
+
+A technique known as local regression is used to compute the smoothed curve. The size of the
+neighborhood of the local regression can be adjusted
+to control how close the new curve conforms to the original control points.
+
+The `loess` function is passed *x* and *y* axes and fits a smooth curve to the data.
+If only a single array is provided it is treated as the *y* axis and a sequence is generated
+for the *x* axis.
+
+The example below uses the `loess` function to fit a curve to a set of *y* values in an array.
+The bandwidth parameter defines the percent of data to use for the local
+regression. The lower the percent the smaller the neighborhood used for the local
+regression and the closer the curve will be to the original data.
+
+In the example the fitted curve is subtracted from the original curve using the
+`ebeSubtract` function. The output shows the error between the
+fitted curve and the original curve, known as the residuals. The output also includes
+the sum-of-squares of the residuals which provides a measure
+of how large the error is.
+
+[source,text]
+----
+let(echo="residuals, sumSqError",
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=loess(y, bandwidth=.3),
+ residuals=ebeSubtract(y, curve),
+ sumSqError=sumSq(residuals))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "residuals": [
+ 0,
+ 0,
+ 0,
+ -0.040524802275866634,
+ -0.10531988096456502,
+ 0.5906115002526198,
+ 0.004215074334896762,
+ 0.4201374330912433,
+ 0.09618315578013803,
+ 0.012107948556718817,
+ -0.9892939034492398,
+ 0.012014364143757561,
+ 0.1093830927709325,
+ 0.523166271893805,
+ 0.09658362075164639,
+ -0.011433819306139625,
+ 0.9899403519886416,
+ -0.011707983372932773,
+ -0.004223284004140737,
+ -0.00021462867928434548,
+ 0.0018723112875456138
+ ],
+ "sumSqError": 2.8016013870800616
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+In the next example the curve is fit using a bandwidth of .25. Notice that the curve
+is a closer fit, shown by the smaller residuals and lower value for the sum-of-squares of the
+residuals.
+
+[source,text]
+----
+let(echo="residuals, sumSqError",
+    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
+    curve=loess(y, bandwidth=.25),
+ residuals=ebeSubtract(y, curve),
+ sumSqError=sumSq(residuals))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "residuals": [
+ 0,
+ 0,
+ 0,
+ 0,
+ -0.19117650587715396,
+ 0.442863451538809,
+ -0.18553845993358564,
+ 0.29990769020356645,
+ 0,
+ 0.23761890236245709,
+ -0.7344358765888117,
+ 0.2376189023624491,
+ 0,
+ 0.30373119215254984,
+ -3.552713678800501e-15,
+ -0.23761890236245264,
+ 0.7344358765888046,
+ -0.2376189023625095,
+ 0,
+ 2.842170943040401e-14,
+ -2.4868995751603507e-14
+ ],
+ "sumSqError": 1.7539413576337557
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Derivatives
+
+The derivative of a function measures the rate of change of the *y* value with respect to the
+rate of change of the *x* value.
+
+The `derivative` function can compute the derivative of any *interpolation* function.
+The `derivative` function can also compute the derivative of a derivative.
+
+The example below computes the derivative for a `loess` interpolation function.
+
+[source,text]
+----
+let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=loess(x, y, bandwidth=.3),
+ derivative=derivative(curve))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "derivative": [
+ 1.0022002675659012,
+ 0.9955994648681976,
+ 1.0154018729613081,
+ 1.0643674501141696,
+ 1.0430879694757085,
+ 0.9698717643975381,
+ 0.7488201070357539,
+ 0.44627000894357516,
+ 0.19019561285422165,
+ 0.01703599324311178,
+ -0.001908408138535126,
+ -0.009121607450087499,
+ -0.2576361507216319,
+ -0.49378951291352746,
+ -0.7288073815664,
+ -0.9871806872210384,
+ -1.0025400632604322,
+ -1.001836567536853,
+ -1.0076227586138085,
+ -1.0021524620888589,
+ -1.0020541789058157
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Integrals
+
+An integral is a measure of the area under a curve.
+The `integrate` function computes an integral for a specific
+range of an interpolated curve.
+
+In the example below the `integrate` function computes an
+integral for the entire range of the curve, 0 through 20.
+
+[source,text]
+----
+let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=loess(x, y, bandwidth=.3),
+ integral=integrate(curve, 0, 20))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "integral": 90.17446104846645
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
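As a rough sanity check on the result above (illustrative Python, not the Solr implementation), the trapezoidal rule applied to the raw control points gives a value near the integral of the loess-smoothed curve:

```python
y = [0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0]

# Trapezoidal rule with unit spacing over x = 0..20.
approx = sum((y[i] + y[i + 1]) / 2 for i in range(len(y) - 1))
print(approx)  # 91.7 on the raw points, versus ~90.17 for the smoothed curve
```

The two values differ because loess smooths the control points before the curve is integrated.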
+
+In the next example an integral is computed for the range of 0 through 10.
+
+[source,text]
+----
+let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
+ y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
+ curve=loess(x, y, bandwidth=.3),
+ integral=integrate(curve, 0, 10))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "integral": 45.300912584519914
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+== Bicubic Spline
+
+The `bicubicSpline` function can be used to interpolate and predict values
+anywhere within a grid of data.
+
+A simple example will make this more clear.
+
+In the example below a bicubic spline is used to interpolate a matrix of real estate data.
+Each row of the matrix represents a specific *year*. Each column of the matrix
+represents a *floor* of the building. The grid of numbers is the average selling price of
+an apartment for each year and floor. For example in 2002 the average selling price for
+the 9th floor was 415000 (row 3, column 3).
+
+The `bicubicSpline` function is then used to
+interpolate the grid, and the `predict` function is used to predict a value for year 2003, floor 8.
+Notice that the matrix does not include a data point for year 2003, floor 8. The `bicubicSpline`
+function creates that data point based on the surrounding data in the matrix.
+
+[source,text]
+----
+let(years=array(1998, 2000, 2002, 2004, 2006),
+ floors=array(1, 5, 9, 13, 17, 19),
+ prices = matrix(array(300000, 320000, 330000, 350000, 360000, 370000),
+ array(320000, 330000, 340000, 350000, 365000, 380000),
+ array(400000, 410000, 415000, 425000, 430000, 440000),
+ array(410000, 420000, 425000, 435000, 445000, 450000),
+ array(420000, 430000, 435000, 445000, 450000, 470000)),
+ bspline=bicubicSpline(years, floors, prices),
+ prediction=predict(bspline, 2003, 8))
+----
+
+When this expression is sent to the /stream handler it
+responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "prediction": 418279.5009328358
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/1ed4e226/solr/solr-ref-guide/src/probability.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/probability.adoc b/solr/solr-ref-guide/src/probability.adoc
new file mode 100644
index 0000000..9c46d08
--- /dev/null
+++ b/solr/solr-ref-guide/src/probability.adoc
@@ -0,0 +1,415 @@
+= Probability Distributions
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This section of the user guide covers the
+*probability distribution
+framework* included in the math expressions library.
+
+== Probability Distributions
+
+The probability distribution framework includes
+many commonly used *real* and *discrete* probability
+distributions, including support for *empirical* and
+*enumerated* distributions that model real world data.
+
+The probability distribution framework also includes a set
+of functions that use the probability distributions
+to support probability calculations and sampling.
+
+=== Real Distributions
+
+The probability distribution framework has the following functions
+which support well known real probability distributions:
+
+* `normalDistribution`: Creates a normal distribution function.
+
+* `logNormalDistribution`: Creates a log normal distribution function.
+
+* `gammaDistribution`: Creates a gamma distribution function.
+
+* `betaDistribution`: Creates a beta distribution function.
+
+* `uniformDistribution`: Creates a uniform real distribution function.
+
+* `weibullDistribution`: Creates a Weibull distribution function.
+
+* `triangularDistribution`: Creates a triangular distribution function.
+
+* `constantDistribution`: Creates a constant real distribution function.
+
+=== Empirical Distribution
+
+The `empiricalDistribution` function creates a real probability
+distribution from actual data. An empirical distribution
+can be used interchangeably with any of the theoretical
+real distributions.
+
+=== Discrete Distributions
+
+The probability distribution framework has the following functions
+which support well known discrete probability distributions:
+
+* `poissonDistribution`: Creates a Poisson distribution function.
+
+* `binomialDistribution`: Creates a binomial distribution function.
+
+* `uniformIntegerDistribution`: Creates a uniform integer distribution function.
+
+* `geometricDistribution`: Creates a geometric distribution function.
+
+* `zipFDistribution`: Creates a Zipf distribution function.
+
+=== Enumerated Distributions
+
+The `enumeratedDistribution` function creates a discrete
+distribution function from a data set of discrete values,
+or from an enumerated list of values and probabilities.
+
+Enumerated distribution functions can be used interchangeably
+with any of the theoretical discrete distributions.
+
+=== Cumulative Probability
+
+The `cumulativeProbability` function can be used with all
+probability distributions to calculate the
+cumulative probability of encountering a specific
+random variable within a specific distribution.
+
+Below is an example of calculating the cumulative probability
+of a random variable within a normal distribution.
+
+In the example a normal distribution function is created
+with a mean of 10 and a standard deviation of 5. Then
+the cumulative probability of the value 12 is calculated for this
+specific distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+ b=cumulativeProbability(a, 12))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": 0.6554217416103242
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
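For intuition, the same cumulative probability can be computed outside of Solr with the closed-form normal CDF. The sketch below is plain Python, not a Solr API, and reproduces the value above using the error function:

```python
import math

def normal_cdf(x, mean, stddev):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))

# Same parameters as the expression above: mean 10, stddev 5, value 12.
print(normal_cdf(12, 10, 5))  # ~0.655422
```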
+
+Below is an example of a cumulative probability calculation
+using an empirical distribution.
+
+In the example an empirical distribution is created from a random
+sample taken from the *price_f* field.
+
+The cumulative probability of the value .75 is then calculated.
+The *price_f* field in this example was generated using a
+uniform real distribution between 0 and 1, so the output of the
+`cumulativeProbability` function is very close to .75.
+
+[source,text]
+----
+let(a=random(collection1, q="*:*", rows="30000", fl="price_f"),
+ b=col(a, price_f),
+ c=empiricalDistribution(b),
+ d=cumulativeProbability(c, .75))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": 0.7554217416103242
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
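An empirical cumulative probability is simply the fraction of observations at or below the value. A minimal Python sketch of the same idea, using a synthetic Uniform(0, 1) sample as a stand-in for the *price_f* field:

```python
import random

random.seed(42)  # synthetic stand-in for the price_f sample
sample = [random.random() for _ in range(30000)]

def empirical_cdf(data, x):
    """Fraction of observations <= x: the empirical cumulative probability."""
    return sum(1 for v in data if v <= x) / len(data)

print(empirical_cdf(sample, 0.75))  # close to 0.75
```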
+
+=== Probability
+
+The `probability` function can be used with any discrete
+distribution function to compute the probability of a
+discrete value.
+
+Below is an example which calculates the probability
+of a discrete value within a Poisson distribution.
+
+In the example a Poisson distribution function is created
+with a mean of 100. Then the
+probability of encountering the discrete value 101 is calculated for this
+specific distribution.
+
+[source,text]
+----
+let(a=poissonDistribution(100),
+ b=probability(a, 101))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": 0.039466333474403106
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
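The Poisson probability mass function has the closed form P(k) = e^-λ λ^k / k!. The Python sketch below (independent of Solr) reproduces the value above, evaluating in log space to avoid overflow for large means:

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution, evaluated in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

# Same parameters as the expression above: mean 100, value 101.
print(poisson_pmf(101, 100))  # ~0.039466
```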
+
+Below is an example of a probability calculation
+using an enumerated distribution.
+
+In the example an enumerated distribution is created from a random
+sample taken from the *day_i* field, which was created
+using a uniform integer distribution between 0 and 30.
+
+The probability of the discrete value 10 is then calculated.
+
+[source,text]
+----
+let(a=random(collection1, q="*:*", rows="30000", fl="day_i"),
+ b=col(a, day_i),
+ c=enumeratedDistribution(b),
+ d=probability(c, 10))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": 0.03356666666666666
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 488
+ }
+ ]
+ }
+}
+----
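An enumerated distribution assigns each discrete value its observed relative frequency. A minimal Python sketch of the same idea, using a synthetic uniform integer sample as a stand-in for the *day_i* field:

```python
import random
from collections import Counter

random.seed(7)  # synthetic stand-in for the day_i sample
sample = [random.randint(0, 30) for _ in range(30000)]

counts = Counter(sample)

def enumerated_probability(value):
    """Observed relative frequency of a discrete value."""
    return counts[value] / len(sample)

print(enumerated_probability(10))  # close to 1/31
```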
+
+=== Sampling
+
+All probability distributions support sampling. The `sample`
+function returns 1 or more random samples from a probability
+distribution.
+
+Below is an example drawing a single sample from
+a normal distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+ b=sample(a))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": 11.24578055004963
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+Below is an example drawing 10 samples from a normal
+distribution.
+
+[source,text]
+----
+let(a=normalDistribution(10, 5),
+ b=sample(a, 10))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "b": [
+ 10.18444709339441,
+ 9.466947971749377,
+ 1.2420697166234458,
+ 11.074501226984806,
+ 7.659629052136225,
+ 0.4440887839190708,
+ 13.710925254778786,
+ 2.089566359480239,
+ 0.7907293097654424,
+ 2.8184587681006734
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 3
+ }
+ ]
+ }
+}
+----
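Sampling a distribution means drawing random values that follow it. A quick Python analogue (not a Solr API) of drawing ten samples from a normal distribution with mean 10 and standard deviation 5:

```python
import random

random.seed(0)

# Ten random draws from a normal distribution with mean 10, stddev 5.
samples = [random.gauss(10, 5) for _ in range(10)]
print(samples)
```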
+
+=== Multivariate Normal Distribution
+
+The multivariate normal distribution is a generalization of the
+univariate normal distribution to higher dimensions.
+
+The multivariate normal distribution models two or more random
+variables that are normally distributed. The relationship between
+the variables is defined by a covariance matrix.
+
+==== Sampling
+
+The `sample` function can be used to draw samples
+from a multivariate normal distribution in much the same
+way as a univariate normal distribution.
+The difference is that each sample will be an array containing a sample
+drawn from each of the underlying normal distributions.
+If multiple samples are drawn, the `sample` function returns a matrix with a
+sample in each row. Over the long term the columns of the sample
+matrix will conform to the covariance matrix used to parametrize the
+multivariate normal distribution.
+
+The example below demonstrates how to initialize and draw samples
+from a multivariate normal distribution.
+
+In this example 5000 random samples are selected from a collection
+of log records. Each sample contains
+the fields *filesize_d* and *response_d*. The values of both fields conform
+to a normal distribution.
+
+Both fields are then vectorized. The *filesize_d* vector is stored in
+variable *b* and the *response_d* vector is stored in variable *c*.
+
+An array is created that contains the *means* of the two vectorized fields.
+
+Then both vectors are added to a matrix which is transposed. This creates
+an *observation* matrix where each row contains one observation of
+*filesize_d* and *response_d*. A covariance matrix is then created from the columns of
+the observation matrix with the
+`cov` function. The covariance matrix describes the covariance between
+*filesize_d* and *response_d*.
+
+The `multivariateNormalDistribution` function is then called with the
+array of means for the two fields and the covariance matrix. The model for the
+multivariate normal distribution is assigned to variable *g*.
+
+Finally five samples are drawn from the multivariate normal distribution. The samples
+are returned as a matrix, with each row representing one sample. There are two
+columns in the matrix. The first column contains samples for *filesize_d* and the second
+column contains samples for *response_d*. Over the long term the covariance between
+the columns will conform to the covariance matrix used to instantiate the
+multivariate normal distribution.
+
+[source,text]
+----
+let(a=random(collection2, q="*:*", rows="5000", fl="filesize_d, response_d"),
+ b=col(a, filesize_d),
+ c=col(a, response_d),
+ d=array(mean(b), mean(c)),
+ e=transpose(matrix(b, c)),
+ f=cov(e),
+ g=multiVariateNormalDistribution(d, f),
+ h=sample(g, 5))
+----
+
+When this expression is sent to the /stream handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "h": [
+ [
+ 41974.85669321393,
+ 779.4097049705296
+ ],
+ [
+ 42869.19876441414,
+ 834.2599296790783
+ ],
+ [
+ 38556.30444839889,
+ 720.3683470060988
+ ],
+ [
+ 37689.31290928216,
+ 686.5549428100018
+ ],
+ [
+ 40564.74398214547,
+ 769.9328090774
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 162
+ }
+ ]
+ }
+}
+----
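Under the hood, sampling a multivariate normal amounts to transforming independent standard normals by a Cholesky factor of the covariance matrix and shifting by the means. The Python sketch below illustrates this for two dimensions; the means and covariance values are hypothetical stand-ins, not the values computed above:

```python
import math
import random

random.seed(1)

# Hypothetical means and covariance for two correlated variables.
means = [40000.0, 750.0]
cov = [[4.0e6, 7.0e4],
       [7.0e4, 2.5e3]]

# Cholesky factor L of the 2x2 covariance matrix (cov = L * L^T).
l11 = math.sqrt(cov[0][0])
l21 = cov[1][0] / l11
l22 = math.sqrt(cov[1][1] - l21 * l21)

def mvn_sample():
    """One sample: two independent standard normals transformed by L."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return [means[0] + l11 * z1,
            means[1] + l21 * z1 + l22 * z2]

# Five samples, one per row, mirroring the matrix returned above.
samples = [mvn_sample() for _ in range(5)]
print(samples)
```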
+