Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/09 23:33:02 UTC
[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Improve ml docs 15
This is an automated email from the ASF dual-hosted git repository.
jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
new bac9f40 SOLR-13105: Improve ml docs 15
bac9f40 is described below
commit bac9f408149c2a1c8f2a242b16506ef17907bbf4
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Wed Oct 9 19:32:52 2019 -0400
SOLR-13105: Improve ml docs 15
---
.../src/images/math-expressions/distance.png | Bin 0 -> 372293 bytes
solr/solr-ref-guide/src/machine-learning.adoc | 416 ++++++++++-----------
2 files changed, 201 insertions(+), 215 deletions(-)
diff --git a/solr/solr-ref-guide/src/images/math-expressions/distance.png b/solr/solr-ref-guide/src/images/math-expressions/distance.png
new file mode 100644
index 0000000..373cb35
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/distance.png differ
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 7264d19..68ba161 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -20,188 +20,19 @@
This section of the math expressions user guide covers machine learning
functions.
-<<Feature Scaling, Feature Scaling>> -
<<Distance and Distance Measures, Distance>> -
-<<knnSearch, knnSearch>> -
<<K-Nearest Neighbor (KNN), KNN>> -
<<K-Nearest Neighbor Regression, KNN Regression>> -
<<K-Means Clustering, K-means Clustering>> -
-<<Fuzzy K-Means Clustering, Fuzzy K-means>>
+<<Fuzzy K-Means Clustering, Fuzzy K-means>> -
+<<Feature Scaling, Feature Scaling>>
-== Feature Scaling
-
-Before performing machine learning operations its often necessary to
-scale the feature vectors so they can be compared at the same scale.
-
-All the scaling functions below operate on vectors and matrices.
-When operating on a matrix the rows of the matrix are scaled.
-
-=== Min/Max Scaling
-
-The `minMaxScale` function scales a vector or matrix between a minimum and maximum value.
-By default it will scale between 0 and 1 if min/max values are not provided.
-
-Below is a plot of a sine wave, with an amplitude of 1, before and
-after it has been scaled between -5 and 5.
-
-image::images/math-expressions/minmaxscale.png[]
-
-
-Below is a simple example of min/max scaling of a matrix between 0 and 1.
-Notice that once brought into the same scale the vectors are the same.
-
-[source,text]
-----
-let(a=array(20, 30, 40, 50),
- b=array(200, 300, 400, 500),
- c=matrix(a, b),
- d=minMaxScale(c))
-----
-
-When this expression is sent to the `/stream` handler it responds with:
-
-[source,json]
-----
-{
- "result-set": {
- "docs": [
- {
- "d": [
- [
- 0,
- 0.3333333333333333,
- 0.6666666666666666,
- 1
- ],
- [
- 0,
- 0.3333333333333333,
- 0.6666666666666666,
- 1
- ]
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 0
- }
- ]
- }
-}
-----
-
-=== Standardization
-
-The `standardize` function scales a vector so that it has a
-mean of 0 and a standard deviation of 1.
-
-Below is a plot of a sine wave, with an amplitude of 1, before and
-after it has been standardized.
-
-image::images/math-expressions/standardize.png[]
-
-Below is a simple example of of a standardized matrix.
-Notice that once brought into the same scale the vectors are the same.
-
-[source,text]
-----
-let(a=array(20, 30, 40, 50),
- b=array(200, 300, 400, 500),
- c=matrix(a, b),
- d=standardize(c))
-----
-
-When this expression is sent to the `/stream` handler it responds with:
-
-[source,json]
-----
-{
- "result-set": {
- "docs": [
- {
- "d": [
- [
- -1.161895003862225,
- -0.3872983346207417,
- 0.3872983346207417,
- 1.161895003862225
- ],
- [
- -1.1618950038622249,
- -0.38729833462074165,
- 0.38729833462074165,
- 1.1618950038622249
- ]
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 17
- }
- ]
- }
-}
-----
-
-=== Unit Vectors
-
-The `unitize` function scales vectors to a magnitude of 1. A vector with a
-magnitude of 1 is known as a unit vector. Unit vectors are preferred
-when the vector math deals with vector direction rather than magnitude.
-
-Below is a plot of a sine wave, with an amplitude of 1, before and
-after it has been unitized.
-
-image::images/math-expressions/unitize.png[]
-
-Below is a simple example of a unitized matrix.
-Notice that once brought into the same scale the vectors are the same.
-
-[source,text]
-----
-let(a=array(20, 30, 40, 50),
- b=array(200, 300, 400, 500),
- c=matrix(a, b),
- d=unitize(c))
-----
-
-When this expression is sent to the `/stream` handler it responds with:
-
-[source,json]
-----
-{
- "result-set": {
- "docs": [
- {
- "d": [
- [
- 0.2721655269759087,
- 0.40824829046386296,
- 0.5443310539518174,
- 0.6804138174397716
- ],
- [
- 0.2721655269759087,
- 0.4082482904638631,
- 0.5443310539518174,
- 0.6804138174397717
- ]
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 6
- }
- ]
- }
-}
-----
== Distance and Distance Measures
The `distance` function computes the distance for two numeric arrays or a distance matrix for the columns of a matrix.
-There are five distance measure functions that return a function that performs the actual distance calculation:
+There are six distance measure functions that return a function that performs the actual distance calculation:
* `euclidean` (default)
* `manhattan`
@@ -270,52 +101,38 @@ When this expression is sent to the `/stream` handler it responds with:
----
-Below is an example for computing a distance matrix for columns
-of a matrix:
+Distance matrices are powerful tools for visualizing the distance
+between two or more
+vectors.
-[source,text]
-----
-let(a=array(20, 30, 40),
- b=array(21, 29, 41),
- c=array(31, 40, 50),
- d=matrix(a, b, c),
- c=distance(d))
-----
+The `distance` function builds a distance matrix
+if a matrix is passed as the parameter. The distance matrix is computed for the *columns*
+of the matrix.
+
+The example below demonstrates the power of distance matrices combined with 2 dimensional faceting.
+
+In this example the `facet2D` function is used to generate a two dimensional facet aggregation
+over the fields *complaint_type_s* and *zip_s* from the *nyc311* complaints database.
+The *top 20* complaint types and the *top 25* zip codes for each complaint type are aggregated.
+The result is a stream of tuples each containing the fields *complaint_type_s*, *zip_s* and
+the count for the pair.
+
+The `pivot` function is then used to pivot the fields into a *matrix* with the *zip_s*
+field as the *rows* and the *complaint_type_s* field as the *columns*. The `count(*)` field populates
+the values in the cells of the matrix.
+
+The `distance` function is then used to compute the distance matrix for the columns
+of the matrix using `cosine` distance. This produces a distance matrix
+that shows distance between complaint types based on the zip codes they appear in.
+
+Finally the `zplot` function is used to plot the distance matrix as a heat map. Notice that the
+heat map has been configured so that the intensity of color increases as the distance between vectors
+decreases.
+
+
+image::images/math-expressions/distance.png[]
-When this expression is sent to the `/stream` handler it responds with:
-[source,json]
-----
-{
- "result-set": {
- "docs": [
- {
- "e": [
- [
- 0,
- 15.652475842498529,
- 34.07345007480164
- ],
- [
- 15.652475842498529,
- 0,
- 18.547236990991408
- ],
- [
- 34.07345007480164,
- 18.547236990991408,
- 0
- ]
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 24
- }
- ]
- }
-}
-----
== knnSearch
@@ -855,3 +672,172 @@ Further analysis of the features in both clusters can be performed to understand
the relationship between cluster1 and cluster5.
image::images/math-expressions/fuzzyk.png[]
+
+== Feature Scaling
+
+Before performing machine learning operations it's often necessary to
+scale the feature vectors so they can be compared at the same scale.
+
+All the scaling functions below operate on vectors and matrices.
+When operating on a matrix the rows of the matrix are scaled.
+
+=== Min/Max Scaling
+
+The `minMaxScale` function scales a vector or matrix between a minimum and maximum value.
+By default it will scale between 0 and 1 if min/max values are not provided.
+
+Below is a plot of a sine wave, with an amplitude of 1, before and
+after it has been scaled between -5 and 5.
+
+image::images/math-expressions/minmaxscale.png[]
+
+
+Below is a simple example of min/max scaling of a matrix between 0 and 1.
+Notice that once brought into the same scale the vectors are the same.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=minMaxScale(c))
+----
+
+When this expression is sent to the `/stream` handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ 0,
+ 0.3333333333333333,
+ 0.6666666666666666,
+ 1
+ ],
+ [
+ 0,
+ 0.3333333333333333,
+ 0.6666666666666666,
+ 1
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 0
+ }
+ ]
+ }
+}
+----
+
+=== Standardization
+
+The `standardize` function scales a vector so that it has a
+mean of 0 and a standard deviation of 1.
+
+Below is a plot of a sine wave, with an amplitude of 1, before and
+after it has been standardized.
+
+image::images/math-expressions/standardize.png[]
+
+Below is a simple example of a standardized matrix.
+Notice that once brought into the same scale the vectors are the same.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=standardize(c))
+----
+
+When this expression is sent to the `/stream` handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ -1.161895003862225,
+ -0.3872983346207417,
+ 0.3872983346207417,
+ 1.161895003862225
+ ],
+ [
+ -1.1618950038622249,
+ -0.38729833462074165,
+ 0.38729833462074165,
+ 1.1618950038622249
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 17
+ }
+ ]
+ }
+}
+----
+
+=== Unit Vectors
+
+The `unitize` function scales vectors to a magnitude of 1. A vector with a
+magnitude of 1 is known as a unit vector. Unit vectors are preferred
+when the vector math deals with vector direction rather than magnitude.
+
+Below is a plot of a sine wave, with an amplitude of 1, before and
+after it has been unitized.
+
+image::images/math-expressions/unitize.png[]
+
+Below is a simple example of a unitized matrix.
+Notice that once brought into the same scale the vectors are the same.
+
+[source,text]
+----
+let(a=array(20, 30, 40, 50),
+ b=array(200, 300, 400, 500),
+ c=matrix(a, b),
+ d=unitize(c))
+----
+
+When this expression is sent to the `/stream` handler it responds with:
+
+[source,json]
+----
+{
+ "result-set": {
+ "docs": [
+ {
+ "d": [
+ [
+ 0.2721655269759087,
+ 0.40824829046386296,
+ 0.5443310539518174,
+ 0.6804138174397716
+ ],
+ [
+ 0.2721655269759087,
+ 0.4082482904638631,
+ 0.5443310539518174,
+ 0.6804138174397717
+ ]
+ ]
+ },
+ {
+ "EOF": true,
+ "RESPONSE_TIME": 6
+ }
+ ]
+ }
+}
+----
\ No newline at end of file
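
The three scaling functions described in the relocated Feature Scaling section can be sketched outside Solr to check the arithmetic in the sample responses. The Python below is only an illustration, not Solr's implementation: the function names `min_max_scale`, `standardize`, `unitize` and `scale_rows` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption inferred from the values shown in the `standardize` response.

```python
import math

def min_max_scale(vec, lo=0.0, hi=1.0):
    """Linearly rescale vec so its minimum maps to lo and its maximum to hi."""
    v_min, v_max = min(vec), max(vec)
    span = v_max - v_min  # a constant vector (span == 0) is not handled in this sketch
    return [lo + (x - v_min) * (hi - lo) / span for x in vec]

def standardize(vec):
    """Shift to mean 0 and scale to standard deviation 1.

    Assumption: sample standard deviation (n - 1), which matches the
    values in the documented response.
    """
    n = len(vec)
    mean = sum(vec) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in vec) / (n - 1))
    return [(x - mean) / std for x in vec]

def unitize(vec):
    """Divide by the Euclidean norm so the result has magnitude 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def scale_rows(matrix, fn):
    """Apply a vector scaling function to each row, as the docs describe for matrices."""
    return [fn(row) for row in matrix]

a = [20, 30, 40, 50]
b = [200, 300, 400, 500]
for fn in (min_max_scale, standardize, unitize):
    print(fn.__name__, scale_rows([a, b], fn))
```

Because `b` is `a` multiplied by 10, each function maps both rows to the same vector up to floating-point rounding, which is the behavior the examples in the diff point out.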