Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/01 13:30:52 UTC
[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Update
machine learning docs 3
This is an automated email from the ASF dual-hosted git repository.
jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
new 1d47b0e SOLR-13105: Update machine learning docs 3
1d47b0e is described below
commit 1d47b0ec9703af8abe7ffb9c77bcd03124843e1e
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Tue Oct 1 09:30:33 2019 -0400
SOLR-13105: Update machine learning docs 3
---
.../src/images/math-expressions/knnSearch.png | Bin 0 -> 256236 bytes
solr/solr-ref-guide/src/machine-learning.adoc | 23 ++++++++++++++++++++-
solr/solr-ref-guide/src/regression.adoc | 12 +++++------
3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png b/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png
new file mode 100644
index 0000000..761e180
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png differ
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 11fc388..bc6802e 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -322,7 +322,28 @@ When this expression is sent to the `/stream` handler it responds with:
The `knnSearch` function returns the k-nearest neighbors
for a document based on text similarity.
Under the covers the `knnSearch` function
-uses the More Like This query parser plugin.
+uses the More Like This query parser plugin. This capability uses the search
+engine's query, term statistics, scoring, and ranking capabilities to perform a fast
+nearest neighbor search for similar documents over large distributed indexes.
+
+The results of this
+search can be used directly or provide *candidate* records for machine learning operations such
+as a secondary knn vector search.
+
+The example below shows the `knnSearch` function run over a movie reviews data set. The
+search returns the 50 documents most similar to document id *83e9b5b0-...*
+based on the similarity of the *review_t* field which contains
+the text of the review. The *mindf* and *maxdf* parameters specify the minimum and maximum document frequency of the terms
+used to perform the search. This makes the query faster by eliminating very high frequency terms
+and also improves accuracy by removing noise from the search.
+
+
+image::images/math-expressions/knnSearch.png[]
+
+NOTE: In this example the `select`
+function is used to truncate the review in the output to 220 characters to make it easier
+to read in a table.
+
== K-Nearest Neighbor (KNN)
diff --git a/solr/solr-ref-guide/src/regression.adoc b/solr/solr-ref-guide/src/regression.adoc
index 5b73935..69af0ed 100644
--- a/solr/solr-ref-guide/src/regression.adoc
+++ b/solr/solr-ref-guide/src/regression.adoc
@@ -135,7 +135,7 @@ let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, response_d"),
When this expression is sent to the `/stream` handler it responds with:
-[source,json]
+[source,text]
----
{
"result-set": {
@@ -271,11 +271,11 @@ independent variables and the `response_d` values, stored in variable *`d`*, as
[source,text]
----
let(a=random(testapp, q="*:*", rows="30000", fl="filesize_d, load_d, response_d"),
- b=col(a, filesize_d),
- c=col(a, load_d),
- d=col(a, response_d),
- e=transpose(matrix(b, c)),
- f=olsRegress(e, d))
+ x=col(a, filesize_d),
+ y=col(a, load_d),
+ z=col(a, response_d),
+ m=transpose(matrix(x, y)),
+ r=olsRegress(m, z))
----
Notice in the response that the RSquared of the regression analysis is 1. This means that linear relationship between