Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/01 13:30:52 UTC

[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Update machine learning docs 3

This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new 1d47b0e  SOLR-13105: Update machine learning docs 3
1d47b0e is described below

commit 1d47b0ec9703af8abe7ffb9c77bcd03124843e1e
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Tue Oct 1 09:30:33 2019 -0400

    SOLR-13105: Update machine learning docs 3
---
 .../src/images/math-expressions/knnSearch.png      | Bin 0 -> 256236 bytes
 solr/solr-ref-guide/src/machine-learning.adoc      |  23 ++++++++++++++++++++-
 solr/solr-ref-guide/src/regression.adoc            |  12 +++++------
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png b/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png
new file mode 100644
index 0000000..761e180
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/knnSearch.png differ
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 11fc388..bc6802e 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -322,7 +322,28 @@ When this expression is sent to the `/stream` handler it responds with:
 The `knnSearch` function returns the k-nearest neighbors
 for a document based on text similarity.
 Under the covers the `knnSearch` function
-uses the More Like This query parser plugin.
+uses the More Like This query parser plugin. This capability uses the search
+engine's query, term statistics, scoring and ranking capabilities to perform a fast,
+nearest neighbor search for similar documents over large distributed indexes.
+
+The results of this
+search can be used directly or can provide *candidate* records for machine learning operations such
+as a secondary knn vector search.
+
+The example below shows the `knnSearch` function run over a movie reviews data set. The
+search returns the 50 documents most similar to document id *83e9b5b0-...*
+based on the similarity of the *review_t* field which contains
+the text of the review. The *mindf* and *maxdf* parameters specify the min and max document frequency of the terms
+used to perform the search. This makes the query faster by eliminating very high frequency terms
+and also improves accuracy by removing noise from the search.
+
+
+image::images/math-expressions/knnSearch.png[]
+
+NOTE: In this example the `select`
+function is used to truncate the review in the output to 220 characters to make it easier
+to read in a table.
+
 
 == K-Nearest Neighbor (KNN)
 
diff --git a/solr/solr-ref-guide/src/regression.adoc b/solr/solr-ref-guide/src/regression.adoc
index 5b73935..69af0ed 100644
--- a/solr/solr-ref-guide/src/regression.adoc
+++ b/solr/solr-ref-guide/src/regression.adoc
@@ -135,7 +135,7 @@ let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, response_d"),
 
 When this expression is sent to the `/stream` handler it responds with:
 
-[source,json]
+[source,text]
 ----
 {
   "result-set": {
@@ -271,11 +271,11 @@ independent variables and the `response_d` values, stored in variable *`d`*, as
 [source,text]
 ----
 let(a=random(testapp, q="*:*", rows="30000", fl="filesize_d, load_d, response_d"),
-    b=col(a, filesize_d),
-    c=col(a, load_d),
-    d=col(a, response_d),
-    e=transpose(matrix(b, c)),
-    f=olsRegress(e, d))
+    x=col(a, filesize_d),
+    y=col(a, load_d),
+    z=col(a, response_d),
+    m=transpose(matrix(x, y)),
+    r=olsRegress(m, z))
 ----
 
 Notice in the response that the RSquared of the regression analysis is 1. This means that linear relationship between