Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/09 20:43:02 UTC

[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Improve ml docs 10

This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new fcf457f  SOLR-13105: Improve ml docs 10
fcf457f is described below

commit fcf457fef184bc53645afb5545dd89c59d030c1c
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Wed Oct 9 16:42:49 2019 -0400

    SOLR-13105: Improve ml docs 10
---
 .../src/images/math-expressions/fuzzyk.png         | Bin 0 -> 219772 bytes
 solr/solr-ref-guide/src/machine-learning.adoc      |  78 ++-------------------
 2 files changed, 7 insertions(+), 71 deletions(-)

diff --git a/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png b/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png
new file mode 100644
index 0000000..34bd944
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png differ
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 037217b..1032b76 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -846,76 +846,12 @@ membership probabilities for each document. The membership matrix is comprised o
 vector that was clustered. There is a column in the matrix for each cluster.
 The values in the matrix contain the probability that a specific vector belongs to a specific cluster.
 
-In the example the `corr` function is used to create a *correlation matrix* from the columns of the
-membership matrix. In other words the correlation matrix shows the correlation of the clusters
-based on the document co-occurrence in the clusters.
+In the example the `distance` function is then used to create a *distance matrix* from the columns of the
+membership matrix. The distance matrix is then visualized with the `zplot` function as a heat map. Notice
+that the heat map has been configured to increase in color intensity as the distance shortens.
 
-Notice that in the example cluster3 and cluster5 are very highly correlated, which means that
-many documents had a probability of occurring in both clusters. Further analysis of the key features
-in both clusters can be performed to understand how these clusters are interconnected.
-
-[source,text]
-----
-let(a=select(search(reviews, q="text_t:\"star wars\"", rows="500"),
-                    id,
-                    analyze(text_t, body) as terms),
-    vectors=termVectors(a, maxDocFreq=.10, minDocFreq=.03, minTermLength=13, exclude="_,br,have"),
-    clusters=fuzzyKmeans(vectors, 5, fuzziness=1.3),
-    m=getMembershipMatrix(clusters),
-    corr=corr(m))
-----
-
-
-[source,json]
-----
-{
-  "result-set": {
-    "docs": [
-      {
-        "corr": [
-          [
-            1,
-            -0.3107483649904961,
-            -0.01238925922725737,
-            -0.034546141301127015,
-            -0.012389261961639414
-          ],
-          [
-            -0.3107483649904961,
-            1,
-            -0.7752380698457411,
-            -0.49268725855405776,
-            -0.7752380691584819
-          ],
-          [
-            -0.01238925922725737,
-            -0.7752380698457411,
-            1,
-            -0.0508166330303757,
-            0.9999999999999954
-          ],
-          [
-            -0.034546141301127015,
-            -0.49268725855405776,
-            -0.0508166330303757,
-            1,
-            -0.05081663258795273
-          ],
-          [
-            -0.012389261961639414,
-            -0.7752380691584819,
-            0.9999999999999954,
-            -0.05081663258795273,
-            1
-          ]
-        ]
-      },
-      {
-        "EOF": true,
-        "RESPONSE_TIME": 245
-      }
-    ]
-  }
-}
-----
+In the example cluster1 and cluster5 have the shortest distance between the clusters.
+Further analysis of the features in both clusters can be performed to understand
+the relationship between cluster1 and cluster5.
 
+image::images/math-expressions/fuzzyk.png[]