Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/02 14:16:56 UTC
[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Update machine learning docs 11
This is an automated email from the ASF dual-hosted git repository.
jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
new 9060aee SOLR-13105: Update machine learning docs 11
9060aee is described below
commit 9060aee4d8e8cd4b14846dc9990d650e390fdb09
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Wed Oct 2 10:16:48 2019 -0400
SOLR-13105: Update machine learning docs 11
---
solr/solr-ref-guide/src/machine-learning.adoc | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index b107391..3f80a4e 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -780,22 +780,23 @@ allows vectors to be assigned to more than one cluster. The `fuzziness` paramete
is a value between 1 and 2 that determines how fuzzy to make the cluster assignment.
After the clustering has been performed, the `getMembershipMatrix` function can be called
-on the clustering result to return a matrix describing which clusters each vector belongs to.
+on the clustering result to return a matrix describing the probabilities
+of cluster membership for each vector.
This matrix can be used to understand relationships between clusters.
In the example below, `fuzzyKmeans` is used to cluster the movie reviews matching the phrase "star wars".
But instead of looking at the clusters or centroids, the `getMembershipMatrix` function is used to return the
membership probabilities for each document. The membership matrix consists of a row for each
vector that was clustered. There is a column in the matrix for each cluster.
-The values in the matrix are the probability that the vector belongs to a specific cluster.
+The values in the matrix contain the probability that a specific vector belongs to a specific cluster.
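To make the membership matrix concrete, here is a minimal plain-Python sketch. It assumes the centroids are already known and uses the standard fuzzy c-means membership formula, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). Here d is the Euclidean distance from vector i to centroid j, and m is the fuzziness. This is an illustration of the general technique, not Solr's implementation. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def membership_matrix(vectors, centroids, fuzziness):
    """Hypothetical helper: one row per vector, one column per cluster.

    Each entry is the fuzzy c-means membership of a vector in a cluster,
    and every row sums to 1. Fuzziness must be greater than 1.
    """
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")
    exponent = 2.0 / (fuzziness - 1.0)
    matrix = []
    for v in vectors:
        dists = [euclidean(v, c) for c in centroids]
        hits = [j for j, d in enumerate(dists) if d == 0.0]
        if hits:
            # A vector sitting exactly on a centroid belongs fully to it
            # (split evenly if it coincides with several centroids).
            matrix.append([1.0 / len(hits) if j in hits else 0.0
                           for j in range(len(centroids))])
            continue
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        matrix.append([1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                       for d_j in dists])
    return matrix
```

Because each row sums to 1, a vector that is equidistant from two centroids gets equal membership in those two clusters. A vector lying on a centroid gets membership 1 in that cluster.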
-In the example the `corr` function is used to create a *correlation matrix* of the columns of the
+In the example the `corr` function is used to create a *correlation matrix* from the columns of the
membership matrix. In other words, the correlation matrix shows the correlation of the clusters
based on the document co-occurrence in the clusters.
Notice that in the example, cluster3 and cluster5 are very highly correlated, which means that
many documents had a probability of occurring in both clusters. Further analysis of the key features
-in both clusters can done to understand the reason how these cluster are interconnected.
+in both clusters can be performed to understand how these clusters are interconnected.
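The column-correlation step can be sketched in plain Python as well. Each column of the membership matrix holds one cluster's membership probabilities across all documents. Correlating every pair of columns therefore yields a clusters-by-clusters matrix. The sketch assumes Pearson correlation, which is taken here to match the default behavior of `corr`; treat that as an assumption. The helper names and the small membership matrix are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return float("nan")
    return cov / (sd_x * sd_y)

def column_correlation(matrix):
    """Correlate every pair of columns (clusters) of a row-per-vector matrix."""
    columns = [list(col) for col in zip(*matrix)]
    return [[pearson(a, b) for b in columns] for a in columns]

# Hypothetical membership matrix: 3 documents (rows) x 3 clusters (columns).
membership = [
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
]
corr_matrix = column_correlation(membership)
```

Clusters 0 and 1 have identical membership in every document, so their correlation is 1. Cluster 2 rises exactly as they fall, so its correlation with them is -1. A high positive off-diagonal value is the pattern the text describes as two clusters sharing documents.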
[source,text]
----