Posted to commits@lucene.apache.org by jb...@apache.org on 2019/10/09 20:43:02 UTC
[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Improve ml docs 10
This is an automated email from the ASF dual-hosted git repository.
jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
new fcf457f SOLR-13105: Improve ml docs 10
fcf457f is described below
commit fcf457fef184bc53645afb5545dd89c59d030c1c
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Wed Oct 9 16:42:49 2019 -0400
SOLR-13105: Improve ml docs 10
---
.../src/images/math-expressions/fuzzyk.png | Bin 0 -> 219772 bytes
solr/solr-ref-guide/src/machine-learning.adoc | 78 ++-------------------
2 files changed, 7 insertions(+), 71 deletions(-)
diff --git a/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png b/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png
new file mode 100644
index 0000000..34bd944
Binary files /dev/null and b/solr/solr-ref-guide/src/images/math-expressions/fuzzyk.png differ
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 037217b..1032b76 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -846,76 +846,12 @@ membership probabilities for each document. The membership matrix is comprised o
vector that was clustered. There is a column in the matrix for each cluster.
The values in the matrix contain the probability that a specific vector belongs to a specific cluster.
-In the example the `corr` function is used to create a *correlation matrix* from the columns of the
-membership matrix. In other words the correlation matrix shows the correlation of the clusters
-based on the document co-occurrence in the clusters.
+In the example the `distance` function is then used to create a *distance matrix* from the columns of the
+membership matrix. The distance matrix is then visualized with the `zplot` function as a heat map. Notice
+that the heat map has been configured to increase in color intensity as the distance shortens.
-Notice that in the example cluster3 and cluster5 are very highly correlated, which means that
-many documents had a probability of occurring in both clusters. Further analysis of the key features
-in both clusters can be performed to understand how these clusters are interconnected.
-
-[source,text]
-----
-let(a=select(search(reviews, q="text_t:\"star wars\"", rows="500"),
- id,
- analyze(text_t, body) as terms),
- vectors=termVectors(a, maxDocFreq=.10, minDocFreq=.03, minTermLength=13, exclude="_,br,have"),
- clusters=fuzzyKmeans(vectors, 5, fuzziness=1.3),
- m=getMembershipMatrix(clusters),
- corr=corr(m))
-----
-
-
-[source,json]
-----
-{
- "result-set": {
- "docs": [
- {
- "corr": [
- [
- 1,
- -0.3107483649904961,
- -0.01238925922725737,
- -0.034546141301127015,
- -0.012389261961639414
- ],
- [
- -0.3107483649904961,
- 1,
- -0.7752380698457411,
- -0.49268725855405776,
- -0.7752380691584819
- ],
- [
- -0.01238925922725737,
- -0.7752380698457411,
- 1,
- -0.0508166330303757,
- 0.9999999999999954
- ],
- [
- -0.034546141301127015,
- -0.49268725855405776,
- -0.0508166330303757,
- 1,
- -0.05081663258795273
- ],
- [
- -0.012389261961639414,
- -0.7752380691584819,
- 0.9999999999999954,
- -0.05081663258795273,
- 1
- ]
- ]
- },
- {
- "EOF": true,
- "RESPONSE_TIME": 245
- }
- ]
- }
-}
-----
+In the example, cluster1 and cluster5 are the closest pair of clusters.
+Further analysis of the features in both clusters can be performed to understand
+the relationship between cluster1 and cluster5.
+image::images/math-expressions/fuzzyk.png[]
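The `distance` step described in the doc change builds a matrix of distances between the columns of the membership matrix, where each column holds one cluster's membership probabilities across all documents. A plain-Python sketch of that idea may help readers outside Solr. This is not Solr's implementation. The sketch assumes Euclidean distance between columns, and the helper names `column_distance_matrix` and `closest_pair`, as well as the toy membership values, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def column_distance_matrix(membership):
    """Return a symmetric matrix of Euclidean distances between columns.

    Assumption for this sketch: `membership` is a list of rows, one per
    clustered document, and each row holds that document's membership
    probability for every cluster (one column per cluster).
    """
    n_clusters = len(membership[0])
    # Transpose so each entry is one cluster's column of probabilities.
    columns = [[row[j] for row in membership] for j in range(n_clusters)]
    dist = [[0.0] * n_clusters for _ in range(n_clusters)]
    for i, j in combinations(range(n_clusters), 2):
        d = math.dist(columns[i], columns[j])  # Euclidean distance (Python 3.8+)
        dist[i][j] = dist[j][i] = d
    return dist


def closest_pair(dist):
    """Return the (i, j) pair of distinct clusters with the smallest distance."""
    pairs = combinations(range(len(dist)), 2)
    return min(pairs, key=lambda p: dist[p[0]][p[1]])


# Hypothetical toy membership matrix: 4 documents, 3 clusters.
membership = [
    [0.70, 0.10, 0.20],
    [0.65, 0.05, 0.30],
    [0.10, 0.80, 0.10],
    [0.60, 0.15, 0.25],
]
dist = column_distance_matrix(membership)
print(closest_pair(dist))  # (0, 2): 0-based column indexes
```

Columns whose probabilities rise and fall together across the same documents end up close to each other, which is why a short distance suggests two clusters share many documents. A heat map like the one in the doc simply colors each cell of this matrix.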