Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2008/09/17 09:47:14 UTC
[Hama Wiki] Trivial Update of "WordCountMatrix" by udanax
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/hama/WordCountMatrix
------------------------------------------------------------------------------
== Abstract ==
- Basically, It'll shows how to construct the matrix from the files.
+ The word count matrix (document-word) approach is often used for latent semantic indexing and document clustering. Two caveats apply: a word that appears frequently in every document is not useful for clustering, and because documents differ in length, a lengthy document will have higher word counts. This example gives a parallel implementation of the matrix creation (in the future, it may also cover the sparse matrix decomposition technique).
- This word count matrix (document-word) approach is often referred to as latent semantic indexing and document clustering (Of course, A word frequently present in all documents will not be useful for clustering -- The length of all documents is not uniform so a lengthy document will have higher word counts).
-
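
For readers unfamiliar with the document-word matrix described in the abstract, the following is a minimal sequential sketch in Python. It is not the Hama implementation and makes no claim about Hama's API: the function names `word_count_matrix` and `normalize_rows`, the whitespace tokenization, and the sample documents are all assumptions made for illustration. Each row is one document and each column is one word; the second function divides each row by its total, which is one simple way to offset the document-length effect mentioned above.

```python
from collections import Counter


def word_count_matrix(docs):
    """Build a document-word count matrix from a list of text strings.

    Hypothetical helper: tokenizes on whitespace and lowercases words.
    Returns (vocab, matrix) where matrix[i][j] is the count of vocab[j]
    in docs[i].
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    # The vocabulary is the sorted union of all words seen in any document.
    vocab = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix


def normalize_rows(matrix):
    """Divide each row by its sum so long documents do not dominate."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


# Illustrative input only; real input would be read from files.
docs = ["hama matrix hama", "matrix word count matrix example text"]
vocab, matrix = word_count_matrix(docs)
freqs = normalize_rows(matrix)
```

In a parallel setting, each document's row can be computed independently once the vocabulary is fixed, which is what makes the construction a natural fit for a map-style job.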