Posted to user@mahout.apache.org by Hossein Kazemi <ho...@gridline.nl> on 2012/04/11 12:15:03 UTC
Kmeans cluster mapping to actual document IDs
Hi,
I have clustered a set of documents using Mahout's k-means
(map-reduce). I used sparse vectors because of the large size of my corpus.
In the book it says that the folder named clusteredPoints contains the
mapping between the clustered documents and the document IDs. However,
all I can see is a "1:0", a feature vector, and a cluster ID. Where
can I find the actual document names/IDs?
Thanks
Re: Kmeans cluster mapping to actual document IDs
Posted by Baoqiang Cao <bq...@gmail.com>.
My very limited experience is that
in the seq2sparse step you need to use the "-nv" option (named vectors),
so that the clusterdump output shows the document ID for each point.
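A rough sketch of what this could look like on the command line. All paths are placeholders and not taken from this thread. The flag for the clusters directory in clusterdump depends on the Mahout version (-s in older releases, -i in later ones), so treat it as an assumption and check "mahout clusterdump --help" first.

```shell
# Hypothetical paths; substitute your own.

# 1. Vectorize again with -nv so each vector is a named vector
#    carrying its document ID.
mahout seq2sparse -i docs-seqfiles -o docs-vectors -nv

# 2. Re-run k-means on the new vectors (not shown), then dump the
#    clusters together with the points assigned to them.
#    clusters-N stands for the final iteration's directory.
mahout clusterdump \
  -s kmeans-output/clusters-N \
  -p kmeans-output/clusteredPoints \
  -d docs-vectors/dictionary.file-0 \
  -dt sequencefile \
  -o clusters.txt
```

Vectors created without -nv carry no name, so the existing clusteredPoints output cannot be mapped back to documents. The vectorization and clustering steps have to be repeated.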
Best,
Baoqiang