Posted to dev@mahout.apache.org by Divya <di...@k2associates.com.sg> on 2010/10/26 08:10:49 UTC
generate document-document similarity matrix
Hi,
I am a new Mahout user, working with Mahout 0.4 in Eclipse.
I need to generate a document similarity matrix from the vector files that I
have already created using SparseVectorsFromSequenceFiles.
That step gave me the following directory structure:
-> df-count
-> tfidf-vectors
-> tf-vectors
-> tokenized-documents
-> wordcount
-> .dictionary.file-0.crc
-> .frequency.file-0.crc
-> dictionary.file-0
-> frequency.file-0
I am confused about which of these to use.
Which Mahout utility computes a document-document similarity matrix?
Can anyone help me?
Regards,
Divya
RE: generate document-document similarity matrix
Posted by Divya <di...@k2associates.com.sg>.
Right now I have only a few documents.
I just want to know what kind of similarity it generates,
as I have no idea on what basis the similarity is computed.
-----Original Message-----
From: Sebastian Schelter [mailto:ssc@apache.org]
Sent: Tuesday, October 26, 2010 2:37 PM
To: dev@mahout.apache.org
Subject: Re: generate document-document similarity matrix
Hi,
How many documents do you have, and what kind of similarity do you want to use?
--sebastian
Re: generate document-document similarity matrix
Posted by Sebastian Schelter <ss...@apache.org>.
Hi,
How many documents do you have, and what kind of similarity do you want to use?
--sebastian
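To make the "what kind of similarity" question concrete: one common choice for TF-IDF weighted document vectors (such as those a tfidf-vectors output directory would normally hold) is cosine similarity. The following is only a minimal sketch in plain Java, assuming each document has already been loaded into memory as a sparse map from term index to weight. It does not use any Mahout API, the class and method names are hypothetical, and it is only practical for a small number of documents, since it compares every pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout.
class CosineSimilaritySketch {

    /** Euclidean length of a sparse vector. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity of two sparse vectors keyed by term index. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller vector and look up matches in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // An empty or all-zero vector is treated as similar to nothing.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Symmetric document-document matrix; entry [i][j] compares doc i and doc j. */
    static double[][] similarityMatrix(List<Map<Integer, Double>> docs) {
        int n = docs.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(docs.get(i), docs.get(j));
                sim[i][j] = s;
                sim[j][i] = s; // cosine similarity is symmetric
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Three made-up documents with made-up term weights.
        List<Map<Integer, Double>> docs = new ArrayList<>();
        Map<Integer, Double> d0 = new HashMap<>();
        d0.put(0, 1.0);
        d0.put(1, 2.0);
        Map<Integer, Double> d1 = new HashMap<>();
        d1.put(0, 2.0);
        d1.put(1, 4.0);
        Map<Integer, Double> d2 = new HashMap<>();
        d2.put(2, 3.0);
        docs.add(d0);
        docs.add(d1);
        docs.add(d2);

        double[][] sim = similarityMatrix(docs);
        for (double[] row : sim) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Other measures (for example Euclidean distance or Jaccard overlap on term sets) would slot into the same pairwise loop; which one is appropriate depends on the answer to the question above.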