Posted to user@mahout.apache.org by Juan José Ramos <jj...@gmail.com> on 2014/02/25 15:22:59 UTC
Wiki - 'Quick tour of text analysis using the Mahout command line' clarification
In the wiki page 'Quick tour of text analysis using the Mahout command line':
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
At the very bottom, it says:
1. This will generate the 10 most similar docs to each doc in the collection.
1. Examine the similarity list:
mahout seqdumper -i reuters-matrix/matrix | more
Instead of reuters-matrix/matrix, shouldn't it be reuters-similarity/part-r-00000, since that is the output file of rowsimilarity? Or, on the contrary, does the rowsimilarity tool also write to reuters-matrix/?
I would expect it to contain the 10 most similar documents for every document in the Reuters catalogue. Is that correct?
Many thanks.
Juanjo.
Re: Wiki - 'Quick tour of text analysis using the Mahout command line' clarification
Posted by Juan José Ramos <jj...@gmail.com>.
Cool. Thanks for the clarification.
On Tue, Feb 25, 2014 at 3:18 PM, Suneel Marthi <su...@yahoo.com> wrote:
> That's a mistake on the wiki that needs to be corrected. You're right, it should
> be the similarity output.
>
> Each row would have the 10 most similar docs for every doc.
Re: Wiki - 'Quick tour of text analysis using the Mahout command line' clarification
Posted by Suneel Marthi <su...@yahoo.com>.
That's a mistake on the wiki that needs to be corrected. You're right, it should be the similarity output.
Each row would have the 10 most similar docs for every doc.
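To illustrate what "each row would have the 10 most similar docs for every doc" means, here is a minimal in-memory Python sketch. It is not Mahout's implementation. It assumes cosine similarity (the measure used in the wiki is not quoted in this thread), dense term vectors, and that a document is excluded from its own list; `cosine` and `top_k_similar_rows` are hypothetical helper names. Mahout's rowsimilarity tool builds this kind of per-row list as a distributed job and writes it to a sequence file, which is why the output is inspected with `mahout seqdumper`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar_rows(matrix, k=10):
    """For each row index i, return up to k (j, similarity) pairs for the
    rows j != i that are most similar to row i, highest similarity first.
    Ties are broken by the lower row index so the output is deterministic."""
    result = {}
    for i, row in enumerate(matrix):
        # Compare this document against every other document (brute force).
        scores = [(j, cosine(row, other)) for j, other in enumerate(matrix) if j != i]
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        result[i] = scores[:k]
    return result


if __name__ == "__main__":
    # Hypothetical toy "documents" as term-weight vectors.
    docs = [
        [1.0, 0.0, 1.0],
        [1.0, 0.0, 0.9],
        [0.0, 1.0, 0.0],
        [0.0, 0.9, 0.1],
    ]
    for doc, neighbours in top_k_similar_rows(docs, k=2).items():
        print(doc, [(j, round(s, 3)) for j, s in neighbours])
```

With `k=10` this corresponds to the "10 most similar docs to each doc" in the tour. Comparing every pair is quadratic in the number of documents, so this sketch only suits small illustrations.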