Posted to user@mahout.apache.org by Chui-Hui Chiu <cc...@tigers.lsu.edu> on 2012/11/21 18:17:01 UTC

Reading the vector files

Hello, all,

I ran the K-Means clustering sample and got the output files.  How do I
convert the Mahout output vector files to a human-readable format?  Is
there any documentation about that?


Thanks,
Chiu

Re: Mahout svd command question

Posted by kuba <pa...@interia.pl>.
Thanks for info!
I also found documentation for ssvd:

https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.html

That would definitely solve my problem completely.
Big thanks again!


On 22.11.2012 at 22:00, Ted Dunning wrote:
> That implementation is deprecated.  The SSVD implementation should be
> used instead.


Re: Mahout svd command question

Posted by Ted Dunning <te...@gmail.com>.
That implementation is deprecated.  The SSVD implementation should be
used instead.

On Thu, Nov 22, 2012 at 9:58 AM, Abramov Pavel <p....@rambler-co.ru> wrote:

> Hi,
>
> Here is a step-by-step manual for the Lanczos implementation:
>
> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>
> Pavel

Re: Mahout svd command question

Posted by Abramov Pavel <p....@rambler-co.ru>.
Hi, 

Here is a step-by-step manual for the Lanczos implementation:

https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

Pavel
________________________________________
From: kuba [pawloch@interia.pl]
Sent: 22 November 2012, 21:34
To: user@mahout.apache.org
Subject: Mahout svd command question

Hi,

I'm new to Hadoop, Mahout, and language processing.
I'm trying to do LSA (Latent Semantic Analysis) in Mahout.
I've built my own version of the tf-idf matrix (I know that seqdirectory
and seq2sparse can do it for me, but I needed some modifications).
I've run 'mahout svd' and got output, but I don't know how to
interpret it.

According to the books I've read, SVD should return three matrices:
M = U * Sigma * (Vt),

but 'mahout svd' returns only one. I can't find any documentation. Which
one does it return? Is it U?

Do I have to transpose my tf-idf matrix and compute the SVD again to get
the second matrix (V)?

I've also found people using:
mahout cleansvd
What is it for? Is there any good documentation?


Mahout svd command question

Posted by kuba <pa...@interia.pl>.
Hi,

I'm new to Hadoop, Mahout, and language processing.
I'm trying to do LSA (Latent Semantic Analysis) in Mahout.
I've built my own version of the tf-idf matrix (I know that seqdirectory
and seq2sparse can do it for me, but I needed some modifications).
I've run 'mahout svd' and got output, but I don't know how to
interpret it.

According to the books I've read, SVD should return three matrices:
M = U * Sigma * (Vt),

but 'mahout svd' returns only one. I can't find any documentation. Which
one does it return? Is it U?

Do I have to transpose my tf-idf matrix and compute the SVD again to get
the second matrix (V)?

I've also found people using:
mahout cleansvd
What is it for? Is there any good documentation?
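
On the question of whether a second decomposition of the transposed
matrix is needed: in general it is not, because once V and the singular
values are known, U follows from U = M * V * Sigma^-1. The sketch below
illustrates this in plain Python for the largest singular value only. It
is not Mahout code and says nothing about which factor 'mahout svd'
writes out; it assumes a small dense matrix, uses power iteration on
Mt * M as a stand-in for a real solver, and all function names are
made up for the illustration.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]

def norm(x):
    """Euclidean length of vector x."""
    return math.sqrt(sum(v * v for v in x))

def top_singular_triplet(M, iterations=200):
    """Return (u, sigma, v) for the largest singular value of M.

    v is found by power iteration on Mt * M; u is then derived from v,
    with no second decomposition of the transposed matrix.
    """
    Mt = transpose(M)
    # Arbitrary non-zero start vector (an assumption of this sketch).
    v = [1.0] + [0.0] * (len(M[0]) - 1)
    for _ in range(iterations):
        w = matvec(Mt, matvec(M, v))   # one power-iteration step on Mt * M
        n = norm(w)
        v = [x / n for x in w]
    Mv = matvec(M, v)
    sigma = norm(Mv)                   # singular value: |M v|
    u = [x / sigma for x in Mv]        # left vector: u = M v / sigma
    return u, sigma, v

# Tiny stand-in for a tf-idf matrix (rows: documents, columns: terms).
M = [[3.0, 0.0],
     [4.0, 5.0]]
u, sigma, v = top_singular_triplet(M)
print("sigma =", sigma)
print("v =", v)
print("u =", u)
```

The same relation applies column by column to the remaining singular
vectors, so a tool that returns only one of U or V still gives enough
to rebuild the other, provided the singular values are available.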


Re: Reading the vector files

Posted by DAN HELM <da...@verizon.net>.
See: http://amgadmadkour.blogspot.com/2012/07/kmeans-clustering-using-apache-mahout.html

 

________________________________
From: Chui-Hui Chiu <cc...@tigers.lsu.edu>
To: user@mahout.apache.org
Sent: Wednesday, November 21, 2012 12:17 PM
Subject: Reading the vector files
  
Hello, all,

I ran the K-Means clustering sample and got the output files.  How do I
convert the Mahout output vector files to a human-readable format?  Is
there any documentation about that?


Thanks,
Chiu