Posted to user@mahout.apache.org by kuba <pa...@interia.pl> on 2012/11/22 18:34:13 UTC

Mahout svd command question

Hi,

I'm new to Hadoop, Mahout, and language processing.
I'm trying to do LSA (Latent Semantic Analysis) in Mahout.
I've written my own tf-idf matrix builder (I know seqdirectory and
seq2sparse can do it for me, but I needed some modifications).
I've run 'mahout svd' and got output, but I don't know how to
interpret it.

According to the books I've read, SVD should return three matrices:
M = U * Sigma * (Vt),

but 'mahout svd' returns only one. I can't find any documentation. Which
one does it return? Is it U?

Do I have to transpose my tf-idf matrix and compute the SVD again to get
the second matrix (V)?

I've also found people using:
mahout cleansvd
What is it for? Is there any good documentation?
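
On the question of whether a second decomposition is needed: in general it is not. Once one set of singular vectors and the singular values are known, the other set follows from M itself, since U = M V Sigma^-1 and V = Mt U Sigma^-1. The sketch below uses NumPy rather than Mahout; the small matrix and the helper name other_singular_vectors are made up for illustration, and it assumes all singular values are nonzero. It does not say which factor 'mahout svd' writes out.

```python
import numpy as np

def other_singular_vectors(M, vectors, s, side):
    """Recover the missing factor of M = U @ diag(s) @ V.T.

    side="right": `vectors` is V (columns are right singular vectors); returns U.
    side="left":  `vectors` is U (columns are left singular vectors); returns V.
    Assumes every singular value in `s` is nonzero.
    """
    if side == "right":
        return (M @ vectors) / s      # U = M V Sigma^-1
    if side == "left":
        return (M.T @ vectors) / s    # V = M^T U Sigma^-1
    raise ValueError("side must be 'left' or 'right'")

# Hypothetical 4-term x 3-document tf-idf matrix, for illustration only.
M = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

# A single thin SVD gives all three factors: M = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
V = Vt.T

# Pretend only one side was returned and rebuild the other from M.
U_rebuilt = other_singular_vectors(M, V, s, side="right")
V_rebuilt = other_singular_vectors(M, U, s, side="left")
```

So if a tool returns only V (or only U) together with the singular values, one matrix multiplication recovers the other factor; no transposed re-run is required.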


Re: Mahout svd command question

Posted by kuba <pa...@interia.pl>.
Thanks for info!
I also found documentation for ssvd:

https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.html

That would definitely solve my problem completely.
Big Thanks again!




Re: Mahout svd command question

Posted by Ted Dunning <te...@gmail.com>.
That (Lanczos) implementation is deprecated. The SSVD implementation
should be used instead.
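
For readers unfamiliar with the idea behind stochastic SVD, here is a minimal NumPy sketch of a randomized SVD: project the matrix onto a small random subspace, orthonormalize, and take an exact SVD of the much smaller projected matrix. This is not Mahout's code or its command-line interface; the function name randomized_svd and the parameters k (rank), p (oversampling), q (power iterations), and seed are assumptions chosen for this illustration.

```python
import numpy as np

def randomized_svd(A, k, p=10, q=1, seed=0):
    """Approximate rank-k SVD of A via random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + p, m, n)                 # projected dimension, with oversampling

    # Sample the range of A with a Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))

    # Optional power iterations sharpen the subspace when the spectrum decays slowly.
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Exact SVD of the small (ell x n) matrix, then map back to the original space.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical test matrix of exact rank 5 (60 rows, 40 columns).
rng = np.random.default_rng(42)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))

U, s, Vt = randomized_svd(A, k=5)
A_approx = U @ np.diag(s) @ Vt
```

Unlike an approach that yields only one set of vectors, this style of decomposition produces U, the singular values, and V together in a single pass over the small projected matrix.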


Re: Mahout svd command question

Posted by Abramov Pavel <p....@rambler-co.ru>.
Hi, 

Here is a step-by-step manual for the Lanczos implementation:

https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

Pavel