Posted to user@mahout.apache.org by AJ Rader <an...@gmail.com> on 2014/02/27 18:30:25 UTC

Re: parallelALS and RMSE TEST

Sean Owen <srowen <at> gmail.com> writes:

> 
> Parallel ALS is exactly an example of where you can use matrix
> factorization for "0/1" data.
> 
> On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:
> > Hi Sean,
> > Aren't boolean preferences supported in the context of memory-based
> > recommendation algorithms in Mahout?
> > Are there matrix factorization algorithms in Mahout which can work
> > with this kind of data (that is, the kind of data which consists of
> > users and the movies they have seen)?
> >
> >
> >
> >
> > On Mon, May 6, 2013 at 10:34 PM, Sean Owen <srowen <at> gmail.com> wrote:
> >> Yes, it goes by the name 'boolean prefs' in the project since target
> >> variables don't have values -- they just exist or don't.
> >> So, yes it's certainly supported but the question here is how to
> >> evaluate the output.
> >>
> >> On Mon, May 6, 2013 at 8:29 PM, Tevfik Aytekin <tevfik.aytekin <at> gmail.com> wrote:
> >>> This problem is called the one-class classification problem. In the domain
> >>> of collaborative filtering it is called one-class collaborative
> >>> filtering (since what you have are only positive preferences). You may
> >>> search the web with these key words to find papers providing
> >>> solutions. I'm not sure whether Mahout has algorithms for one-class
> >>> collaborative filtering.
> >>>
> >>> On Mon, May 6, 2013 at 1:42 PM, Sean Owen <srowen <at> gmail.com> wrote:
> >>>> ALS-WR weights the error on each term differently, so the average
> >>>> error doesn't really have meaning here, even if you are comparing the
> >>>> difference with "1". I think you will need to fall back to mean
> >>>> average precision or something.
> >>>>
> >>>> On Mon, May 6, 2013 at 11:24 AM, William <icswilliam2010 <at> gmail.com> wrote:
> >>>>> Sean Owen <srowen <at> gmail.com> writes:
> >>>>>
> >>>>>>
> >>>>>> If you have no ratings, how are you using RMSE? this typically
> >>>>>> measures error in reconstructing ratings.
> >>>>>> I think you are probably measuring something meaningless.
> >>>>>>
> >>>>>
> >>>>>
> >>>>> I assume the rating of each seen movie is 1. Is that right?
> >>>>> If I use Collaborative Filtering with ALS-WR to get some
> >>>>> recommendations, must I have a real rating matrix?
> >>>>>
> >>>>>
> >>>>>

I was wondering what kind of format the output produced by parallelALS is 
stored in. More specifically I am looking for a way to decode/read this 
information. 

I have been able to run the mahout parallelALS command, calculate RMSE using 
mahout evaluateFactorization, and generate recommendations via mahout 
recommendfactorized.  

However I would like to take a closer look at things like the factorized 
products for my probeSet (stored in --tempDir from the 'mahout 
evaluateFactorization' command) and the actual feature vectors stored in the 
/out/U/ and /out/M/ directories.

thanks
AJ

Re: parallelALS and RMSE TEST

Posted by Sebastian Schelter <ss...@apache.org>.

The output of parallelALS is two matrices, U and M, whose product is an 
approximation of your input matrix.

The matrices are written as sequence files with an IntWritable as key 
(the index of the row in the matrix) and a VectorWritable as value, which 
holds the contents of the row vector.

--sebastian
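
[Editor's illustration] The layout described above can be pictured as a map from row index to row vector. The following is a minimal sketch in plain Python, not Mahout code: it does not read sequence files, and the toy values, the names `U`, `M`, `predict`, `rmse`, and `probe_set` are all assumptions made up for illustration. It only shows how one cell of the approximated input matrix would follow from a user row and an item row, and how an RMSE over a probe set could then be computed.

```python
import math

# Hypothetical toy data: each factor matrix is modeled as {row index: row vector},
# mirroring the IntWritable key / VectorWritable value layout described above.
U = {0: [1.0, 0.5], 1: [0.2, 1.0]}    # user feature vectors (assumed values)
M = {10: [1.0, 0.0], 11: [0.0, 2.0]}  # item feature vectors (assumed values)


def predict(user_id, item_id):
    """Approximate one cell of the input matrix as the dot product of two rows."""
    return sum(u * m for u, m in zip(U[user_id], M[item_id]))


def rmse(probe):
    """Root mean squared error over (user, item, actual value) triples."""
    squared = [(predict(u, i) - actual) ** 2 for u, i, actual in probe]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical probe set of held-out (user, item, actual value) triples.
probe_set = [(0, 10, 1.0), (0, 11, 1.0), (1, 10, 0.0), (1, 11, 2.0)]

print(predict(0, 11))
print(rmse(probe_set))
```

As noted earlier in the thread, an RMSE computed this way is of limited meaning for boolean preference data; the sketch is only meant to make the row-vector layout concrete.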
