Posted to dev@mahout.apache.org by "Sebastian Schelter (Created) (JIRA)" <ji...@apache.org> on 2011/11/08 11:18:51 UTC

[jira] [Created] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Provide better examples for the parallel ALS recommender code
-------------------------------------------------------------

                 Key: MAHOUT-878
                 URL: https://issues.apache.org/jira/browse/MAHOUT-878
             Project: Mahout
          Issue Type: Task
          Components: Collaborative Filtering
    Affects Versions: 1.0
            Reporter: Sebastian Schelter
            Assignee: Sebastian Schelter


We should provide examples that show how to apply the parallel ALS recommender to the Netflix or KDD2011 datasets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148582#comment-13148582 ] 

Sebastian Schelter commented on MAHOUT-878:
-------------------------------------------

OK, but we already have a small example for this algorithm that uses the MovieLens 1M dataset.
                

        

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148572#comment-13148572 ] 

Grant Ingersoll commented on MAHOUT-878:
----------------------------------------

Sure, but most of our examples are meant to be tried out locally, too.
                

        

[jira] [Resolved] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Sebastian Schelter (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter resolved MAHOUT-878.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.6
    

        

Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 9, 2011, at 10:16 AM, Sebastian Schelter wrote:

> Hi Grant,
> 
> I'm currently looking into MailToRecMapper to understand the data you
> extract from the ASF email archives. (Haven't had the time to actually
> run it yet)
> 
> As far as I understand, it outputs
> 
> from,msgId,1
> 
> for each mail. What exactly is the msgId here?

It's the mail's Message-ID header.

> 
> I'm searching for an example where I have implicit feedback data in the form
> 
> <user> <item> <number of observed interactions>
> 
> It would be important to have different numbers of interactions, as the
> algorithm I'm trying to exemplify uses this number to calculate a
> "confidence" for the data point. E.g. if a user has never seen some
> movie, you would see 0 interactions, which could mean that he doesn't
> like the movie, but it could also mean he just doesn't know it exists,
> so we have low confidence in the observation. On the other hand if he
> watched the movie 20 times, we can be pretty sure he likes it.
> 
> Would it be possible to extract data in the form
> 
> <email> <thread> <number of responses>

Yeah, I think so. That was my original plan; I then decided against it, but the code should be simple.

> 
> from the ASF email archives? I recall a discussion stating that
> identifying a thread is a pretty hard task...
> 
> Best,
> Sebastian
> 
> 
> 
> On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
>> 
>>    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ] 
>> 
>> Grant Ingersoll commented on MAHOUT-878:
>> ----------------------------------------
>> 
>> See also the stuff I did for build-asf-email.sh.  Would be nice to add into that.
>> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Grant,

I'm currently looking into MailToRecMapper to understand the data you
extract from the ASF email archives. (Haven't had the time to actually
run it yet)

As far as I understand, it outputs

from,msgId,1

for each mail. What exactly is the msgId here?

I'm searching for an example where I have implicit feedback data in the form

<user> <item> <number of observed interactions>

It would be important to have different numbers of interactions, as the
algorithm I'm trying to exemplify uses this number to calculate a
"confidence" for the data point. E.g. if a user has never seen some
movie, you would see 0 interactions, which could mean that he doesn't
like the movie, but it could also mean he just doesn't know it exists,
so we have low confidence in the observation. On the other hand if he
watched the movie 20 times, we can be pretty sure he likes it.
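To make the idea of a count-based "confidence" concrete, here is a minimal sketch. It assumes the linear weighting c = 1 + alpha * r that is commonly used for implicit-feedback ALS; whether the Mahout job uses exactly this formula is an assumption, and the function names and the value alpha = 40 are purely illustrative.

```python
def preference(interactions: float) -> int:
    """Binary preference: 1 if any interaction was observed, else 0."""
    return 1 if interactions > 0 else 0


def confidence(interactions: float, alpha: float = 40.0) -> float:
    """Confidence in the observation, growing linearly with the interaction count.

    alpha is an illustrative scaling constant (an assumption, not a Mahout default).
    """
    return 1.0 + alpha * interactions


# Zero interactions yield the minimum confidence; 20 views yield a much higher one.
for count in (0, 1, 20):
    print(count, preference(count), confidence(count))
```

With this weighting, an unseen movie still contributes to the factorization (preference 0), but only with the baseline confidence of 1, while a movie watched 20 times dominates the loss for that user.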

Would it be possible to extract data in the form

<email> <thread> <number of responses>

from the ASF email archives? I recall a discussion stating that
identifying a thread is a pretty hard task...
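As a rough sketch of the aggregation step only: assuming each mail has already been assigned a thread identifier (the hard part, which is not solved here), one record per mail can be collapsed into <email> <thread> <number of responses> triples. The function name, the addresses, and the thread IDs below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_responses(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Collapse one (sender, thread) record per mail into (sender, thread, count) triples."""
    counts = Counter(records)
    return sorted((sender, thread, n) for (sender, thread), n in counts.items())


# Hypothetical input: one (sender, thread id) pair per mail.
mails = [
    ("alice@example.org", "thread-1"),
    ("bob@example.org", "thread-1"),
    ("alice@example.org", "thread-1"),
    ("alice@example.org", "thread-2"),
]

for sender, thread, n in count_responses(mails):
    print(sender, thread, n)
```

On a cluster the same counting would be a standard group-and-sum over the (sender, thread) key; the in-memory Counter is used here only to keep the example short.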

Best,
Sebastian



On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
> 
>     [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ] 
> 
> Grant Ingersoll commented on MAHOUT-878:
> ----------------------------------------
> 
> See also the stuff I did for build-asf-email.sh.  Would be nice to add into that.
>                 
> 
>         


[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ] 

Grant Ingersoll commented on MAHOUT-878:
----------------------------------------

See also the stuff I did for build-asf-email.sh.  Would be nice to add into that.
                

        

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148541#comment-13148541 ] 

Grant Ingersoll commented on MAHOUT-878:
----------------------------------------

You might also do one for the Amazon product review data set (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), which has 5.8M reviews. I've got some sequential preprocessing code that extracts the items, converts IDs to longs, and gets the rating.
                

        

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147930#comment-13147930 ] 

Hudson commented on MAHOUT-878:
-------------------------------

Integrated in Mahout-Quality #1164 (See [https://builds.apache.org/job/Mahout-Quality/1164/])
    MAHOUT-878 Provide better examples for the parallel ALS recommender code

ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1200366
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/FactorizationEvaluator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJob.java
* /mahout/trunk/examples/bin/factorize-netflix.sh
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als/netflix
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als/netflix/NetflixDatasetConverter.java
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/als/AlternatingLeastSquaresSolver.java

                

        

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148559#comment-13148559 ] 

Sebastian Schelter commented on MAHOUT-878:
-------------------------------------------

That would make a nice use case, but 5.8 million reviews is a bit small for a Hadoop-based solution.
                

        

[jira] [Updated] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by "Sebastian Schelter (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-878:
--------------------------------------

    Attachment: MAHOUT-878.patch

Shell script to run parallel ALS on the Netflix dataset.
                