Posted to dev@mahout.apache.org by "Sebastian Schelter (Created) (JIRA)" <ji...@apache.org> on 2011/11/08 11:18:51 UTC
[jira] [Created] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Provide better examples for the parallel ALS recommender code
-------------------------------------------------------------
Key: MAHOUT-878
URL: https://issues.apache.org/jira/browse/MAHOUT-878
Project: Mahout
Issue Type: Task
Components: Collaborative Filtering
Affects Versions: 1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
We should provide examples that show how to apply the parallel ALS recommender to the Netflix or KDD2011 datasets.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148582#comment-13148582 ]
Sebastian Schelter commented on MAHOUT-878:
-------------------------------------------
Ok. But we already have a small example for this algorithm that uses the MovieLens 1M dataset.
> Provide better examples for the parallel ALS recommender code
> -------------------------------------------------------------
>
> Key: MAHOUT-878
> URL: https://issues.apache.org/jira/browse/MAHOUT-878
> Project: Mahout
> Issue Type: Task
> Components: Collaborative Filtering
> Affects Versions: 1.0
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 0.6
>
> Attachments: MAHOUT-878.patch
>
>
> We should provide examples that show how to apply the parallel ALS recommender to the Netflix or KDD2011 datasets.
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148572#comment-13148572 ]
Grant Ingersoll commented on MAHOUT-878:
----------------------------------------
Sure, but most of our examples are meant to be tried out locally, too.
[jira] [Resolved] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Sebastian Schelter (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter resolved MAHOUT-878.
---------------------------------------
Resolution: Fixed
Fix Version/s: 0.6
Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 9, 2011, at 10:16 AM, Sebastian Schelter wrote:
> Hi Grant,
>
> I'm currently looking into MailToRecMapper to understand the data you
> extract from the ASF email archives. (Haven't had the time to actually
> run it yet)
>
> As far as I understand it outputs
>
> from,msgId,1
>
> for each mail. What exactly is the msgId here?
It's the mail's Message-ID header.
>
> I'm searching for an example where I have implicit feedback data in the form
>
> <user> <item> <number of observed interactions>
>
> It would be important to have different numbers of interactions as the
> algorithm I'm trying to exemplify uses this number to calculate a
> "confidence" for the data point. E.g. if a user has never seen some
> movie, you would see 0 interactions, which could mean that he doesn't
> like the movie, but it could also mean he just doesn't know it exists,
> so we have low confidence in the observation. On the other hand if he
> watched the movie 20 times, we can be pretty sure he likes it.
>
> Would it be possible to extract data in the form
>
> <email> <thread> <number of responses>
Yeah, I think so. That was my original plan, but then I decided against it. The code should be simple, though.
>
> from the asf email archives? I recall a discussion stating that
> identifying a thread is a pretty hard task...
>
> Best,
> Sebastian
>
>
>
> On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
>>
>> [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ]
>>
>> Grant Ingersoll commented on MAHOUT-878:
>> ----------------------------------------
>>
>> See also the stuff I did for build-asf-email.sh. Would be nice to add into that.
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by Sebastian Schelter <ss...@apache.org>.
Hi Grant,
I'm currently looking into MailToRecMapper to understand the data you
extract from the ASF email archives. (Haven't had the time to actually
run it yet)
As far as I understand it outputs
from,msgId,1
for each mail. What exactly is the msgId here?
I'm searching for an example where I have implicit feedback data in the form
<user> <item> <number of observed interactions>
It would be important to have different numbers of interactions as the
algorithm I'm trying to exemplify uses this number to calculate a
"confidence" for the data point. E.g. if a user has never seen some
movie, you would see 0 interactions, which could mean that he doesn't
like the movie, but it could also mean he just doesn't know it exists,
so we have low confidence in the observation. On the other hand if he
watched the movie 20 times, we can be pretty sure he likes it.
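To make the confidence idea concrete, here is a minimal sketch in Python. It assumes the linear weighting c = 1 + alpha * r that is common in implicit-feedback ALS formulations; the thread does not say which formula the Mahout code uses, and the value of ALPHA, the function names, and the sample data are all made up for illustration.

```python
from collections import Counter

ALPHA = 40.0  # assumed scaling factor, not taken from the thread


def confidence(interactions: int, alpha: float = ALPHA) -> float:
    """Confidence grows linearly with the number of observed interactions."""
    return 1.0 + alpha * interactions


def preference(interactions: int) -> int:
    """Binary preference: 1 if the user interacted at all, otherwise 0."""
    return 1 if interactions > 0 else 0


# (user, item) -> number of observed interactions (made-up data)
observed = Counter({("alice", "movie-1"): 20, ("alice", "movie-2"): 1})

for pair in [("alice", "movie-1"), ("alice", "movie-2"), ("alice", "movie-3")]:
    r = observed[pair]  # a Counter returns 0 for pairs that were never seen
    print(pair[0], pair[1], preference(r), confidence(r))
```

Under this weighting an unseen pair still enters the model, but only with the minimum confidence of 1, while a pair with 20 interactions dominates the loss.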
Would it be possible to extract data in the form
<email> <thread> <number of responses>
from the asf email archives? I recall a discussion stating that
identifying a thread is a pretty hard task...
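A rough sketch of how such triples could be produced, in Python. It assumes that threads can be approximated by following each mail's In-Reply-To header up to a root message, which is a simplification: real archives have missing or broken headers. The function names and sample mails are hypothetical and are not taken from MailToRecMapper.

```python
from collections import Counter


def thread_root(msg_id, parent_of):
    """Follow In-Reply-To links upward until a message has no known parent."""
    seen = set()
    while msg_id in parent_of and msg_id not in seen:  # 'seen' guards against cycles
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def count_responses(messages):
    """messages: iterable of (sender, message_id, in_reply_to or None)."""
    messages = list(messages)
    parent_of = {mid: parent for _, mid, parent in messages if parent}
    counts = Counter()
    for sender, mid, parent in messages:
        if parent:  # only replies count as responses
            counts[(sender, thread_root(mid, parent_of))] += 1
    return counts


# Made-up sample data: one thread with three replies
mails = [
    ("grant@example.org", "<1@x>", None),
    ("sebastian@example.org", "<2@x>", "<1@x>"),
    ("grant@example.org", "<3@x>", "<2@x>"),
    ("sebastian@example.org", "<4@x>", "<3@x>"),
]

for (sender, thread), n in sorted(count_responses(mails).items()):
    print(sender, thread, n)
```

The thread is identified here by the Message-ID of its root mail. A reply whose parent is missing from the archive is grouped under that missing parent's ID.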
Best,
Sebastian
On 09.11.2011 16:35, Grant Ingersoll (Commented) (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ]
>
> Grant Ingersoll commented on MAHOUT-878:
> ----------------------------------------
>
> See also the stuff I did for build-asf-email.sh. Would be nice to add into that.
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ]
Grant Ingersoll commented on MAHOUT-878:
----------------------------------------
See also the stuff I did for build-asf-email.sh. Would be nice to add into that.
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148541#comment-13148541 ]
Grant Ingersoll commented on MAHOUT-878:
----------------------------------------
You might also do one for the Amazon product review data set (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), which has 5.8M reviews. I've got some sequential preprocessing code that extracts the items, converts the IDs to longs, and gets the rating.
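For illustration only, such a preprocessing step might look like the following Python sketch. It is not the code mentioned above: the dictionary-based ID assignment, the function names, and the sample records are assumptions, and the output is written as userID,itemID,rating lines.

```python
def make_id_mapper():
    """Return a function that assigns consecutive integer IDs to string keys."""
    ids = {}

    def to_long(key: str) -> int:
        # The first time a key is seen it gets the next free ID.
        return ids.setdefault(key, len(ids))

    return to_long


def convert(reviews):
    """reviews: iterable of (user string, item string, rating string)."""
    user_id, item_id = make_id_mapper(), make_id_mapper()
    for user, item, rating in reviews:
        yield user_id(user), item_id(item), float(rating)


# Made-up records standing in for parsed reviews
raw = [
    ("A1XYZ", "B000123", "5.0"),
    ("A1XYZ", "B000456", "3.0"),
    ("A2ABC", "B000123", "4.0"),
]

for u, i, r in convert(raw):
    print(f"{u},{i},{r}")
```

A dictionary keeps the mapping collision-free but has to fit in memory, which is fine for a sequential step over a few million reviews.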
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147930#comment-13147930 ]
Hudson commented on MAHOUT-878:
-------------------------------
Integrated in Mahout-Quality #1164 (See [https://builds.apache.org/job/Mahout-Quality/1164/])
MAHOUT-878 Provide better examples for the parallel ALS recommender code
ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1200366
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/FactorizationEvaluator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJob.java
* /mahout/trunk/examples/bin/factorize-netflix.sh
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als/netflix
* /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/hadoop/example/als/netflix/NetflixDatasetConverter.java
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/als/AlternatingLeastSquaresSolver.java
[jira] [Commented] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148559#comment-13148559 ]
Sebastian Schelter commented on MAHOUT-878:
-------------------------------------------
Would make a nice use case, but 5.8 million reviews is a bit small for a Hadoop-based solution.
[jira] [Updated] (MAHOUT-878) Provide better examples for the
parallel ALS recommender code
Posted by "Sebastian Schelter (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter updated MAHOUT-878:
--------------------------------------
Attachment: MAHOUT-878.patch
Shell script to run parallel ALS on the Netflix dataset.