Posted to dev@mahout.apache.org by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org> on 2011/11/09 16:35:51 UTC

[jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

    [ https://issues.apache.org/jira/browse/MAHOUT-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147103#comment-13147103 ] 

Grant Ingersoll commented on MAHOUT-878:
----------------------------------------

See also the work I did for build-asf-email.sh.  It would be nice to add this to that script.
                
> Provide better examples for the parallel ALS recommender code
> -------------------------------------------------------------
>
>                 Key: MAHOUT-878
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-878
>             Project: Mahout
>          Issue Type: Task
>          Components: Collaborative Filtering
>    Affects Versions: 1.0
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>
> We should provide examples that show how to apply the parallel ALS recommender to the Netflix or KDD2011 datasets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 9, 2011, at 10:16 AM, Sebastian Schelter wrote:

> Hi Grant,
> 
> I'm currently looking into MailToRecMapper to understand the data you
> extract from the ASF email archives. (Haven't had the time to actually
> run it yet)
> 
> As far as I understand, it outputs
> 
> from,msgId,1
> 
> for each mail. What exactly is the msgId here?

It's the mail's Message-ID header.

> 
> I'm searching for an example where I have implicit feedback data in the form
> 
> <user> <item> <number of observed interactions>
> 
> It would be important to have different numbers of interactions, as the
> algorithm I'm trying to exemplify uses this number to calculate a
> "confidence" for the data point. E.g. if a user has never seen some
> movie, you would see 0 interactions, which could mean that he doesn't
> like the movie, but it could also mean he just doesn't know it exists,
> so we have low confidence in the observation. On the other hand if he
> watched the movie 20 times, we can be pretty sure he likes it.
> 
> Would it be possible to extract data in the form
> 
> <email> <thread> <number of responses>

Yeah, I think so.  That was my original plan, but I then decided against it.  The code should be simple, though.

> 
> from the asf email archives? I recall a discussion stating that
> identifying a thread is a pretty hard task...
> 
> Best,
> Sebastian

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: [jira] [Commented] (MAHOUT-878) Provide better examples for the parallel ALS recommender code

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Grant,

I'm currently looking into MailToRecMapper to understand the data you
extract from the ASF email archives. (Haven't had the time to actually
run it yet)

As far as I understand, it outputs

from,msgId,1

for each mail. What exactly is the msgId here?

I'm searching for an example where I have implicit feedback data in the form

<user> <item> <number of observed interactions>

It would be important to have different numbers of interactions, as the
algorithm I'm trying to exemplify uses this number to calculate a
"confidence" for the data point. E.g. if a user has never seen some
movie, you would see 0 interactions, which could mean that he doesn't
like the movie, but it could also mean he just doesn't know it exists,
so we have low confidence in the observation. On the other hand if he
watched the movie 20 times, we can be pretty sure he likes it.
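
To make the confidence idea concrete, here is a minimal sketch. It assumes
the weighting confidence = 1 + alpha * count, which is a common choice for
implicit-feedback ALS; the function name and the value of alpha are
hypothetical and only for illustration, not taken from the Mahout code.

```python
# Sketch: turn a raw interaction count into a (preference, confidence) pair.
# ASSUMPTION: confidence = 1 + alpha * count (a common implicit-feedback
# weighting). The name and the default alpha are illustrative only.

def to_preference_and_confidence(count, alpha=40.0):
    """Return (preference, confidence) for one observed interaction count."""
    if count < 0:
        raise ValueError("interaction count must be non-negative")
    # Any observed interaction is treated as a positive preference.
    preference = 1.0 if count > 0 else 0.0
    # Zero interactions still get the baseline confidence of 1.0, so the
    # "no interaction" observation counts, but only weakly.
    confidence = 1.0 + alpha * count
    return preference, confidence


if __name__ == "__main__":
    for count in (0, 1, 20):
        preference, confidence = to_preference_and_confidence(count)
        print(f"count={count} preference={preference} confidence={confidence}")
```

With this weighting, the user who never watched the movie contributes a
low-confidence "no preference", while 20 viewings contribute a
high-confidence "preference".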

Would it be possible to extract data in the form

<email> <thread> <number of responses>

from the asf email archives? I recall a discussion stating that
identifying a thread is a pretty hard task...
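
A rough sketch of the aggregation I have in mind, under a strong
assumption: every reply carries an In-Reply-To header that points at a
message in the same archive, so a thread can be identified by following
those links up to the root message. Real archives will not always satisfy
this, which is exactly the hard part. The dictionary keys and function
names below are hypothetical and not what MailToRecMapper uses.

```python
from collections import Counter

# ASSUMPTION: each message is a dict with the keys "from", "message_id" and
# "in_reply_to" (None for a message that starts a thread).


def thread_root(msg_id, parent_of):
    """Follow In-Reply-To links upward until no known parent remains."""
    seen = set()
    # The 'seen' set guards against malformed archives with reply cycles.
    while msg_id in parent_of and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def count_responses(messages):
    """Count replies per (sender, thread root); thread starters are skipped."""
    parent_of = {m["message_id"]: m["in_reply_to"]
                 for m in messages if m.get("in_reply_to")}
    counts = Counter()
    for m in messages:
        if m.get("in_reply_to"):
            root = thread_root(m["message_id"], parent_of)
            counts[(m["from"], root)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"from": "alice", "message_id": "m1", "in_reply_to": None},
        {"from": "bob", "message_id": "m2", "in_reply_to": "m1"},
        {"from": "alice", "message_id": "m3", "in_reply_to": "m2"},
        {"from": "bob", "message_id": "m4", "in_reply_to": "m3"},
    ]
    # Emit lines in the form: <email> <thread> <number of responses>
    for (sender, thread), n in sorted(count_responses(sample).items()):
        print(sender, thread, n)
```

Each output line would then be one implicit-feedback data point, with the
number of responses serving as the interaction count.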

Best,
Sebastian


