Posted to dev@mahout.apache.org by "Nick Jordan (Created) (JIRA)" <ji...@apache.org> on 2012/02/05 20:57:53 UTC

[jira] [Created] (MAHOUT-972) Implement Taste DynamoDBDataModel

Implement Taste DynamoDBDataModel
---------------------------------

                 Key: MAHOUT-972
                 URL: https://issues.apache.org/jira/browse/MAHOUT-972
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
            Reporter: Nick Jordan
            Assignee: Sean Owen
            Priority: Minor


Implement Amazon's DynamoDB as a data model to be used for collaborative filtering Taste models.

I've actually begun work on this, but have never submitted to an ASF project before.  I'll submit the patch when I've done enough testing that I think it is ready.  If anyone has any hints/tips that will make the patch/submission process easier I'd be happy to hear them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Created] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by Ted Dunning <te...@gmail.com>.
Go ahead and submit works in progress.  That lets you get feedback early.

Putting a fork up on github is another way to get feedback.  Just put a
link on the JIRA in a comment.


Re: [jira] [Commented] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by Nick Jordan <ni...@influen.se>.
It is in memory, and typical return times are 1-3 ms (also in line with
what I would expect with Cassandra).  With those results also being
cached via Taste, I would expect things to work quicker than this.  I'll
continue to investigate.
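
A rough back-of-the-envelope check of these numbers, as a sketch only: it assumes every lookup is issued sequentially at a fixed latency, takes 2 ms as the midpoint of the 1-3 ms range, and uses 5,000 lookups per request as an illustrative stand-in for "thousands of hits". The class and method names are hypothetical and not part of Mahout.

```java
import java.time.Duration;

/** Back-of-the-envelope latency arithmetic; all figures are assumptions. */
final class LookupLatencyEstimate {

    /** Total time if every lookup runs sequentially at the same latency. */
    static Duration totalTime(long lookups, Duration perLookup) {
        return perLookup.multipliedBy(lookups);
    }

    /** How many sequential lookups fit into an observed wall-clock time. */
    static long lookupsThatFit(Duration observed, Duration perLookup) {
        return observed.toNanos() / perLookup.toNanos();
    }

    public static void main(String[] args) {
        Duration perLookup = Duration.ofMillis(2); // assumed midpoint of 1-3 ms
        long assumedLookups = 5_000;               // assumed "thousands of hits"

        // Expected cost of one request under these assumptions.
        System.out.println("one request: " + totalTime(assumedLookups, perLookup));

        // Lookups needed to fill a full hour at the same latency.
        System.out.println("lookups per hour: "
                + lookupsThatFit(Duration.ofHours(1), perLookup));
    }
}
```

Under these assumptions one request costs about 10 seconds, while filling an hour would take roughly 1.8 million sequential lookups, so a runtime of hours suggests either far more lookups than expected or time spent somewhere other than the store.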

On Tue, Feb 28, 2012 at 9:55 AM, Sean Owen <sr...@gmail.com> wrote:

> That's way too long. :) I haven't looked at your implementation, but
> is it in-memory? If not, it's never going to be fast. A single
> recommendation request will generate thousands of hits to the NoSQL
> store and that's just not going to be fast. It has to act like a
> cache.
>
> These algos are generally pretty intensive in random data access.
> That's why parallelizing them is hard but, when done well, can be very
> handy.
>
> I don't think there's anything so special about knn in this regard.
>
> Sean

Re: [jira] [Commented] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by Sean Owen <sr...@gmail.com>.
That's way too long. :) I haven't looked at your implementation, but
is it in-memory? If not, it's never going to be fast. A single
recommendation request will generate thousands of hits to the NoSQL
store and that's just not going to be fast. It has to act like a
cache.

These algos are generally pretty intensive in random data access.
That's why parallelizing them is hard but, when done well, can be very
handy.

I don't think there's anything so special about knn in this regard.

Sean
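
The "has to act like a cache" point can be illustrated with a minimal read-through cache. This is a sketch under stated assumptions, not the attached DynamoDBDataModel and not Mahout's own caching classes: `CachingPreferenceSource`, `itemIdsFromUser`, and the `backend` function are hypothetical names, and the lambda stands in for a remote NoSQL read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Read-through cache in front of a slow per-user lookup (hypothetical names). */
final class CachingPreferenceSource {
    private final LongFunction<long[]> backend;          // stands in for a remote store read
    private final Map<Long, long[]> cache = new HashMap<>();
    private int backendCalls = 0;

    CachingPreferenceSource(LongFunction<long[]> backend) {
        this.backend = backend;
    }

    /** Returns the item IDs for a user, touching the backend only on a cache miss. */
    long[] itemIdsFromUser(long userId) {
        long[] cached = cache.get(userId);
        if (cached == null) {
            backendCalls++;
            cached = backend.apply(userId);
            cache.put(userId, cached);
        }
        return cached;
    }

    /** Number of times the backend was actually consulted. */
    int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        // Fake backend: derives two item IDs from the user ID.
        CachingPreferenceSource source = new CachingPreferenceSource(
                userId -> new long[] {userId * 10, userId * 10 + 1});

        // Simulate the many repeated lookups made while serving one request.
        for (int i = 0; i < 5000; i++) {
            source.itemIdsFromUser(i % 3);
        }
        System.out.println("backend calls: " + source.backendCalls());
    }
}
```

With three distinct users the backend is consulted three times regardless of how many lookups follow. The sketch is single-threaded and never evicts entries; a real data model would need to handle concurrency, refresh, and memory limits.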

On Sun, Feb 26, 2012 at 4:29 PM, Nick Jordan <ni...@influen.se> wrote:
> I've continued working on this.  Everything appears to return correctly,
> but in doing some debugging by using it in my own application I'm seeing
> some performance issues.
>
> Specifically, when I run it as the data model as part of
> a KnnItemBasedRecommender, the results are taking on the order of hours for
> a single recommendation to come back.  I've looked at the caching to see if
> I could find the problem there (and have even primed the cache with every
> user/item) and the performance is still atrocious.
>
> I had originally modeled this after the CassandraDataModel, and once the
> cache is primed it doesn't seem that this has anything to do with
> accessing the data in DynamoDB.  Are KnnItemBasedRecommenders generally
> slow for something like this?  I used to run this off of a flat file and
> never had performance problems.
>
> Thanks.
>
> Nick

Re: [jira] [Commented] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by Nick Jordan <ni...@influen.se>.
I've continued working on this.  Everything appears to return correctly,
but in doing some debugging by using it in my own application I'm seeing
some performance issues.

Specifically, when I run it as the data model as part of
a KnnItemBasedRecommender, the results are taking on the order of hours for
a single recommendation to come back.  I've looked at the caching to see if
I could find the problem there (and have even primed the cache with every
user/item) and the performance is still atrocious.

I had originally modeled this after the CassandraDataModel, and once the
cache is primed it doesn't seem that this has anything to do with
accessing the data in DynamoDB.  Are KnnItemBasedRecommenders generally
slow for something like this?  I used to run this off of a flat file and
never had performance problems.

Thanks.

Nick

On Thu, Feb 9, 2012 at 9:05 AM, Sean Owen (Commented) (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204538#comment-13204538]
>
> Sean Owen commented on MAHOUT-972:
> ----------------------------------
>
> Ok, good start. This will go in integration/ and it will need to refer to
> Amazon libs in pom.xml. When done you'll want to add copyright headers and
> standardize the format and all that, but that's a detail. Ping when you've
> got something you feel is committable.

[jira] [Commented] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204538#comment-13204538 ] 

Sean Owen commented on MAHOUT-972:
----------------------------------

Ok, good start. This will go in integration/ and it will need to refer to Amazon libs in pom.xml. When done you'll want to add copyright headers and standardize the format and all that, but that's a detail. Ping when you've got something you feel is committable.
                
> Implement Taste DynamoDBDataModel
> ---------------------------------
>
>                 Key: MAHOUT-972
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-972
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Nick Jordan
>            Priority: Minor
>              Labels: datamodel
>         Attachments: DynamoDBDataModel.java
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Implement Amazon's DynamoDB as a data model to be used for collaborative filtering Taste models.
> I've actually begun work on this, but have never submitted to an ASF project before.  I'll submit the patch when I've done enough testing that I think it is ready.  If anyone has any hints/tips that will make the patch/submission process easier I'd be happy to hear them.


        

[jira] [Commented] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Nick Jordan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202623#comment-13202623 ] 

Nick Jordan commented on MAHOUT-972:
------------------------------------

I'm fixing up a few small bugs and am going to test using a few different model types.  I should have something to post soon.
                

        

[jira] [Updated] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Nick Jordan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Jordan updated MAHOUT-972:
-------------------------------

    Attachment: DynamoDBDataModel.java

Attached is an early version of this that should be fully functional.

I still need to add comments and develop this as part of the latest snapshot (it was developed as part of a separate package), but I thought I'd post what I have for others to comment on, to make sure I'm moving in the right direction.

I'm out of the country and generally away from this computer for the next eight days.
                

        

[jira] [Resolved] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-972.
------------------------------

    Resolution: Won't Fix

I think this timed out, but we can reopen if there is ever a finished-ish patch.
                

[jira] [Updated] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-972:
--------------------------------------

    Fix Version/s:     (was: Backlog)
    

        

[jira] [Updated] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-972:
--------------------------------------

    Fix Version/s: Backlog
    

        

[jira] [Updated] (MAHOUT-972) Implement Taste DynamoDBDataModel

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-972:
-----------------------------

    Affects Version/s: 0.6
             Assignee:     (was: Sean Owen)

Happy to review whenever you have some code to post.
                