Posted to dev@hbase.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2008/09/01 08:38:58 UTC

Re: Taste on Hbase?

AFAIK, since Hadoop doesn't provide a file-append function, current
HBase can lose data when HBase crashes.

BTW, we are also thinking about CF, for example:
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

If you have any more advanced ideas, please let me know.

Regards, Edward

On Mon, Sep 1, 2008 at 9:16 AM, Sean Owen <sr...@gmail.com> wrote:
> I looked at this some more and I am not sure HBase will work out... it
> doesn't support anything like a query it seems. Really is just a
> distributed sorted map, which is a bit less than BigTable. Not even a
> "how many rows are in the table method" it seems.
>
> On Fri, Aug 29, 2008 at 6:20 PM, Sean Owen <sr...@gmail.com> wrote:
>> Nah, pretty much random -- well, most of the queries are like "show me
>> all ratings for item ID x" or "... from user ID x" though some are a
>> bit more complex. I think you can get 90% of what's needed from two
>> HBase tables, one keyed by user and the other by item, though you end
>> up duplicating a lot of data. Perhaps there are answers to that, and
>> to the other sorts of queries that are needed. It could be that it's
>> just not a fit but seems like there might be some way to use it
>> effectively for this purpose.
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: Taste on Hbase?

Posted by Sean Owen <sr...@gmail.com>.

Yes, this is a sketch of basic user-based collaborative filtering,
using a cosine-measure correlation as a similarity metric? (I think it
needs to divide out by the size of the two vectors?).
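
To illustrate the "divide out by the size of the two vectors" point, here
is a minimal sketch of a cosine measure over two users' ratings. It is
not taken from the wiki page or from Mahout; the function name, the
dict-of-ratings representation, and the sample users are all assumptions
made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors.

    a and b are hypothetical dicts mapping item ID -> rating;
    an item missing from a dict is treated as a rating of 0.
    """
    # Dot product: only items rated by both users contribute.
    dot = sum(rating * b[item] for item, rating in a.items() if item in b)
    # "Size" of each vector: its Euclidean norm.
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a user with no ratings: report 0 rather than divide by zero
    # Dividing by both norms keeps the result independent of rating scale.
    return dot / (norm_a * norm_b)

alice = {"item1": 5.0, "item2": 3.0}
bob = {"item1": 10.0, "item2": 6.0}  # same preferences on a doubled scale
carol = {"item3": 4.0}               # shares no items with alice

print(cosine_similarity(alice, bob))
print(cosine_similarity(alice, carol))
```

Without the division, alice and bob would score 68 (the raw dot product)
and the value would grow with the rating scale; with it, two users whose
ratings are proportional score approximately 1.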

The analog in Mahout would be
org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender,
and org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity

I agree that one could parallelize computation of the user-user
similarity. Indeed I think any scalable recommender is going to have
to do a lot of intense precomputation, via something like Hadoop, and
then relatively little at runtime.
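
A rough sketch of that split, assuming a cosine similarity and a small
in-memory dict standing in for the ratings store: the offline step
computes each user's nearest neighbors, and the runtime step only reads
the precomputed table. The names precompute_neighbors and recommend are
hypothetical, and this is plain Python rather than an actual Hadoop job.

```python
import math
from itertools import combinations

def cosine(a, b):
    # Cosine similarity between two dicts of item ID -> rating.
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def precompute_neighbors(ratings, top_n=1):
    """Offline step: score every user pair, keep the top_n neighbors per user.

    Each pair is scored independently of every other pair, which is what
    would let this loop be spread across many workers.
    """
    sims = {user: [] for user in ratings}
    for u, v in combinations(sorted(ratings), 2):
        s = cosine(ratings[u], ratings[v])
        sims[u].append((s, v))
        sims[v].append((s, u))
    return {
        user: [v for _, v in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]]
        for user, pairs in sims.items()
    }

def recommend(user, ratings, neighbors):
    """Runtime step: suggest items the user's neighbors rated but the user has not."""
    seen = ratings[user]
    candidates = set()
    for n in neighbors[user]:
        candidates.update(item for item in ratings[n] if item not in seen)
    return sorted(candidates)

ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 5, "b": 4, "c": 3},
    "u3": {"d": 5},
}
neighbors = precompute_neighbors(ratings, top_n=1)
print(neighbors["u1"])
print(recommend("u1", ratings, neighbors))
```

The all-pairs loop is quadratic in the number of users, which is the
expensive part worth pushing offline; the runtime lookup touches only a
handful of neighbors.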

On Mon, Sep 1, 2008 at 7:38 AM, Edward J. Yoon <ed...@apache.org> wrote:
> BTW, We also think about CF for example -
> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>
> If you have some advanced idea, please let me know.

Re: Taste on Hbase?

Posted by Karl Wettin <ka...@gmail.com>.

On 1 Sep 2008, at 08:38, Edward J. Yoon wrote:

> AFAIK, Since hadoop doesn't provide file-append function, Current
> Hbase have a problem of data loss when Hbase crashed.

Actually, HDFS has handled append since not too long ago. Not much support
for it yet, though.


      karl
