Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2012/02/08 21:03:29 UTC

Set like functionality

We would like to maintain a history of all product views by a given
user. We are currently using a row key like USER_ID/TIMESTAMP. This
works; however, we would like each user's list of product views to be
unique, with one entry per product.

So if I have rows like:

mark/1328731167014262  = { data => 'Product 123' }
mark/1328731162502304  = { data => 'Product 456' }
mark/1328731157711375  = { data => 'Product 789' }

And I view Product 789 again I want it to be like:

mark/1328731292355173  = { data => 'Product 789' }
mark/1328731167014262  = { data => 'Product 123' }
mark/1328731162502304  = { data => 'Product 456' }

So it basically replaces the old value. How can this be accomplished?

Thanks

Re: Set like functionality

Posted by Nicolas Spiegelberg <ns...@fb.com>.
A lot of your design depends on your read/write rate & the amount of
duplication in your inserts.  For example, if your read rate is really low
and your write rate is really high with a low dedupe, you could try:

Row = USER_ID
Column Qualifier = PRODUCT_ID
MAX_VERSIONS = 1

Setting the max versions for a CF to 1 basically lets the dedupe kick in
& treats your column qualifier as a set.  Putting the data in the column
qualifier instead of the value field means that you'll dedupe on demand
at read time instead of doing a read-modify-write (RMW) on every insert.
That said, RMW works better with high dedupe or a high read rate because
you'd otherwise write unnecessary duplicate values on flush.  Also, with
read-modify-write, consider using bloom filters if you have a high miss
rate.  It's cheaper to do a bloom filter query of a really large file if
the key doesn't exist most of the time.  We used this to store unique
email thread UUIDs for our messaging application.
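
To make the Row/Column Qualifier layout above concrete, here is a rough in-memory sketch in Python. It is not HBase client code: the ViewTable class and its method names are made up for illustration, and it only mimics the effect of keeping a single version per cell. Since HBase orders the columns within a row by qualifier rather than by timestamp, the sketch assumes the "most recent first" ordering is done on the client.

```python
from collections import defaultdict


class ViewTable:
    """Toy stand-in for one table with a single CF and MAX_VERSIONS = 1.

    Hypothetical illustration only; none of these names are HBase APIs.
    """

    def __init__(self):
        # row key (USER_ID) -> {column qualifier (PRODUCT_ID): cell timestamp}
        self._rows = defaultdict(dict)

    def put(self, user_id, product_id, timestamp):
        # Only one version is kept, so a newer write to the same
        # row/qualifier supersedes the older cell.
        current = self._rows[user_id].get(product_id)
        if current is None or timestamp >= current:
            self._rows[user_id][product_id] = timestamp

    def recent_views(self, user_id):
        # Qualifiers behave like a set; order by timestamp on the client.
        cells = self._rows.get(user_id, {})
        return sorted(cells, key=cells.get, reverse=True)


table = ViewTable()
table.put("mark", "Product 789", 1328731157711375)
table.put("mark", "Product 456", 1328731162502304)
table.put("mark", "Product 123", 1328731167014262)
table.put("mark", "Product 789", 1328731292355173)  # viewed again
print(table.recent_views("mark"))
# ['Product 789', 'Product 123', 'Product 456']
```

Viewing Product 789 a second time does not add a second entry; it only moves that product to the front of the list.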

I'm guessing this might be a little too advanced for your question if
you're just getting up and going.  I'm more trying to help you understand
that you should think about how your read/write/re-write/modify data flow
is going to look, because HBase has a lot of knobs to optimize for a wide
variety of flow situations.

Nicolas

On 2/10/12 4:45 AM, "weichao" <nn...@gmail.com> wrote:

>Maybe you can build an index table, like
>
>rowkey:[USER_ID/PRODUCT_ID] = { rk => main table's row key }
>
>When a product is viewed, check the index, find the rk, and use the rk to
>get the row from the main table. Delete that row, write the new row, and
>update the index table's rk to point at it.
>
>Of course, using a coprocessor to handle this may make it simpler...


Re: Set like functionality

Posted by weichao <nn...@gmail.com>.
Maybe you can build an index table, like

rowkey:[USER_ID/PRODUCT_ID] = { rk => main table's row key }

When a product is viewed, check the index, find the rk, and use the rk to
get the row from the main table. Delete that row, write the new row, and
update the index table's rk to point at it.

Of course, using a coprocessor to handle this may make it simpler...
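
As a rough sketch of the index-table flow described above, here is an in-memory version in Python. The ViewHistory class and its method names are hypothetical, and two plain dicts stand in for the main table and the index table. In a real HBase deployment the delete, the put, and the index update are separate operations that are not atomic across tables, so treat this only as an outline of the steps.

```python
class ViewHistory:
    """Toy model of a main table plus an index table (hypothetical names)."""

    def __init__(self):
        self.main = {}   # "USER_ID/TIMESTAMP" -> product
        self.index = {}  # "USER_ID/PRODUCT_ID" -> main table row key

    def record_view(self, user_id, product_id, timestamp):
        index_key = f"{user_id}/{product_id}"
        # 1. Check the index for an earlier view of this product.
        old_row_key = self.index.get(index_key)
        # 2. Delete the old main-table row, if there was one.
        if old_row_key is not None:
            self.main.pop(old_row_key, None)
        # 3. Write the new row and point the index at it.
        new_row_key = f"{user_id}/{timestamp}"
        self.main[new_row_key] = product_id
        self.index[index_key] = new_row_key

    def history(self, user_id):
        # Newest first; assumes timestamps all have the same number of
        # digits so that string order matches numeric order.
        prefix = f"{user_id}/"
        keys = sorted((k for k in self.main if k.startswith(prefix)),
                      reverse=True)
        return [(k, self.main[k]) for k in keys]


views = ViewHistory()
views.record_view("mark", "Product 789", 1328731157711375)
views.record_view("mark", "Product 456", 1328731162502304)
views.record_view("mark", "Product 123", 1328731167014262)
views.record_view("mark", "Product 789", 1328731292355173)  # viewed again
for row_key, product in views.history("mark"):
    print(row_key, "=", product)
```

After the second view of Product 789, the main table holds three rows for mark, with the re-viewed product under its new timestamp.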

