Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2012/02/08 21:03:29 UTC
Set like functionality
We would like to maintain a history of all product views by a given
user. We are currently using a row key like USER_ID/TIMESTAMP. This
works, but we would like each product to appear only once per user, i.e.
a unique list of user-to-product views.
So if I have rows like:
mark/1328731167014262 = { data => 'Product 123' }
mark/1328731162502304 = { data => 'Product 456' }
mark/1328731157711375 = { data => 'Product 789' }
and I view Product 789 again, I want it to look like:
mark/1328731292355173 = { data => 'Product 789' }
mark/1328731167014262 = { data => 'Product 123' }
mark/1328731162502304 = { data => 'Product 456' }
So it basically replaces the old value. How can this be accomplished?
Thanks
Re: Set like functionality
Posted by Nicolas Spiegelberg <ns...@fb.com>.
A lot of your design depends on your read/write rate & the amount of
duplication in your inserts. For example, if your read rate is really low
and your write rate is really high with a low dedupe, you could try:
Row = USER_ID
Column Qualifier = PRODUCT_ID
MAX_VERSIONS = 1
Setting the max versions for a CF to 1 basically lets the dedupe kick in
& treats your column qualifier as a set. Putting the data in the column
qualifier instead of the value field means that you'll dedupe on demand
at read time instead of doing a read-modify-write (RMW). That said, RMW
works better with high dedupe or a high read rate, because you'd otherwise
write unnecessary duplicate values on flush. Also, with read-modify-write,
consider using bloom filters if you have a high miss rate. It's cheaper to
do a bloom filter query of a really large file if the key doesn't exist
most of the time. We used this to store unique email thread UUIDs for our
messaging application.
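As a rough illustration of the layout above, here is a minimal in-memory sketch in Python. It is not HBase client code: the ViewHistory class, its method names, and the logical clock are all hypothetical, and a plain dict stands in for a row whose column qualifiers are product IDs with MAX_VERSIONS = 1.

```python
import itertools
from collections import defaultdict


class ViewHistory:
    """Hypothetical in-memory stand-in for Row = USER_ID, Column Qualifier = PRODUCT_ID."""

    def __init__(self):
        # row key -> {column qualifier -> timestamp of the newest version}
        self._rows = defaultdict(dict)
        # Assumption: a logical clock stands in for real cell timestamps.
        self._clock = itertools.count(1)

    def record_view(self, user_id, product_id):
        # Blind write with no read first. Keeping one version per qualifier
        # means a repeat view simply replaces the older cell.
        self._rows[user_id][product_id] = next(self._clock)

    def recent_views(self, user_id):
        # Qualifiers are ordered by name, not by time, so the reader
        # sorts by cell timestamp to get newest-first order.
        cells = self._rows.get(user_id, {})
        return sorted(cells, key=cells.get, reverse=True)


history = ViewHistory()
for product in ["Product 789", "Product 456", "Product 123", "Product 789"]:
    history.record_view("mark", product)

print(history.recent_views("mark"))
# ['Product 789', 'Product 123', 'Product 456']
```

The write path never reads, which is why this shape suits a high write rate; the cost moves to the reader, which has to fetch the whole row and sort it by timestamp.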
I'm guessing this might be a little too advanced for your question if
you're just getting up and going. I'm more trying to help you understand
that you should think about how your read/write/re-write/modify data flow
is going to look, because HBase has a lot of knobs to optimize for a wide
variety of flow situations.
Nicolas
On 2/10/12 4:45 AM, "weichao" <nn...@gmail.com> wrote:
>Maybe you can build an index table, like
>
>rowkey:[USER_ID/ProductID] = { rk => main table's rowkey }
>
>When a product is viewed, check the index, find the rk, and use the rk to
>get the row from the main table. Delete that row, then update the index
>table's rk.
>
>Of course, using a coprocessor to handle this may make it simpler...
Re: Set like functionality
Posted by weichao <nn...@gmail.com>.
Maybe you can build an index table, like
rowkey:[USER_ID/ProductID] = { rk => main table's rowkey }
When a product is viewed, check the index, find the rk, and use the rk to
get the row from the main table. Delete that row, then update the index
table's rk.
Of course, using a coprocessor to handle this may make it simpler...
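The index-table steps can be sketched in memory as follows. This is only an illustration in Python, not HBase client or coprocessor code: the two dicts, the function names, and the logical clock are hypothetical stand-ins for the main table, the index table, and real timestamps.

```python
import itertools

main_table = {}   # "USER_ID/TIMESTAMP" -> product id
index_table = {}  # "USER_ID/PRODUCT_ID" -> current main-table row key
# Assumption: a logical clock stands in for real timestamps.
_clock = itertools.count(1)


def record_view(user_id, product_id):
    index_key = f"{user_id}/{product_id}"
    old_row_key = index_table.get(index_key)   # check the index
    if old_row_key is not None:
        main_table.pop(old_row_key, None)      # delete the stale main-table row
    # Zero-pad so row keys sort in time order as strings.
    new_row_key = f"{user_id}/{next(_clock):020d}"
    main_table[new_row_key] = product_id       # write the fresh view
    index_table[index_key] = new_row_key       # repoint the index entry


def recent_views(user_id):
    # Stand-in for a prefix scan over the main table, newest first.
    prefix = f"{user_id}/"
    keys = sorted((k for k in main_table if k.startswith(prefix)), reverse=True)
    return [main_table[k] for k in keys]


for product in ["Product 789", "Product 456", "Product 123", "Product 789"]:
    record_view("mark", product)

print(recent_views("mark"))
# ['Product 789', 'Product 123', 'Product 456']
```

Each view costs one read plus up to three writes across two tables, and nothing in this sketch makes those steps atomic, so a real implementation would need to decide how to handle a failure between the delete and the index update.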