Posted to dev@hbase.apache.org by Doug Meil <do...@explorysmedical.com> on 2012/01/11 22:08:28 UTC

rowlock advice for book

Hey dev-list, regarding this...

re:  "Be careful using hbase row locks.  They are (unofficially -- we need
to make it official) deprecated."

... is this the official advice?  Should I update the book with this?

On 1/3/12 4:37 PM, "Stack" <st...@duboce.net> wrote:

>On Tue, Jan 3, 2012 at 12:38 PM, Joe Stein
><ch...@allthingshadoop.com> wrote:
>> when the event happened so if we see something from November 3rd today
>> then we will only keep it for 4 more months (and for events that we see
>> today those stay for 6 months).  so it sounds like this might be a
>> viable option and when we set the timestamp in our checkAndPut we make
>> the timestamp be the value that represents it as November 3rd, right? cool
>>
>
>This should be fine.
>
>You might want to protect against future dates.
>
>> well what i was thinking is that my client code would know to use the
>> november table and put the data in the november table (it is all just
>> strings) but I am leaning now towards the TTL option (need to futz with
>> it all more though).  One issue/concern with TTL is when all of a sudden
>> we want to keep things for only 4 months or maybe 8 months and then
>> having to re-TTL trillions of rows =8^( (which is a nagging thought in
>> the back of my head about ttls, requirements change)....
>
>This schema attribute is kept at the table level, not per row.  You'll
>have to change the table schema, which in 0.90.x hbase means offlining
>the table (in 0.92 hbase there is an online schema edit, but it needs to
>be enabled and can be problematic in the face of splitting.... more on
>this later).
>
>> That makes sense why a narrow long schema works well then, got it (I am
>> used to Cassandra and do lots of wide column range slices on those
>> columns; this is like inverting everything up on myself but the row
>> locks and checkAndPut (and co-processors) hit so many of my use cases
>> (as Cassandra still does also)
>>
>
>Be careful using hbase row locks.  They are (unofficially -- we need
>to make it official) deprecated.  You can lock yourself out of a
>regionserver if all incoming handlers end up waiting on a particular
>row lock to clear.  Check back in this mailing list for other rowlock
>downsides.
>
>You can do column range slices in hbase with filters (if you need to).
>
>checkAndPut shouldn't care if row is wide or not?
>
>
>> right now I am on 0.90.4 but I am going back and forth on changing our
>> hadoop cluster; HBase is the primary driver for that so I am currently
>> wrestling with the decision of upgrading the existing cluster from CDH2
>> to CDH3 or going with MapR ...
>
>Go to CDH3 if you are on CDH2.  Does CDH2 have a working sync?
>(CDH3u3 when it arrives has some nice perf improvements).
>
>> my preference is to run my own version of HBase (like I do with Kafka
>> and Cassandra) and I feel I can do this, though I am not comfortable
>> with running my own Hadoop build (already overloaded with things).
>> 0.92 is exciting for co-processors too and it is a cool system to hack
>> on; maybe I will pull from trunk, build, and test it out some too.
>>
>
>Don't do hbase trunk.  Do tip of 0.92 branch if you want to hack.
>HBase Trunk is different from 0.92 already and will get even more
>"differenter"; it'll be hard to help you if you are pulling from trunk.
>
>St.Ack
>
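The retention trick Joe and Stack settle on above -- writing each cell with the event's own timestamp so a table-level TTL expires it relative to when the event happened, plus the suggested guard against future dates -- can be sketched as follows. This is a minimal sketch of the arithmetic only; the six-month window and the clock-skew allowance are illustrative assumptions, not HBase settings.

```python
# Sketch of the "backdated timestamp + table-level TTL" retention scheme.
# The cell timestamp handed to the Put is the event time, so the TTL
# counts down from when the event happened, not from when it was ingested.

SIX_MONTHS_S = 6 * 30 * 24 * 3600   # illustrative ~6-month TTL, in seconds
CLOCK_SKEW_MS = 5 * 60 * 1000       # illustrative allowance for client clock skew

def cell_timestamp_ms(event_time_ms, now_ms):
    """Timestamp to write on the cell; rejects future-dated events."""
    if event_time_ms > now_ms + CLOCK_SKEW_MS:
        raise ValueError("event timestamp is in the future: %d" % event_time_ms)
    return event_time_ms

def remaining_lifetime_s(event_time_ms, now_ms, ttl_s=SIX_MONTHS_S):
    """Seconds a cell written with this timestamp survives from 'now'."""
    expires_at_ms = event_time_ms + ttl_s * 1000
    return max(0, (expires_at_ms - now_ms) // 1000)

# An event dated ~2 months ago, first seen today, lives ~4 more months:
now_ms = 1326000000000                          # fixed "now" for the example
event_ms = now_ms - 2 * 30 * 24 * 3600 * 1000
print(remaining_lifetime_s(event_ms, now_ms) // (24 * 3600))  # -> 120 (days)
```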

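Stack's point that TTL lives at the table (column-family) level, and that changing it on 0.90.x means taking the table offline, looks roughly like this in the HBase shell. A sketch under assumptions: the table name 'events' and family 'e' are hypothetical, and TTL is given in seconds.

```shell
# 0.90.x: column-family schema can only be edited while the table is disabled
disable 'events'
alter 'events', {NAME => 'e', TTL => '10368000'}   # ~4 months, in seconds
enable 'events'
```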

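checkAndPut, which the thread leans on, is an atomic compare-and-set against a single cell -- which is why, as Stack notes, row width shouldn't matter. A language-agnostic sketch of the semantics only (a plain dict standing in for a table; this is not the HBase client API, where the region server performs the check-then-put atomically):

```python
def check_and_put(table, row, check_col, expected, put_col, value):
    """Apply the put only if the checked cell currently holds `expected`
    (None means the cell must be absent).  Returns True on success.
    A real region server does this atomically under an internal row lock."""
    current = table.get(row, {}).get(check_col)
    if current != expected:
        return False
    table.setdefault(row, {})[put_col] = value
    return True

events = {}
# First writer wins; a second identical attempt fails the check:
print(check_and_put(events, "row1", "e:seen", None, "e:seen", "1"))  # -> True
print(check_and_put(events, "row1", "e:seen", None, "e:seen", "2"))  # -> False
```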

Re: rowlock advice for book

Posted by Stack <st...@duboce.net>.
On Thu, Jan 12, 2012 at 7:36 AM, Doug Meil
<do...@explorysmedical.com> wrote:
>
> Ok.  At the very least, I'll put in a big warning in the book.
>

https://issues.apache.org/jira/browse/HBASE-2332 is an old issue about
removing them.
St.Ack

Re: rowlock advice for book

Posted by Doug Meil <do...@explorysmedical.com>.
Ok.  At the very least, I'll put in a big warning in the book.
On 1/12/12 1:05 AM, "Stack" <st...@duboce.net> wrote:

>On Wed, Jan 11, 2012 at 3:39 PM, lars hofhansl <lh...@yahoo.com>
>wrote:
>> Shouldn't we deprecate them?
>>
>
>Yes.
>
>But what to replace them with?  I think they haven't been deprecated up
>to this point because we've not done the work to put an alternative in
>place.  We could just drop the functionality (after deprecating).  I'd
>be fine w/ that.
>
>St.Ack
>



Re: rowlock advice for book

Posted by Stack <st...@duboce.net>.
On Wed, Jan 11, 2012 at 3:39 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Shouldn't we deprecate them?
>

Yes.

But what to replace them with?  I think they haven't been deprecated up
to this point because we've not done the work to put an alternative in
place.  We could just drop the functionality (after deprecating).  I'd
be fine w/ that.

St.Ack

> They are only guaranteed to work for the duration of a region server operation.
> If there are splits/etc the lock would be silently gone.
>
> -- Lars

Re: rowlock advice for book

Posted by lars hofhansl <lh...@yahoo.com>.
Shouldn't we deprecate them? 

They are only guaranteed to work for the duration of a region server operation.
If there are splits/etc the lock would be silently gone.

-- Lars



________________________________
 From: Jean-Daniel Cryans <jd...@apache.org>
To: dev@hbase.apache.org 
Sent: Wednesday, January 11, 2012 2:36 PM
Subject: Re: rowlock advice for book
 
Row locks aren't deprecated officially but it's really that type of
feature that you should use at your own risk / you might not enjoy
life after that.

J-D


Re: rowlock advice for book

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Row locks aren't deprecated officially but it's really that type of
feature that you should use at your own risk / you might not enjoy
life after that.

J-D
