Posted to user@hbase.apache.org by Erik Holstad <er...@gmail.com> on 2009/06/02 19:34:40 UTC

Uses cases for checkAndSave?

Hi!
I'm working on putting checkAndSave back into 0.20 and just want to check
with the people that are using it how they are using it
so that I can make it as good as possible for these users.

Since the API has changed from earlier versions there are some things that
one needs to think about.
In the new API there are no Updates, just Put and Delete, so for
now I need to know whether users used to delete in the old batchUpdate
or just put?

The new return format Result might seem like a good way to send in the data
to be used as "actual", but there is no super easy way to build that
on the client side for now, so it would be good to know how you are doing this.
Do you do a get, save the result and then use it for the check, or do you
just create new structures on the client?

Regards Erik

Re: Uses cases for checkAndSave?

Posted by Erik Holstad <er...@gmail.com>.

@Ryan
I kind of agree with you in general about the interface, but I think 3 input
classes where 2 have to match is not going to be very easy to deal with.
Maybe we should have checkAndPut(List<KeyValue> kvs, Put put) where the
list of KeyValues is the expected keys+values.

@Guilherme
I pushed a version of checkAndPut() to HBASE-1304, so you are more than
welcome to have a look at it, test it and come back with feedback when you
have time.

Erik

Re: Uses cases for checkAndSave?

Posted by Erik Holstad <er...@gmail.com>.

On Tue, Jun 2, 2009 at 4:51 PM, Guilherme Germoglio <ge...@gmail.com> wrote:

> Hello!
>
> On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad <er...@gmail.com> wrote:
>
> > Hi!
> >
> > On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio <germoglio@gmail.com> wrote:
> >
> > > Hi Erik,
> > >
> > > For now, I'm using checkAndSave in order to make sure that a row is only
> > > created but not overwritten by multiple threads. So, checkAndSave is
> > > mostly invoked with a new structure created on the client. Actually, I'm
> > > checking if a specific "deleted" column is empty. If the "deleted" column
> > > is not empty, then the row creation cannot be performed. There are a few
> > > other tricky cases where I'm using it, but I'm sure that making that
> > > Result object more difficult to create than putting values on a map would
> > > be bad for me. :-)
> >
> > So you have a row with family and qualifier that you check to see if it is
> > empty, and if it is you insert a new row? So basically you use it as an
> > atomic rowExist checker? Are you usually batching these checks, or would it
> > be ok with something like:
> >
> > public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> > byte[] value, Put put){}
> > or
> > public boolean checkAndPut(KeyValue checkKv, Put put){}
> > for now?
> >
>
> Yes. It is ok for me to use the methods above for now.


Sweet, I will make a version today so you can test it out, and maybe after
that we can work on it together to make things work for you.

>
>
> Just in case you are curious about how I'll be using them, there are two
> cases where I'm using checkAndSave:
>
> The first is like the atomic rowExist checker and it represents 90% of the
> use of checkAndSave. Exactly as you said, I've got a column
> attributes:deleted for every row. When creating a new row, the creation
> only happens if this column is empty. When the row creation happens, this
> column is assigned a 'false' value. When this column receives a 'true'
> value, that is, the row is to be deleted, the 'hard' removal (an HTable
> Delete) of the row will be performed asynchronously. Until the 'hard'
> removal happens, a software layer that uses HTable will prevent the use of
> any 'soft' deleted row by checking the attributes:deleted column.
>
> The second case of using checkAndSave is to trigger some actions when a
> specific column is updated. So, I don't check for emptiness, but whether a
> previous value remains the same when I'm updating the row. For example,
> let's say I have a users table where I will serialize a User object and put
> it into a row. Among other things, the User object contains an e-mail
> attribute and its change must trigger verification actions, changes on
> other tables, whatever. I realized that performing a get for every User
> update just to check whether their e-mail changed or not might not be the
> best approach, since changing e-mail is not a very common operation. So, I
> thought it is better to checkAndSave a user expecting their current e-mail
> value to be the same as the one already in the table, since this will occur
> many, many times more often than the opposite. However, if it is the case
> that the current e-mail value is different from the one in the table,
> triggers are fired and then a new update is performed.
>
>
>
> >
> > >
> > > However, here's an idea. What if Put and Delete objects have a field
> > > "condition" (maybe, "onlyIf" would be a better name) which is exactly
> > > the map with columns and expected values. So, a given Put or Delete of
> > > an updates list will only happen if those expected values match.
> > >
> >
> > Puts and deletes are pretty much just List<KeyValue> which is basically a
> > List<byte[]>.
> > I don't think that we want to add complexity for puts and deletes now
> > that we have worked so hard to make it faster and more bare-bones.
> >
>
> no problem. (sorry!)
>
You don't have to be sorry, just happy that we are going to have a faster
HBase soon :)



Re: Uses cases for checkAndSave?

Posted by Ryan Rawson <ry...@gmail.com>.

The way I think about checkAndSave might work like this:

Takes a Get() object to specify which row and column to affect
Takes a Result object to verify said data.  This should match the Get()
Takes a Put or maybe Delete to apply if the previous two worked.
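
To make the three-input shape above concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the class GetResultPutSketch and its maps are hypothetical in-memory stand-ins, not the HBase Get, Result or Put classes, and "should match the Get()" is assumed to mean that the expected values name exactly the columns being read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// In-memory stand-in for one row; none of these types are the HBase classes.
final class GetResultPutSketch {
    // Column name -> current value; an absent column counts as empty (null).
    private final Map<String, String> row = new HashMap<>();

    /**
     * columns plays the role of the Get, expected the role of the Result,
     * and update the role of the Put. Check and write run under one lock.
     */
    synchronized boolean checkAndSave(Set<String> columns,
                                      Map<String, String> expected,
                                      Map<String, String> update) {
        // Assumption: "should match the Get()" means both name the same columns.
        if (!columns.equals(expected.keySet())) {
            throw new IllegalArgumentException(
                "expected values must cover the columns being read");
        }
        for (String column : columns) {
            if (!Objects.equals(row.get(column), expected.get(column))) {
                return false; // stored data differs; the update is not applied
            }
        }
        row.putAll(update);
        return true;
    }

    public static void main(String[] args) {
        GetResultPutSketch table = new GetResultPutSketch();
        Map<String, String> expected = new HashMap<>();
        expected.put("info:email", null); // expect the column to be empty
        Map<String, String> update = new HashMap<>();
        update.put("info:email", "a@example.com");
        Set<String> columns = expected.keySet();
        // First call: the column is empty as expected, so the update is applied.
        System.out.println(table.checkAndSave(columns, expected, update));
        // Second call: the column now holds a value, so the same check fails.
        System.out.println(table.checkAndSave(columns, expected, update));
    }
}
```

The extra validation step at the top of checkAndSave shows the cost of this shape: the caller has to keep two of the three inputs consistent with each other.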

-ryan


Re: Uses cases for checkAndSave?

Posted by Guilherme Germoglio <ge...@gmail.com>.

Hello!

On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad <er...@gmail.com> wrote:

> Hi!
>
> On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio <germoglio@gmail.com> wrote:
>
> > Hi Erik,
> >
> > For now, I'm using checkAndSave in order to make sure that a row is only
> > created but not overwritten by multiple threads. So, checkAndSave is
> > mostly invoked with a new structure created on the client. Actually, I'm
> > checking if a specific "deleted" column is empty. If the "deleted" column
> > is not empty, then the row creation cannot be performed. There are a few
> > other tricky cases where I'm using it, but I'm sure that making that
> > Result object more difficult to create than putting values on a map would
> > be bad for me. :-)
>
> So you have a row with family and qualifier that you check to see if it is
> empty, and if it is you insert a new row? So basically you use it as an
> atomic rowExist checker? Are you usually batching these checks, or would it
> be ok with something like:
>
> public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> byte[] value, Put put){}
> or
> public boolean checkAndPut(KeyValue checkKv, Put put){}
> for now?
>

Yes. It is ok for me to use the methods above for now.

Just in case you are curious about how I'll be using them, there are two cases
where I'm using checkAndSave:

The first is like the atomic rowExist checker and it represents 90% of the
use of checkAndSave. Exactly as you said, I've got a column
attributes:deleted for every row. When creating a new row, the creation only
happens if this column is empty. When the row creation happens, this column
is assigned a 'false' value. When this column receives a 'true'
value, that is, the row is to be deleted, the 'hard' removal (an HTable
Delete) of the row will be performed asynchronously. Until the 'hard'
removal happens, a software layer that uses HTable will prevent the use of
any 'soft' deleted row by checking the attributes:deleted column.
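
The lifecycle of the attributes:deleted column described above can be sketched roughly as follows. This is a hypothetical in-memory model built on java.util.concurrent.ConcurrentHashMap, not HBase code; the class SoftDeleteSketch and its method names are made up for illustration, and the map's atomic operations stand in for checkAndSave.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the attributes:deleted flag; not HBase code.
final class SoftDeleteSketch {
    // Row key -> value of the attributes:deleted column ("false" or "true").
    private final Map<String, String> deletedFlag = new ConcurrentHashMap<>();

    // Creation succeeds only if the flag column is still empty for this row.
    boolean create(String row) {
        return deletedFlag.putIfAbsent(row, "false") == null;
    }

    // Soft delete: flip the flag from "false" to "true" if the row is live.
    boolean softDelete(String row) {
        return deletedFlag.replace(row, "false", "true");
    }

    // The layer above the table hides rows that are missing or soft-deleted.
    boolean isUsable(String row) {
        return "false".equals(deletedFlag.get(row));
    }

    // Hard removal, run later in the background: only flagged rows are removed.
    boolean hardDelete(String row) {
        return deletedFlag.remove(row, "true");
    }

    public static void main(String[] args) {
        SoftDeleteSketch table = new SoftDeleteSketch();
        System.out.println("create:        " + table.create("row1"));
        System.out.println("create again:  " + table.create("row1"));
        System.out.println("soft delete:   " + table.softDelete("row1"));
        System.out.println("usable:        " + table.isUsable("row1"));
        System.out.println("hard delete:   " + table.hardDelete("row1"));
        System.out.println("create anew:   " + table.create("row1"));
    }
}
```

Note that in this model a soft-deleted row still blocks creation, because its flag column is not empty until the hard removal has run.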

The second case of using checkAndSave is to trigger some actions when a
specific column is updated. So, I don't check for emptiness, but whether a
previous value remains the same when I'm updating the row. For example,
let's say I have a users table where I will serialize a User object and put
it into a row. Among other things, the User object contains an e-mail
attribute and its change must trigger verification actions, changes on other
tables, whatever. I realized that performing a get for every User update
just to check whether their e-mail changed or not might not be the best
approach, since changing e-mail is not a very common operation. So, I
thought it is better to checkAndSave a user expecting their current e-mail
value to be the same as the one already in the table, since this will occur
many, many times more often than the opposite. However, if it is the case that
the current e-mail value is different from the one in the table, triggers are
fired and then a new update is performed.
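
A rough sketch of this optimistic pattern, again with a hypothetical in-memory map standing in for the table (EmailTriggerSketch, saveUser and the trigger strings are illustrative names, not part of any real API): the common path is a single conditional write, and the triggers only run when the check fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: user id -> stored e-mail; not HBase code.
final class EmailTriggerSketch {
    private final Map<String, String> storedEmail = new ConcurrentHashMap<>();
    // Records which triggers fired; a plain list is enough for this demo.
    final List<String> firedTriggers = new ArrayList<>();

    void seed(String userId, String email) {
        storedEmail.put(userId, email);
    }

    String emailOf(String userId) {
        return storedEmail.get(userId);
    }

    // Stand-in for checkAndSave: writes only if the stored e-mail equals expected.
    private boolean checkAndSave(String userId, String expected, String newEmail) {
        return storedEmail.replace(userId, expected, newEmail);
    }

    // Returns true if the e-mail had changed and the triggers were fired.
    boolean saveUser(String userId, String currentEmail) {
        // Common case: the e-mail is unchanged, so one conditional write suffices.
        if (checkAndSave(userId, currentEmail, currentEmail)) {
            return false;
        }
        // Rare case: the stored e-mail differs; fire triggers, then update again.
        firedTriggers.add("verify:" + currentEmail);
        storedEmail.put(userId, currentEmail);
        return true;
    }

    public static void main(String[] args) {
        EmailTriggerSketch users = new EmailTriggerSketch();
        users.seed("u1", "a@example.com");
        System.out.println(users.saveUser("u1", "a@example.com")); // unchanged e-mail
        System.out.println(users.saveUser("u1", "b@example.com")); // changed e-mail
        System.out.println(users.firedTriggers);
    }
}
```

In a real table the conditional write would carry the rest of the serialized User as well; the sketch keeps only the e-mail to show where the check sits.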

>
> >
> > However, here's an idea. What if Put and Delete objects have a field
> > "condition" (maybe, "onlyIf" would be a better name) which is exactly the
> > map with columns and expected values. So, a given Put or Delete of an
> > updates list will only happen if those expected values match.
> >
>
> Puts and deletes are pretty much just List<KeyValue> which is basically a
> List<byte[]>.
> I don't think that we want to add complexity for puts and deletes now that
> we have worked so hard to make it faster and more bare-bones.
>

no problem. (sorry!)


-- 
Guilherme

msn: guigermoglio@hotmail.com
homepage: http://germoglio.googlepages.com

Re: Uses cases for checkAndSave?

Posted by Erik Holstad <er...@gmail.com>.

Hi!

On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio <ge...@gmail.com> wrote:

> Hi Erik,
>
> For now, I'm using checkAndSave in order to make sure that a row is only
> created but not overwritten by multiple threads. So, checkAndSave is mostly
> invoked with a new structure created on the client. Actually, I'm checking
> if a specific "deleted" column is empty. If the "deleted" column is not
> empty, then the row creation cannot be performed. There are a few other
> tricky cases where I'm using it, but I'm sure that making that Result object
> more difficult to create than putting values on a map would be bad for me. :-)

So you have a row with family and qualifier that you check to see if it is
empty, and if it is you insert a new row? So basically you use it as an atomic
rowExist checker? Are you usually batching these checks, or would it be ok
with something like:

public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
byte[] value, Put put){}
or
public boolean checkAndPut(KeyValue checkKv, Put put){}
for now?
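
For illustration only, the intended semantics of the first signature above could look roughly like this against an in-memory map. Everything here is an assumption made for the sketch: CheckAndPutSketch is a hypothetical class, strings replace the byte[] row, family and qualifier for readability, a plain map of "family:qualifier" to value replaces the Put, and a null expected value is taken to mean "the cell must be empty".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table; NOT the HBase client API.
final class CheckAndPutSketch {
    // Cells keyed by "row/family:qualifier"; a missing key means the cell is empty.
    private final Map<String, byte[]> cells = new HashMap<>();

    /**
     * Writes newCells to the row only if the checked cell currently holds
     * expected (null means the cell must be empty). The method is synchronized
     * so that the check and the write happen as one step.
     */
    synchronized boolean checkAndPut(String row, String family, String qualifier,
                                     byte[] expected, Map<String, byte[]> newCells) {
        byte[] actual = cells.get(row + "/" + family + ":" + qualifier);
        if (!Arrays.equals(actual, expected)) {
            return false; // the cell differs from what the caller expected
        }
        for (Map.Entry<String, byte[]> e : newCells.entrySet()) {
            cells.put(row + "/" + e.getKey(), e.getValue());
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        Map<String, byte[]> put = new HashMap<>();
        put.put("attributes:deleted", bytes("false"));
        // First call: the checked cell is empty, so the put is applied.
        System.out.println(table.checkAndPut("row1", "attributes", "deleted", null, put));
        // Second call: the cell now holds "false", so the same check fails.
        System.out.println(table.checkAndPut("row1", "attributes", "deleted", null, put));
    }
}
```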


>
> However, here's an idea. What if Put and Delete objects have a field
> "condition" (maybe, "onlyIf" would be a better name) which is exactly the
> map with columns and expected values. So, a given Put or Delete of an
> updates list will only happen if those expected values match.
>

Puts and deletes are pretty much just List<KeyValue> which is basically a
List<byte[]>.
I don't think that we want to add complexity for puts and deletes now that
we have worked so hard to make it faster and more bare-bones.


> Also, maybe it should be possible to indicate common expected values for
> all updates of a list too, so a client won't have to put in all updates the
> same values if needed. But we must remember to solve the conflicts of
> expected values.
>
Not really sure if you mean that we would check the value of a key before
inserting the new value? That would mean that you would have to do a get for
every put/delete, which is not something we want in the general case.



Regards Erik

Re: Uses cases for checkAndSave?

Posted by Guilherme Germoglio <ge...@gmail.com>.

Hi Erik,

For now, I'm using checkAndSave in order to make sure that a row is only
created but not overwritten by multiple threads. So, checkAndSave is mostly
invoked with a new structure created on the client. Actually, I'm checking
if a specific "deleted" column is empty. If the "deleted" column is not
empty, then the row creation cannot be performed. There are a few other
tricky cases where I'm using it, but I'm sure that making that Result object
more difficult to create than putting values on a map would be bad for me. :-)

However, here's an idea. What if Put and Delete objects have a field
"condition" (maybe, "onlyIf" would be a better name) which is exactly the
map with columns and expected values. So, a given Put or Delete of an
updates list will only happen if those expected values match.
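
To show what such an "onlyIf" field might look like, here is a minimal sketch. It is purely hypothetical: ConditionalPut and OnlyIfSketch are invented for illustration and say nothing about how the real Put and Delete classes are built. A null expected value is assumed to mean "the column must be empty".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of puts that carry their own conditions.
final class OnlyIfSketch {
    // A made-up put: new column values plus an "onlyIf" map of expected values.
    static final class ConditionalPut {
        final String row;
        final Map<String, String> values = new HashMap<>();
        final Map<String, String> onlyIf = new HashMap<>(); // column -> expected
        ConditionalPut(String row) {
            this.row = row;
        }
    }

    // Row key -> (column -> value).
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Applies each put whose conditions all hold; returns one outcome per put.
    synchronized List<Boolean> apply(List<ConditionalPut> puts) {
        List<Boolean> applied = new ArrayList<>();
        for (ConditionalPut p : puts) {
            // Every conditional put needs a read of the current row first.
            Map<String, String> current =
                rows.getOrDefault(p.row, Collections.<String, String>emptyMap());
            boolean ok = true;
            for (Map.Entry<String, String> c : p.onlyIf.entrySet()) {
                if (!Objects.equals(current.get(c.getKey()), c.getValue())) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                rows.computeIfAbsent(p.row, r -> new HashMap<>()).putAll(p.values);
            }
            applied.add(ok);
        }
        return applied;
    }

    public static void main(String[] args) {
        OnlyIfSketch table = new OnlyIfSketch();
        List<ConditionalPut> updates = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            ConditionalPut p = new ConditionalPut("row1");
            p.values.put("attributes:deleted", "false");
            p.onlyIf.put("attributes:deleted", null); // only if the column is empty
            updates.add(p);
        }
        // The first put creates the row; the second finds the column already set.
        System.out.println(table.apply(updates));
    }
}
```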

Also, maybe it should be possible to indicate common expected values for all
updates of a list too, so a client won't have to put in all updates the same
values if needed. But we must remember to solve the conflicts of expected
values.

(By the way, I haven't seen the guts of the new Puts and Deletes, so I don't
know how difficult it would be to implement it -- but I can help, if
necessary)

Thanks,


-- 
Guilherme

msn: guigermoglio@hotmail.com
homepage: http://germoglio.googlepages.com