Posted to user@hbase.apache.org by Slava Gorelik <sl...@gmail.com> on 2008/09/17 20:49:53 UTC

BatchUpdate

Hi. A few small questions:

1) BatchUpdate.getTimestamp()
   <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/io/BatchUpdate.html#getTimestamp()>
   If I understand correctly, this method should return the timestamp that
   the row will be committed with. But how can the BatchUpdate know the
   timestamp? Shouldn't it be known only after the row is written?
   In any case, the value returned is always the same and is not correct.

2) Delete cell: I saw in the FAQ that I need to add a delete record and
   commit it with exactly the same timestamp as the original row, but I
   didn't find any commit method that takes a timestamp.

3) For my update operation I need to check whether the row my application
   holds still contains the most recent data, and only in that case update
   some cells. To do this I need to lock the row -> check the timestamp of
   the particular cell -> update it if the timestamp is the same as the one
   the application holds. If these operations are performed on HTable, they
   will take a number of RPCs. I think that doing them directly on the
   HRegionServer would let me get rid of the extra RPCs. Is there some way
   to work with the specific HRegionServer that the row belongs to? If yes,
   how can I get the HRegionServer for this specific row?


Thank You and Best Regards.
Slava.

Re: BatchUpdate

Posted by stack <st...@duboce.net>.
Slava Gorelik wrote:
> Yes, that is exactly what I'm trying to implement myself, but at the link I
> didn't find any indication of which version this functionality will be
> implemented in.

Slava: The issue has neither a version nor a person assigned, so it won't be
done until someone takes up the cause.

> P.S. What I'm trying to implement is the same, but if I work with HTable I
> will consume many more RPCs than if I do it directly in the HRegionServer.

Agreed. Is there anything in the related issue, HBASE-803, that you could work
with to get a patch together, either for yourself or to apply to HBase?

Thanks,
St.Ack



Re: BatchUpdate

Posted by Slava Gorelik <sl...@gmail.com>.
Yes, that is exactly what I'm trying to implement myself, but at the link I
didn't find any indication of which version this functionality will be
implemented in.
P.S. What I'm trying to implement is the same, but if I work with HTable I
will consume many more RPCs than if I do it directly in the HRegionServer.

On Thu, Sep 18, 2008 at 9:38 PM, Billy Pearson
<sa...@pearsonwholesale.com> wrote:

> I think what you are looking for is here:
> HBASE-493
> https://issues.apache.org/jira/browse/HBASE-493
>
> Billy Pearson

Re: BatchUpdate

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I think what you are looking for is here:
HBASE-493
https://issues.apache.org/jira/browse/HBASE-493

Billy Pearson



Re: BatchUpdate

Posted by Slava Gorelik <sl...@gmail.com>.
Hi. Thank you for the quick response.
About question 3, I want to clarify what I mean.
For example, I have a row that I need to update (the latest one). I read the
row, perform some operations on some cells, and now I want to write the
update. Before I update, I want to check whether another user (application
instance) has already changed this specific row, because my update would
overwrite his changes and his data would be lost. To avoid this, I want to
check that the row (the specific cells) I'm going to update still has the
same timestamp that I hold, i.e. that nobody has changed them.
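
The read, compare, write sequence described above can be modelled in memory
roughly as follows. This is only a sketch under stated assumptions: it does
not use the HBase API, the class and method names (TimestampGuardedStore,
putIfUnchanged) are hypothetical, a synchronized method stands in for the row
lock, and a simple counter stands in for real timestamps.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model only; none of these names exist in HBase.
final class TimestampGuardedStore {
    // One stored value together with the timestamp it was written at.
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Cell> rows = new HashMap<>();
    private long clock = 0L; // logical clock, assumed for determinism

    synchronized Cell get(String row) {
        return rows.get(row);
    }

    // Unconditional write; returns the timestamp assigned to the new value.
    synchronized long put(String row, String value) {
        long ts = ++clock;
        rows.put(row, new Cell(ts, value));
        return ts;
    }

    // Writes only if the stored timestamp still equals the one the caller read.
    // The synchronized keyword plays the role of "lock the row".
    synchronized boolean putIfUnchanged(String row, long expectedTimestamp, String value) {
        Cell current = rows.get(row);
        if (current == null || current.timestamp != expectedTimestamp) {
            return false; // someone else wrote in between; caller must re-read
        }
        rows.put(row, new Cell(++clock, value));
        return true;
    }

    public static void main(String[] args) {
        TimestampGuardedStore store = new TimestampGuardedStore();
        long seen = store.put("row1", "original");
        store.put("row1", "changed by another client");
        System.out.println(store.putIfUnchanged("row1", seen, "my update"));
    }
}
```

In this model the check and the write happen under one lock in one call,
which is the property I am after; done through HTable, each step would be a
separate RPC.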

Best Regards.


On Thu, Sep 18, 2008 at 7:50 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Slava,
>
> Answers in-line.
>
> J-D
>

Re: BatchUpdate

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Slava,

Answers in-line.

J-D

On Wed, Sep 17, 2008 at 2:49 PM, Slava Gorelik <sl...@gmail.com> wrote:

> Hi. A few small questions:
> 1) BatchUpdate.getTimestamp()
> <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/io/BatchUpdate.html#getTimestamp()>
> If I understand correctly, this method should return the timestamp that
> the row will be committed with.
> But how can the BatchUpdate know the timestamp? Shouldn't this timestamp
> be known only after the row is written?
> In any case, the value returned is always the same and is not correct.


If you do not specify a timestamp, the value returned will be
HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE. HBase interprets this
as "if BU.timestamp = LATEST_TIMESTAMP, replace it with current timestamp".
The timestamp returned will be different if you created the BatchUpdate with
a specified timestamp; see my answer to your second question.
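
The substitution rule described above can be modelled in a few lines of plain
Java. This is only a sketch of the rule as stated here: it does not call
HBase, resolveCommitTimestamp is a hypothetical helper name, and the constant
is redefined locally with the value given above (Long.MAX_VALUE) rather than
imported from HConstants.

```java
// Self-contained model of the rule; it is not HBase code.
final class TimestampSketch {
    // Same value as stated above for HConstants.LATEST_TIMESTAMP.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Hypothetical helper: a placeholder timestamp is replaced by the current
    // time at commit; an explicitly supplied timestamp is kept as given.
    static long resolveCommitTimestamp(long requested, long nowMillis) {
        return requested == LATEST_TIMESTAMP ? nowMillis : requested;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // No timestamp specified: the placeholder is swapped for "now".
        System.out.println(resolveCommitTimestamp(LATEST_TIMESTAMP, now) == now);
        // Explicit timestamp: passed through unchanged.
        System.out.println(resolveCommitTimestamp(1221677393000L, now));
    }
}
```

This is why getTimestamp() on a BatchUpdate created without a timestamp
always shows the same large number: it is the placeholder, not the value the
row ends up with.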


>
>
> 2) Delete cell: I saw in the FAQ that I need to add a delete record and
> commit it with exactly the same timestamp as the original row, but I
> didn't find any commit method that takes a timestamp.


See the BatchUpdate constructor that takes a timestamp:
<http://hadoop.apache.org/hbase/docs/r0.2.1/api/org/apache/hadoop/hbase/io/BatchUpdate.html#BatchUpdate%28java.lang.String,%20long%29>


>
>
> 3) For my update operation I need to check whether the row my application
> holds still contains the most recent data, and only in that case update
> some cells. To do this I need to lock the row -> check the timestamp of
> the particular cell -> update it if the timestamp is the same as the one
> the application holds. If these operations are performed on HTable, they
> will take a number of RPCs. I think that doing them directly on the
> HRegionServer would let me get rid of the extra RPCs. Is there some way
> to work with the specific HRegionServer that the row belongs to? If yes,
> how can I get the HRegionServer for this specific row?


It is best to leave the details of how HBase works to the client library, or
this could get messy. For example, you would have to reimplement the lookup
of the region server for a region, with retries. Instead of updating by
deleting/inserting, you should just do a put, so it will be inserted with the
current timestamp and, by default, HBase retrieves the cell with the latest
timestamp for a get or a scan. How HBase works is very different from your
typical RDBMS ;)
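
To illustrate the "just do a put" point, here is a small in-memory model of a
cell that keeps several timestamped versions. It is a sketch only, not the
HBase client API; VersionedCellSketch and its methods are hypothetical names,
and a TreeMap is assumed as the version store.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of one cell with several timestamped versions.
// Hypothetical names throughout; this is not HBase code.
final class VersionedCellSketch {
    // Versions ordered by timestamp, oldest first.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // An update is just another put; older versions are left in place.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A default read returns the value with the highest timestamp,
    // or null if nothing has been written yet.
    String getLatest() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(1000L, "first");
        cell.put(2000L, "second");
        System.out.println(cell.getLatest());
    }
}
```

No delete is needed before the second put: the newer version simply shadows
the older one on a default read.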


>
>
>
> Thank You and Best Regards.
> Slava.
>