Posted to user@hbase.apache.org by Zhou <ne...@gmail.com> on 2008/04/15 09:56:42 UTC

Does the latest version of HBase support multiple updates on the same row at the same time?

Hi,

Currently, I'm using the HBase version bundled inside the Hadoop 0.16.0 package.
I access HBase from a multi-threaded application.
It appears that only one update of a row can be in progress at a time; otherwise
an exception is thrown.
The documentation says that this will be fixed in version 0.2.0.
Is there already a version, either released or still in development, that fixes this?

Thanks,
Zhou


RE: Does the latest version of HBase support multiple updates on the same row at the same time?

Posted by Jim Kellerman <ji...@powerset.com>.
> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Zhou
> Sent: Thursday, April 17, 2008 7:39 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Does the latest version of HBase support multiple
> updates on the same row at the same time?
>
> Jim Kellerman <ji...@...> writes:
>
> >
> > I'm not sure what you mean by server, but any particular row is only
> > served by one HBase server. Multiple clients can submit batch updates
> > for the same row and they will all be handled by a single HBase server.
> >
>
> When I say server, I actually mean machine.
> There could be multiple clients running on different machines.
> There could be cases where two clients submit batch updates for the
> same row to the same HBase server at the same time.
> Then, at the HBase server,
> the batch updates from one client would execute first,
> and the other client's updates would wait for them to finish,
> rather than getting an exception.
> Is that right?

Correct.

> > > Each of them has a BatchUpdate object of its own. I suspect it
> > > would still cause the "update in progress" exception.
> >
> > In 0.16 (and also in the hbase-0.1.x releases) the client API supports
> > only one batch update operation at a time. So if a single thread does
> > two startUpdate calls, or if multiple threads each do a single
> > startUpdate call, you will get the "update in progress" exception.
> >
> > This has changed in HBase trunk. A single thread or multiple threads
> > can create a separate BatchUpdate object for each row they want to
> > update. When all the changes have been added to the BatchUpdate, it is
> > sent to the server by calling HTable.commit(BatchUpdate).
> >
>
> I misunderstood the reason for the "update in progress"
> exception before.
> I thought it did not allow two startUpdate calls on the same
> row simultaneously.
> In fact, as you have explained,
> it does not allow two startUpdate calls on any rows simultaneously.

Yes. This is a client-side problem with 0.16 and 0.1.x.

> >
> > Not sure I understand the problem. The updates collected in a
> > BatchUpdate are sent via a single RPC call. The row gets locked on the
> > server and each update is written to the redo log before it is cached.
> > When the cache fills it is flushed to disk. If the server crashes
> > before the cache is flushed, the data can be recovered from the redo
> > log.
> >
>
> So on the client side, the commit operation returns after the RPC
> call to the server has returned.
> By the time commit returns, the redo log has already been
> written to disk.
> Am I right?

Correct.

> If that is true, there is no durability problem any more.
>
> > > BatchUpdate would not work, at least for massive data sizes or high
> > > load.
> >
> > Actually it works pretty well. We have several applications that have
> > tens of millions of rows on 10 to 20 servers that are storing tens of
> > gigabytes of data currently.
> >
> > One user loaded 1.3 billion rows into HBase as a test.
> >
>
> My misunderstanding of how the BatchUpdate class works led me to
> that argument. Glad that I'm wrong.
>
> > > I hope HBase could fix the problem in the near future.
> >
> > It is fixed in hbase trunk which has not yet been released.
> >
> > > Is there any version of HBase that allows concurrent updates, where
> > > all we need to do is call table.commit(id)?
> >
> > There is no released version that supports this. It is only in hbase
> > trunk which will be released as hbase-0.2.0 in a few weeks.
> >
> > By the way, you know that HBase is now a subproject of Hadoop and now
> > has a separate svn repository? All development of hbase-0.1.x and
> > hbase-trunk happens there and not in the hadoop svn. You can find the
> > hbase source at:
> >
> > http://svn.apache.org/repos/asf/hadoop/hbase
> >
>
> I am currently doing research on hosting web application
> data in non-relational DBMSs.
> For web applications, concurrent access to data happens a lot!
> I really need the concurrent update feature to host a
> scalable web application on HBase.
> I am looking forward to the release of hbase-0.2.0. For now,
> I will try the current trunk version first.
>
> Thanks for the explanation.
> It helps me a lot!



Re: Does the latest version of HBase support multiple updates on the same row at the same time?

Posted by Zhou <ne...@gmail.com>.
Jim Kellerman <ji...@...> writes:

> 
> I'm not sure what you mean by server, 
> but any particular row is only served
> by one HBase server. Multiple clients can 
> submit batch updates for the
> same row and they will all be handled 
> by a single HBase server.
> 

When I say server, I actually mean machine.
There could be multiple clients running on different machines.
There could be cases where two clients submit batch updates for the
same row to the same HBase server at the same time.
Then, at the HBase server,
the batch updates from one client would execute first,
and the other client's updates would wait for them to finish,
rather than getting an exception.
Is that right?

> > Each of them has a BatchUpdate object of its own. I suspect
> > it would still cause the "update in progress" exception.
> 
> In 0.16 (and also in the hbase-0.1.x releases) the client API
> supports only one batch update operation at a time. So if a single
> thread does two startUpdate calls, or if multiple threads each do a
> single startUpdate call, you will get the "update in progress"
> exception.
> 
> This has changed in HBase trunk. A single thread or multiple
> threads can create a separate BatchUpdate object for each row
> they want to update. When all the changes have been added to
> the BatchUpdate, it is sent to the server by calling
> HTable.commit(BatchUpdate)
> 

I misunderstood the reason for the "update in progress"
exception before.
I thought it did not allow two startUpdate calls
on the same row simultaneously.
In fact, as you have explained,
it does not allow two startUpdate calls
on any rows simultaneously.

> 
> Not sure I understand the problem. The updates collected in
> a BatchUpdate are sent via a single RPC call. The row gets
> locked on the server and each update is written to the redo
> log before it is cached. When the cache fills it is flushed
> to disk. If the server crashes before the cache is flushed,
> the data can be recovered from the redo log.
> 

So on the client side, the commit operation returns after the RPC call to
the server has returned.
By the time commit returns, the redo log has already been
written to disk.
Am I right?
If that is true, there is no durability problem any more.

> > BatchUpdate would not work, at least for massive data sizes
> > or high load.
> 
> Actually it works pretty well. We have several applications that
> have tens of millions of rows on 10 to 20 servers that are storing
> tens of gigabytes of data currently.
> 
> One user loaded 1.3 billion rows into HBase as a test.
> 

My misunderstanding of how the BatchUpdate class works led me
to that argument. Glad that I'm wrong.

> > I hope HBase could fix the problem in the near future.
> 
> It is fixed in hbase trunk which has not yet been released.
> 
> > Is there any version of HBase that allows concurrent updates, where
> > all we need to do is call table.commit(id)?
> 
> There is no released version that supports this. It is only
> in hbase trunk which will be released as hbase-0.2.0 in a
> few weeks.
> 
> By the way, you know that HBase is now a subproject of
> Hadoop and now has a separate svn repository? All development
> of hbase-0.1.x and hbase-trunk happens there and not in
> the hadoop svn. You can find the hbase source at:
> 
> http://svn.apache.org/repos/asf/hadoop/hbase
> 

I am currently doing research on hosting web application data
in non-relational DBMSs.
For web applications, concurrent access to data happens a lot!
I really need the concurrent update feature to host
a scalable web application on HBase.
I am looking forward to the release of hbase-0.2.0.
For now, I will try the current trunk version first.

Thanks for the explanation.
It helps me a lot!



RE: Does the latest version of HBase support multiple updates on the same row at the same time?

Posted by Jim Kellerman <ji...@powerset.com>.
> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Zhou
> Sent: Thursday, April 17, 2008 7:44 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Does the latest version of HBase support multiple
> updates on the same row at the same time?

<snip>

> I've looked at the source code of the BatchUpdate class.
> I believe it collects update operations for one specified row
> and submits them to the HRegionServer that hosts this row via
> one RPC call.
> Am I right?

Correct.

> So on one server, I can actually cache all updates for a
> specified row in one BatchUpdate object, and it might work
> for one process on one server.
> However, what about multiple processes running concurrently on
> different servers?

I'm not sure what you mean by server, but any particular row is only served
by one HBase server. Multiple clients can submit batch updates for the
same row and they will all be handled by a single HBase server.

> Each of them has a BatchUpdate object of its own. I suspect
> it would still cause the "update in progress" exception.

In 0.16 (and also in the hbase-0.1.x releases) the client API
supports only one batch update operation at a time. So if a single
thread does two startUpdate calls, or if multiple threads each do a
single startUpdate call, you will get the "update in progress"
exception.
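
To make that concrete, the old pattern looks roughly like the sketch below.
This is only a sketch from memory: the table name, row key, and column are
made up, and the exact package names and method signatures may differ
slightly between the version bundled with Hadoop 0.16 and the hbase-0.1.x
releases.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    // Old-style client API (HBase bundled with Hadoop 0.16 / hbase-0.1.x).
    // Only one update may be in progress per HTable at a time; a second
    // startUpdate before commit/abort triggers "update in progress".
    public class OldApiSketch {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), new Text("myTable"));
        long lockid = table.startUpdate(new Text("myRow"));                // begin the one in-progress update
        table.put(lockid, new Text("info:name"), "some value".getBytes()); // queue a put under that lock id
        table.commit(lockid);                                              // send it; frees the client for the next update
      }
    }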

This has changed in HBase trunk. A single thread or multiple
threads can create a separate BatchUpdate object for each row
they want to update. When all the changes have been added to
the BatchUpdate, it is sent to the server by calling
HTable.commit(BatchUpdate)
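
In code, that usage would look something like the sketch below. Again only a
sketch: the table, row, and column names are invented, and the package
locations and constructor signatures shown here may differ somewhat in the
current trunk.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    // Trunk-style client API (to become hbase-0.2.0): one BatchUpdate per row,
    // buildable from any thread, committed to the server in a single RPC.
    public class TrunkApiSketch {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "myTable");
        BatchUpdate update = new BatchUpdate("myRow");    // all edits below target this one row
        update.put("info:name", "some value".getBytes()); // queue a put for column "info:name"
        update.delete("info:old");                        // a delete can ride in the same batch
        table.commit(update);                             // one RPC; the row is locked server-side while it applies
      }
    }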

> Even if I assume it works, since each row requires its own BatchUpdate
> object, if I have millions of rows, I would have to create
> millions of objects.
> I don't think that is workable.

BatchUpdate objects are very inexpensive. The largest part of
any batch update is the column values for put operations.

> And how many batch operations should I cache in the
> BatchUpdate object before committing?

As many as you want to, provided they are for the same row.

> What if the updates require immediate durability
> (the D in ACID)?

Not sure I understand the problem. The updates collected in
a BatchUpdate are sent via a single RPC call. The row gets
locked on the server and each update is written to the redo
log before it is cached. When the cache fills it is flushed
to disk. If the server crashes before the cache is flushed,
the data can be recovered from the redo log.
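
Conceptually, that write path can be pictured something like the sketch
below. To be clear, this is not the actual HRegionServer code; every class
and method name here is invented purely to illustrate the ordering (lock the
row, log each edit, cache it, flush later).

    import java.io.IOException;
    import java.util.List;
    import java.util.concurrent.locks.ReentrantLock;

    // Conceptual sketch of the write path described above -- NOT the actual
    // HRegionServer code. All class and method names here are invented.
    class WritePathSketch {
      interface RedoLog { void append(byte[] edit) throws IOException; }
      interface MemCache { void add(byte[] edit); long size(); void flushToDisk() throws IOException; }

      private final ReentrantLock rowLock = new ReentrantLock(); // stands in for a per-row lock
      private final RedoLog redoLog;
      private final MemCache memCache;
      private final long flushThreshold;

      WritePathSketch(RedoLog redoLog, MemCache memCache, long flushThreshold) {
        this.redoLog = redoLog;
        this.memCache = memCache;
        this.flushThreshold = flushThreshold;
      }

      // A committed batch arrives in a single RPC: lock the row, write each
      // edit to the redo log before caching it, then flush the cache if full.
      void applyBatch(List<byte[]> edits) throws IOException {
        rowLock.lock();
        try {
          for (byte[] edit : edits) {
            redoLog.append(edit); // durable redo record first...
            memCache.add(edit);   // ...then the in-memory copy
          }
        } finally {
          rowLock.unlock();
        }
        if (memCache.size() > flushThreshold) {
          memCache.flushToDisk(); // a crash before this point is recoverable from the redo log
        }
      }
    }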

> I believe it is better to solve the concurrent update problem
> on the server side.

And that is exactly what happens in HBase trunk. HBase 0.16 and
hbase-0.1.x do not do that as you have discovered.

> BatchUpdate would not work, at least for massive data sizes
> or high load.

Actually it works pretty well. We have several applications that
have tens of millions of rows on 10 to 20 servers that are storing
tens of gigabytes of data currently.

One user loaded 1.3 billion rows into HBase as a test.

> I hope HBase could fix the problem in the near future.

It is fixed in hbase trunk which has not yet been released.

> Is there any version of HBase that allows concurrent updates, where
> all we need to do is call table.commit(id)?

There is no released version that supports this. It is only
in hbase trunk which will be released as hbase-0.2.0 in a
few weeks.

By the way, you know that HBase is now a subproject of
Hadoop and now has a separate svn repository? All development
of hbase-0.1.x and hbase-trunk happens there and not in
the hadoop svn. You can find the hbase source at:

http://svn.apache.org/repos/asf/hadoop/hbase




Re: Does the latest version of HBase support multiple updates on the same row at the same time?

Posted by Zhou <ne...@gmail.com>.
Bryan Duxbury <br...@...> writes:

> 
> Yes. Take a look at the BatchUpdate class in TRUNK.
> -Bryan
> 
> On Apr 15, 2008, at 12:56 AM, Zhou wrote:
> 
> > Hi,
> >
> > Currently, I'm using the HBase version bundled inside the Hadoop
> > 0.16.0 package.
> > I access HBase from a multi-threaded application.
> > It appears that only one update of a row can be in progress at a
> > time; otherwise an exception is thrown.
> > The documentation says that this will be fixed in version 0.2.0.
> > Is there already a version, either released or still in development,
> > that fixes this?
> >
> > Thanks,
> > Zhou
> >
> 
> 


I've looked at the source code of the BatchUpdate class.
I believe it collects update operations for one specified row
and submits them to the HRegionServer that hosts this row via one RPC call.
Am I right?

So on one server, I can actually cache all updates for a specified row in one
BatchUpdate object, and it might work for one process on one server.
However, what about multiple processes running concurrently on different servers?
Each of them has a BatchUpdate object of its own. I suspect it would still
cause the "update in progress" exception.

Even if I assume it works, since each row requires its own BatchUpdate object,
if I have millions of rows, I would have to create millions of objects.
I don't think that is workable.
And how many batch operations should I cache in the BatchUpdate object before
committing?
What if the updates require immediate durability (the D in ACID)?

I believe it is better to solve the concurrent update problem on the server side.
BatchUpdate would not work, at least for massive data sizes or high load.
I hope HBase could fix the problem in the near future.

Is there any version of HBase that allows concurrent updates, where all we need
to do is call table.commit(id)?

Thanks.



Re: Does the latest version of HBase support multiple updates on the same row at the same time?

Posted by Bryan Duxbury <br...@rapleaf.com>.
Yes. Take a look at the BatchUpdate class in TRUNK.
-Bryan

On Apr 15, 2008, at 12:56 AM, Zhou wrote:

> Hi,
>
> Currently, I'm using the HBase version bundled inside the Hadoop 0.16.0 package.
> I access HBase from a multi-threaded application.
> It appears that only one update of a row can be in progress at a
> time; otherwise an exception is thrown.
> The documentation says that this will be fixed in version 0.2.0.
> Is there already a version, either released or still in development,
> that fixes this?
>
> Thanks,
> Zhou
>