Posted to user@hbase.apache.org by yun peng <pe...@gmail.com> on 2012/12/06 14:02:59 UTC

Is Put() operation a synchronous call on server side?

Hi, since the HBase client sends each Put() immediately when auto-flush is
enabled (setAutoFlush(true)), I am wondering whether a Put() is executed
synchronously on the server side. To be more specific: once a Put() has
arrived at HRegion, does the call block until all put-related operations are
done, such as the write to the WAL and the write to the memstore, or even a
flush to disk (though perhaps not on every call)? Or does it just trigger
those operations and return immediately?


Besides, while researching this, I found it hard to locate the code that
performs the RPC in HBase, for example how the client-side HTable.put()
invokes the server-side HRegion.put(). Can anyone point me to the related
code path? Thanks.
regards,
Yun
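
For context on the client-side setting the question refers to, here is a toy model of auto-flush, assuming the 0.94-era behaviour in which auto-flush is on by default and puts are buffered on the client only when it is turned off. ToyTable and its fields are hypothetical stand-ins, not the real HTable class, and the write-buffer size limit is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyTable is a hypothetical stand-in, not HBase's HTable.
class AutoFlushSketch {

    static class ToyTable {
        private boolean autoFlush = true; // assumed default, mirroring HTable
        private final List<String> writeBuffer = new ArrayList<>();
        final List<String> sentToServer = new ArrayList<>();

        void setAutoFlush(boolean autoFlush) {
            this.autoFlush = autoFlush;
        }

        void put(String row) {
            writeBuffer.add(row);
            if (autoFlush) {
                flushCommits(); // the put is sent before put() returns
            }
        }

        void flushCommits() {
            // Stands in for the batched RPC to the region servers.
            sentToServer.addAll(writeBuffer);
            writeBuffer.clear();
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1");
        System.out.println("auto-flush on, sent: " + table.sentToServer.size());

        table.setAutoFlush(false);
        table.put("row2");
        table.put("row3");
        System.out.println("auto-flush off, sent before flush: " + table.sentToServer.size());

        table.flushCommits();
        System.out.println("after flushCommits(), sent: " + table.sentToServer.size());
    }
}
```

With auto-flush off, nothing reaches the server until flushCommits() runs, so the server-side question only applies from that point on.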

Re: Is Put() operation a synchronous call on server side?

Posted by daidong <da...@gmail.com>.
I think the path is like this:
put() -> flushCommits() -> processBatch() -> processBatchCallback() ->
submit() -> ExecutorService.submit()

The callback we submit contains a "connect" and a "call" function, which
actually do the RPC work. See ProtobufUtil.java and its "multi" method. As
for how to get the ClientProtocol, see the HConnectionManager.getProtocol()
method.

Hope it helps. I would also like somebody to give more information about how
Protobuf and HBaseRPC work together. :)
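
The last step of the path above, handing work to ExecutorService.submit() and then waiting for the result, can be sketched with plain java.util.concurrent. This is an illustration under assumptions, not the HBase client code: fakeRpc and processBatch here are hypothetical stand-ins, and the sketch assumes the caller blocks on each Future before returning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: none of these names come from the HBase code base.
class SubmitAndWaitSketch {

    // Hypothetical stand-in for the per-server RPC made inside the callable.
    static String fakeRpc(String server, int puts) {
        return server + " acked " + puts + " puts";
    }

    // Submits one callable per server, then blocks until every one has finished.
    static List<String> processBatch(ExecutorService pool, List<String> servers) {
        List<Future<String>> futures = new ArrayList<>();
        for (String server : servers) {
            Callable<String> call = () -> fakeRpc(server, 1);
            futures.add(pool.submit(call)); // returns at once with a Future
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            try {
                results.add(future.get()); // the caller waits here
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(processBatch(pool, Arrays.asList("rs1", "rs2")));
        } finally {
            pool.shutdown();
        }
    }
}
```

Although submit() itself returns immediately, the caller still sees a blocking call because it waits on every Future before returning.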

Re: Is Put() operation a synchronous call on server side?

Posted by Jimmy Xiang <jx...@cloudera.com>.
For a single put, yes, there is some overhead.  There is a jira to
remove the overhead: HBASE-6739, which is still open.

Thanks,
Jimmy

On Thu, Dec 6, 2012 at 10:32 AM, yun peng <pe...@gmail.com> wrote:
> Hi, Dong and Harsh, thanks for your detailed explanations. Based on Dong's
> answer, I have summarised the call path:
> HTable.flushCommits()->HConnection.processBatch()->HConnectionManager#HConnectionImplementation.processBatch()->processBatchCallback()->ExecutorService.submit()
> I think there is some scheduling overhead in ExecutorService.submit().
>
> Dong, I didn't find any protobuf-related code; my codebase is HBase 0.94.2,
> so maybe I am not using the most up-to-date version. By the way, I have
> another, somewhat related question, but I will post it in a separate
> thread.
> Regards,
> Yun

Re: Is Put() operation a synchronous call on server side?

Posted by yun peng <pe...@gmail.com>.
Hi, Dong and Harsh, thanks for your detailed explanations. Based on Dong's
answer, I have summarised the call path:
HTable.flushCommits()->HConnection.processBatch()->HConnectionManager#HConnectionImplementation.processBatch()->processBatchCallback()->ExecutorService.submit()
I think there is some scheduling overhead in ExecutorService.submit().

Dong, I didn't find any protobuf-related code; my codebase is HBase 0.94.2,
so maybe I am not using the most up-to-date version. By the way, I have
another, somewhat related question, but I will post it in a separate
thread.
Regards,
Yun

Re: Is Put() operation a synchronous call on server side?

Posted by Harsh J <ha...@cloudera.com>.
Hi Yun,

Yes, a single Put call is synchronous. A Put is appended to the WAL, added
to the MemStore, and only then is success returned to the client, if all
went well.

A Put is not written directly to the on-disk store files. It is flushed
from the MemStore later, either by the regular flush triggers or by a
manual flush invoked on its region or its table.

The path to follow is quite simple: a Put goes from a client (HTable) to a
RegionServer (HRegionServer). You've already read the client side, so if
you read the HRegionServer#put(…) method(s), you'll see the server end of
the RPC.

-- 
Harsh J
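
The ordering described in the reply above (WAL first, then MemStore, then success, with the flush happening separately) can be pictured with a toy model. This is a sketch under stated assumptions, not HBase's implementation: ToyRegion, its fields, and its methods are hypothetical, the WAL is an in-memory list, and locking, sync-to-HDFS, and WAL cleanup are all omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: ToyRegion is hypothetical and far simpler than HRegion.
class ServerPutSketch {

    static class ToyRegion {
        final List<String> wal = new ArrayList<>();                 // stands in for the WAL
        final TreeMap<String, String> memstore = new TreeMap<>();   // sorted in-memory store
        final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

        // Synchronous: returns only after the WAL append and the memstore insert.
        boolean put(String row, String value) {
            wal.add(row + "=" + value);
            memstore.put(row, value);
            return true; // success is reported only at this point
        }

        // Runs separately from put(): snapshots the memstore into a "store file".
        void flush() {
            if (memstore.isEmpty()) {
                return;
            }
            storeFiles.add(new TreeMap<>(memstore));
            memstore.clear();
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        boolean ok = region.put("row1", "v1");
        System.out.println("put returned " + ok
                + ", wal entries: " + region.wal.size()
                + ", store files: " + region.storeFiles.size());

        region.flush();
        System.out.println("after flush, store files: " + region.storeFiles.size()
                + ", memstore size: " + region.memstore.size());
    }
}
```

In this model put() never touches storeFiles; only flush() does, which mirrors the point that a successful Put does not imply the data has been flushed.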