Posted to user@hbase.apache.org by Joe Pallas <jo...@oracle.com> on 2012/08/21 03:18:20 UTC

Thrift2 interface

Anyone out there actively using the thrift2 interface in 0.94?  Thrift bindings for C++ don’t seem to handle optional arguments too well (that is to say, it seems that optional arguments are not optional).  Unfortunately, checkAndPut uses an optional argument for value to distinguish between the two cases (value must match vs no cell with that column qualifier).  Any clues on how to work around that difficulty would be welcome.

Thanks.
joe
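
A minimal sketch of the two cases that the optional value has to distinguish, written as a toy in-memory model rather than against the real Thrift-generated bindings. The `Row` type and this `checkAndPut` signature are assumptions made for illustration; a null pointer stands in for the omitted optional argument.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of one row: column qualifier -> cell value. Not the HBase API.
using Row = std::map<std::string, std::string>;

// Hypothetical stand-in for checkAndPut on a single row.
//   expected != nullptr : put only if the cell currently holds *expected.
//   expected == nullptr : put only if the row has no cell for the qualifier.
// Returns true when the check passed and the put was applied.
bool checkAndPut(Row& row, const std::string& qualifier,
                 const std::string* expected, const std::string& newValue) {
    const auto it = row.find(qualifier);
    const bool checkPassed = (expected != nullptr)
        ? (it != row.end() && it->second == *expected)
        : (it == row.end());
    if (checkPassed) {
        row[qualifier] = newValue;
    }
    return checkPassed;
}

// Usage: exercise both cases on an initially empty row.
void example() {
    Row row;
    const std::string v1 = "v1";
    assert(checkAndPut(row, "q", nullptr, v1));     // no cell yet, so the put applies
    assert(!checkAndPut(row, "q", nullptr, "v2"));  // cell exists, so the check fails
    assert(checkAndPut(row, "q", &v1, "v2"));       // value matches, so the put applies
    assert(row.at("q") == "v2");
}
```

In this model, a caller that is forced to pass some value every time has no way to request the second case (no existing cell) through the one routine, which is the difficulty described above.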


Re: Thrift2 interface

Posted by Joe Pallas <jo...@oracle.com>.
Thanks for the info, Karthik (and sorry that I didn’t see it for so long, it got auto-filed).

I think the reasoning behind the native client approach makes sense.  I don’t know how much of the extra hop overhead is network and how much is serialization/deserialization, so for now I have been hoping that co-locating the thrift proxy with the client will give adequate performance.  

Of course, putting knowledge about .META. into the client creates a strong coupling between the client and the server, which means changes that affect .META. may break compatibility.  That is the price to pay for avoiding the extra hop.

The driving force behind our move to thrift2 is checkAndPut/checkAndDelete.  I see that there is checkAndMutate support in the native client thrift file at <https://github.com/facebook/native-cpp-hbase-client/blob/master/hbase/hbase.thrift>, but I honestly don’t understand how that works with the embedded thrift server.  I don’t understand the relationship between that thrift interface file and the thrift interface exported by the embedded server.  Is the native client actually able to use those routines?

On a side note, that file also describes an API that seems to use prefixes as generalized column families (like a column family + filter).  That looks like it would be really handy.

Thanks.
joe

On Aug 23, 2012, at 9:15 AM, Karthik Ranganathan wrote:

> Hey Joe,
> 
> We have tried a few different things wrt the C++ clients and thrift. Just
> putting out some of our thoughts here.
> 
> First, we used the existing Thrift proxy as a separate tier (Thrift proxy
> tier). The issue there was that we just didn't get enough throughput (for
> various reasons). Independently, adoption of HBase from C++ was increasing
> - so we thought it made sense to write a native client.
> 
> So we wrote the native C++ client and embedded the thrift proxy into the
> region server (embedded thrift proxy). Cutting the redirect from the
> client was one gain (as the native client is a smart client), but the real
> advantage came from short-circuiting the flow. In the thrift proxy tier
> case, the Thrift client would talk to the proxy using Thrift
> serialization, proxy would deserialize the Thrift call and re-serialize it
> into the Java client format, then send it to the region server which would
> deserialize the java formatted buffers again. But in the embedded proxy +
> native client, we can short-circuit on the embedded proxy and make a
> function call to the region server which is running in the same JVM (which
> helps cut one round of serialization and deserialization).
> 
> The issues, however, with the thrift based approach are that the Java
> objects (HTable, Scan, Get, Put, etc.) are not thrift definitions, so they
> need to be updated as a separate (and often very different) set of api's
> every time there is an enhancement to the Java side of things. The proxy
> tier has to be separately configured/tuned/bug fixed from the region
> server to make sure it is as performant as the region server - as the
> overall system will perform like the slowest component in the stack.
> 
> The ideal solution (IMHO) is to have a C++ client which has a compatible
> protocol with the Java client, so that there are no significant perf
> differences between the two approaches, and there is no separate proxy to
> tune. Just a thought of course, might be hard to achieve. Of course we have
> just talked about this :) but with the move to protocol buffers in trunk,
> this should be easier.
> 
> Out of curiosity, why thrift2 - do you specifically need thrift api's to
> region servers? Why not "efficient C/C++ client for HBase"?
> 
> Thanks
> Karthik
> 
> 
> 
> On 8/22/12 4:06 PM, "Joe Pallas" <jo...@oracle.com> wrote:
> 
>> 
>> On Aug 21, 2012, at 9:29 AM, Stack wrote:
>> 
>>> On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas <jo...@oracle.com>
>>> wrote:
>>>> Anyone out there actively using the thrift2 interface in 0.94?  Thrift
>>>> bindings for C++ don’t seem to handle optional arguments too well (that
>>>> is to say, it seems that optional arguments are not optional).
>>>> Unfortunately, checkAndPut uses an optional argument for value to
>>>> distinguish between the two cases (value must match vs no cell with
>>>> that column qualifier).  Any clues on how to work around that
>>>> difficulty would be welcome.
>>>> 
>>> 
>>> If you make a patch, we'll commit it Joe.
>> 
>> Well, I think the patch really needs to be in Thrift; the only workaround
>> I can see is to restructure the hbase.thrift interface file to avoid
>> having routines with optional arguments.  It seems a shame to break
>> compatibility with existing clients for that, and I am not sure if there
>> is a way to do it without breaking compatibility.  (On the other hand,
>> we’re talking about thrift2, so it isn’t like there are many existing
>> clients.)
>> 
>> The state of Thrift documentation is lamentable.  The original white
>> paper is the most detailed information I can find about compatibility
>> rules.  It has enough information to tell me that Thrift doesn’t support
>> overloading of routine names within a service, because the names are the
>> identifiers used to identify the routines.  I think that means it isn’t
>> possible to make a compatible change that would only affect the client
>> side.
>> 
>>> Have you seen this?
>>> https://github.com/facebook/native-cpp-hbase-client  Would it help?
>> 
>> The native client stuff is certainly interesting, but, as near as I can
>> tell, it expects the in-region-server Thrift server, which I would like
>> to give a chance to mature a bit before playing with.  I’m also puzzled
>> by the hbase.thrift file in that repository.  It seems to be based on the
>> older HBase Thrift interface, but it adds some functions.  I can’t see
>> how a client could use them, though, since there are no HBase-side
>> patches.
>> 
>> Anyone involved with FB’s native client efforts care to enlighten me?
>> 
>> joe
>> 
> 


Re: Thrift2 interface

Posted by Karthik Ranganathan <kr...@fb.com>.
Hey Joe,

We have tried a few different things wrt the C++ clients and thrift. Just
putting out some of our thoughts here.

First, we used the existing Thrift proxy as a separate tier (Thrift proxy
tier). The issue there was that we just didn't get enough throughput (for
various reasons). Independently, adoption of HBase from C++ was increasing
- so we thought it made sense to write a native client.

So we wrote the native C++ client and embedded the thrift proxy into the
region server (embedded thrift proxy). Cutting the redirect from the
client was one gain (as the native client is a smart client), but the real
advantage came from short-circuiting the flow. In the thrift proxy tier
case, the Thrift client would talk to the proxy using Thrift
serialization, proxy would deserialize the Thrift call and re-serialize it
into the Java client format, then send it to the region server which would
deserialize the java formatted buffers again. But in the embedded proxy +
native client, we can short-circuit on the embedded proxy and make a
function call to the region server which is running in the same JVM (which
helps cut one round of serialization and deserialization).
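
To make the counting concrete, here is a small sketch of the request path in both setups as described above. The `Hop` type and the step labels are illustrative assumptions, not taken from any HBase or Thrift code, and only the request direction is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Step { Serialize, Deserialize, FunctionCall };

// One stage on the request path: where it happens and what kind of work it is.
struct Hop {
    std::string where;
    Step step;
};

// Separate Thrift proxy tier: the call is encoded twice and decoded twice.
std::vector<Hop> proxyTierPath() {
    return {
        {"client (Thrift format)", Step::Serialize},
        {"proxy (Thrift format)", Step::Deserialize},
        {"proxy (Java client format)", Step::Serialize},
        {"region server (Java client format)", Step::Deserialize},
    };
}

// Embedded proxy: after one decode, the region server is reached by a
// plain function call inside the same JVM.
std::vector<Hop> embeddedProxyPath() {
    return {
        {"client (Thrift format)", Step::Serialize},
        {"embedded proxy (Thrift format)", Step::Deserialize},
        {"region server (same JVM)", Step::FunctionCall},
    };
}

// Counts serialize steps; each one is paired with a deserialize in these paths.
int serializationRounds(const std::vector<Hop>& path) {
    int rounds = 0;
    for (const Hop& hop : path) {
        if (hop.step == Step::Serialize) {
            ++rounds;
        }
    }
    return rounds;
}

// Usage: the embedded setup saves exactly one round on the request path.
void example() {
    assert(serializationRounds(proxyTierPath()) == 2);
    assert(serializationRounds(embeddedProxyPath()) == 1);
}
```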

The issues, however, with the thrift based approach are that the Java
objects (HTable, Scan, Get, Put, etc.) are not thrift definitions, so they
need to be updated as a separate (and often very different) set of api's
every time there is an enhancement to the Java side of things. The proxy
tier has to be separately configured/tuned/bug fixed from the region
server to make sure it is as performant as the region server - as the
overall system will perform like the slowest component in the stack.

The ideal solution (IMHO) is to have a C++ client which has a compatible
protocol with the Java client, so that there are no significant perf
differences between the two approaches, and there is no separate proxy to
tune. Just a thought of course, might be hard to achieve. Of course we have
just talked about this :) but with the move to protocol buffers in trunk,
this should be easier.

Out of curiosity, why thrift2 - do you specifically need thrift api's to
region servers? Why not "efficient C/C++ client for HBase"?

Thanks
Karthik



On 8/22/12 4:06 PM, "Joe Pallas" <jo...@oracle.com> wrote:

>
>On Aug 21, 2012, at 9:29 AM, Stack wrote:
>
>> On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas <jo...@oracle.com>
>>wrote:
>>> Anyone out there actively using the thrift2 interface in 0.94?  Thrift
>>>bindings for C++ don’t seem to handle optional arguments too well (that
>>>is to say, it seems that optional arguments are not optional).
>>>Unfortunately, checkAndPut uses an optional argument for value to
>>>distinguish between the two cases (value must match vs no cell with
>>>that column qualifier).  Any clues on how to work around that
>>>difficulty would be welcome.
>>> 
>> 
>> If you make a patch, we'll commit it Joe.
>
>Well, I think the patch really needs to be in Thrift; the only workaround
>I can see is to restructure the hbase.thrift interface file to avoid
>having routines with optional arguments.  It seems a shame to break
>compatibility with existing clients for that, and I am not sure if there
>is a way to do it without breaking compatibility.  (On the other hand,
>we’re talking about thrift2, so it isn’t like there are many existing
>clients.)
>
>The state of Thrift documentation is lamentable.  The original white
>paper is the most detailed information I can find about compatibility
>rules.  It has enough information to tell me that Thrift doesn’t support
>overloading of routine names within a service, because the names are the
>identifiers used to identify the routines.  I think that means it isn’t
>possible to make a compatible change that would only affect the client
>side.
> 
>> Have you seen this?
>> https://github.com/facebook/native-cpp-hbase-client  Would it help?
>
>The native client stuff is certainly interesting, but, as near as I can
>tell, it expects the in-region-server Thrift server, which I would like
>to give a chance to mature a bit before playing with.  I’m also puzzled
>by the hbase.thrift file in that repository.  It seems to be based on the
>older HBase Thrift interface, but it adds some functions.  I can’t see
>how a client could use them, though, since there are no HBase-side
>patches.
>
>Anyone involved with FB’s native client efforts care to enlighten me?
>
>joe
>


Re: Thrift2 interface

Posted by Joe Pallas <jo...@oracle.com>.
On Aug 21, 2012, at 9:29 AM, Stack wrote:

> On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas <jo...@oracle.com> wrote:
>> Anyone out there actively using the thrift2 interface in 0.94?  Thrift bindings for C++ don’t seem to handle optional arguments too well (that is to say, it seems that optional arguments are not optional).  Unfortunately, checkAndPut uses an optional argument for value to distinguish between the two cases (value must match vs no cell with that column qualifier).  Any clues on how to work around that difficulty would be welcome.
>> 
> 
> If you make a patch, we'll commit it Joe.

Well, I think the patch really needs to be in Thrift; the only workaround I can see is to restructure the hbase.thrift interface file to avoid having routines with optional arguments.  It seems a shame to break compatibility with existing clients for that, and I am not sure if there is a way to do it without breaking compatibility.  (On the other hand, we’re talking about thrift2, so it isn’t like there are many existing clients.)

The state of Thrift documentation is lamentable.  The original white paper is the most detailed information I can find about compatibility rules.  It has enough information to tell me that Thrift doesn’t support overloading of routine names within a service, because the names are the identifiers used to identify the routines.  I think that means it isn’t possible to make a compatible change that would only affect the client side.
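
One shape the restructuring could take, as a sketch only: split the two cases into separately named routines so that neither needs an optional argument, and look handlers up by routine name. The routine names `checkAndPutIfEquals` and `checkAndPutIfAbsent`, the `Row` type, and the dispatch table are all hypothetical and are not part of hbase.thrift.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of one row: column qualifier -> cell value. Not the HBase API.
using Row = std::map<std::string, std::string>;
using Args = std::vector<std::string>;

// Case 1: replace the cell only if it currently holds `expected`.
bool checkAndPutIfEquals(Row& row, const std::string& qualifier,
                         const std::string& expected, const std::string& newValue) {
    const auto it = row.find(qualifier);
    if (it == row.end() || it->second != expected) {
        return false;
    }
    it->second = newValue;
    return true;
}

// Case 2: write the cell only if the row has none for this qualifier.
bool checkAndPutIfAbsent(Row& row, const std::string& qualifier,
                         const std::string& newValue) {
    // insert() leaves an existing cell untouched and reports whether it inserted.
    return row.insert(std::make_pair(qualifier, newValue)).second;
}

// Handlers are keyed by routine name alone, so one name can map to only one
// handler; that is why the two cases need two distinct names here.
using Handler = std::function<bool(Row&, const Args&)>;

std::map<std::string, Handler> makeDispatchTable() {
    std::map<std::string, Handler> table;
    table["checkAndPutIfEquals"] = [](Row& row, const Args& a) {
        return a.size() == 3 && checkAndPutIfEquals(row, a[0], a[1], a[2]);
    };
    table["checkAndPutIfAbsent"] = [](Row& row, const Args& a) {
        return a.size() == 2 && checkAndPutIfAbsent(row, a[0], a[1]);
    };
    return table;
}

// Usage: both cases are reachable without any optional argument.
void example() {
    Row row;
    const std::map<std::string, Handler> table = makeDispatchTable();
    assert(table.at("checkAndPutIfAbsent")(row, Args{"q", "v1"}));
    assert(!table.at("checkAndPutIfAbsent")(row, Args{"q", "v2"}));
    assert(table.at("checkAndPutIfEquals")(row, Args{"q", "v1", "v2"}));
    assert(row.at("q") == "v2");
}
```

Existing clients that call the original routine would not know the new names, so this sketch does not by itself settle the compatibility question raised above.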
 
> Have you seen this?
> https://github.com/facebook/native-cpp-hbase-client  Would it help?

The native client stuff is certainly interesting, but, as near as I can tell, it expects the in-region-server Thrift server, which I would like to give a chance to mature a bit before playing with.  I’m also puzzled by the hbase.thrift file in that repository.  It seems to be based on the older HBase Thrift interface, but it adds some functions.  I can’t see how a client could use them, though, since there are no HBase-side patches.

Anyone involved with FB’s native client efforts care to enlighten me?

joe


Re: Thrift2 interface

Posted by Stack <st...@duboce.net>.
On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas <jo...@oracle.com> wrote:
> Anyone out there actively using the thrift2 interface in 0.94?  Thrift bindings for C++ don’t seem to handle optional arguments too well (that is to say, it seems that optional arguments are not optional).  Unfortunately, checkAndPut uses an optional argument for value to distinguish between the two cases (value must match vs no cell with that column qualifier).  Any clues on how to work around that difficulty would be welcome.
>

If you make a patch, we'll commit it Joe.

Have you seen this?
https://github.com/facebook/native-cpp-hbase-client  Would it help?

St.Ack