Posted to user@thrift.apache.org by Mayan Moudgill <ma...@bestweb.net> on 2010/03/15 17:25:10 UTC

Server performance and thrift

I'd appreciate some feedback from users about where they want 
improvements in the Thrift model. This may help me focus where I want to 
go with cthrift.

I've been thinking about where to go with cthrift. Clearly, at some 
point soon, cthrift will be able to generate fairly efficient 
client-side code. Thrift2c will allow use of Thrift IDL, or perhaps 
assist in the migration to a C-based stub specification.

On the server side, I _think_ that the current approach of splitting an
RPC call into 3 actions:
1. receive the message
2. call the RPC specified in the message
3. transmit the RPC response, if any
combined with the ability to add queues between these actions should
allow for the implementation of almost any kind of server model.
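
The three actions with queues in between can be sketched roughly as below. This is only an illustration, not cthrift code: every name (message, queue, stage_receive, stage_call, stage_transmit) is hypothetical, the "wire" is a plain buffer, and the RPC handler is a stand-in that upper-cases the request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is cthrift's real API. */
#define QUEUE_CAP 8
#define MSG_MAX   64

typedef struct {
    char   data[MSG_MAX];
    size_t len;
} message;

/* Fixed-size FIFO placed between two stages. */
typedef struct {
    message items[QUEUE_CAP];
    size_t  head, count;
} queue;

static int queue_push(queue *q, const message *m) {
    if (q->count == QUEUE_CAP) return -1;              /* queue full */
    q->items[(q->head + q->count) % QUEUE_CAP] = *m;
    q->count++;
    return 0;
}

static int queue_pop(queue *q, message *out) {
    if (q->count == 0) return -1;                      /* queue empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Action 1: receive. The "wire" here is just a caller-supplied buffer. */
static int stage_receive(queue *requests, const char *wire, size_t len) {
    message m = {0};
    if (len > MSG_MAX) return -1;                      /* oversized request */
    memcpy(m.data, wire, len);
    m.len = len;
    return queue_push(requests, &m);
}

/* Action 2: call the RPC. The stand-in handler upper-cases ASCII letters. */
static int stage_call(queue *requests, queue *responses) {
    message m;
    if (queue_pop(requests, &m) != 0) return -1;       /* nothing to do */
    for (size_t i = 0; i < m.len; i++)
        if (m.data[i] >= 'a' && m.data[i] <= 'z')
            m.data[i] = (char)(m.data[i] - 'a' + 'A');
    return queue_push(responses, &m);
}

/* Action 3: transmit the response, if any, into the caller's buffer. */
static int stage_transmit(queue *responses, char *out, size_t cap, size_t *len) {
    message m;
    assert(cap >= MSG_MAX);                            /* any message must fit */
    if (queue_pop(responses, &m) != 0) return -1;      /* no response pending */
    memcpy(out, m.data, m.len);
    *len = m.len;
    return 0;
}
```

Because each action only touches its input and output queue, the same three functions could be driven from one loop, from three threads (with locking added around the queues), or from an event loop, which is the flexibility being claimed.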

Clearly, one possible future direction is to implement multiple
skeletons for different kinds of servers, so that it becomes more
cut-and-paste to implement a server. The target would be to implement an
epoll- or select-based multi-socket kind of server, possibly with the
receive and transmit being implemented as co-routines (or threads) that
switch if blocked on a partial message.
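
As a rough sketch of the "switch if blocked on a partial message" idea, here is a resumable reader that keeps per-connection state instead of a real co-routine. It is not taken from cthrift: the names (frame_reader, frame_reader_step, wait_readable, set_nonblocking) are made up for illustration, and the framing (a 4-byte big-endian length prefix followed by the payload) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define FRAME_MAX 1024

/* Per-connection state that survives across partial reads (hypothetical). */
typedef struct {
    unsigned char hdr[4];          /* assumed framing: big-endian length */
    unsigned char body[FRAME_MAX];
    size_t   hdr_got;              /* prefix bytes read so far */
    size_t   body_got;             /* payload bytes read so far */
    uint32_t body_len;             /* valid once hdr_got == 4 */
} frame_reader;

enum { FRAME_ERROR = -1, FRAME_PARTIAL = 0, FRAME_READY = 1 };

static void frame_reader_reset(frame_reader *r) { memset(r, 0, sizeof *r); }

static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait in select() for fd: 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    fd_set rfds;
    struct timeval tv;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}

/* Consume whatever a non-blocking fd has. Returns FRAME_PARTIAL when the
 * read would block, so the caller can go service another socket and call
 * again later; the state in *r picks up where it left off. */
static int frame_reader_step(frame_reader *r, int fd) {
    assert(r->hdr_got <= 4);
    for (;;) {
        unsigned char *dst;
        size_t want;
        if (r->hdr_got < 4) {
            dst  = r->hdr + r->hdr_got;
            want = 4 - r->hdr_got;
        } else {
            if (r->body_got == r->body_len) return FRAME_READY;
            dst  = r->body + r->body_got;
            want = r->body_len - r->body_got;
        }
        ssize_t n = read(fd, dst, want);
        if (n == 0) return FRAME_ERROR;            /* peer closed mid-frame */
        if (n < 0) {
            if (errno == EINTR) continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) return FRAME_PARTIAL;
            return FRAME_ERROR;
        }
        if (r->hdr_got < 4) {
            r->hdr_got += (size_t)n;
            if (r->hdr_got == 4) {
                r->body_len = ((uint32_t)r->hdr[0] << 24) |
                              ((uint32_t)r->hdr[1] << 16) |
                              ((uint32_t)r->hdr[2] << 8)  |
                               (uint32_t)r->hdr[3];
                if (r->body_len > FRAME_MAX) return FRAME_ERROR;
            }
        } else {
            r->body_got += (size_t)n;
        }
    }
}
```

A server loop would keep one frame_reader per socket, call frame_reader_step() whenever select (or epoll) reports the socket readable, and hand the body to the RPC stage on FRAME_READY before calling frame_reader_reset(). A co-routine or thread per connection would hide the same state on its stack instead.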

Another possible direction is to have cthrift also generate the code to 
export the stubs to Python/Java/etc. as built-in modules/JNI/etc. The 
flip side, of course, would be to embed Python instead of extending it. 
Would anyone have a preference?

Hmmm... if followed through to its ultimate conclusion, we'll have the 
ability to go from Thrift IDL to a particular language interface, either 
in Thrift or in cthrift. The difference will be that in Thrift the 
generated code will be native (e.g. Python for --gen python etc.) whilst 
in cthrift the generated code will be in C with enough glue to allow 
calls from the desired upper layer language.

Another difference, of course, will be that in the case of cthrift, the 
transport/protocol choices will be more hardwired than in Thrift.

Another question: what would be the second most important transport? Is it
TFileTransport?



Re: Server performance and thrift

Posted by ma...@bestweb.net.

> On Mon, Mar 15, 2010 at 11:27 AM, <ma...@bestweb.net> wrote:

<ideas to be implemented in cthrift>

>>
> All very interesting ideas. I'm left somewhat confused, though, why these
> things are happening in a satellite project instead of as normal patches
> on
> the Thrift JIRA?
>
> -Todd
>

I have a question - do you think that these can be implemented in the
current Thrift implementation? I suspect that even though it may be
possible in a technical sense, it might be philosophically incompatible.

Mayan


Re: Server performance and thrift

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Mar 15, 2010 at 11:27 AM, <ma...@bestweb.net> wrote:

>
> I quite understand that the overhead of network latency may be the most
> important thing performance-wise. However, is there any proof of this?
> Will this also hold when we go to 10G interfaces with TCP offload?
>
>
Yep, this is a very interesting trend to watch for the coming few years. I
think you're right that Thrift and other RPC protocols will adapt a bit when
10G becomes commonplace in the DC.


> Wire version compatibility is an absolute requirement.
>
> Complexity - I'm not sure that cthrift will (eventually, especially when
> combined with thrift2c) make the higher layers any more complex; in fact,
> on the client side, I am 90% certain that we can keep the same interfaces.
>
>  Currently, cthrift tries to do the following:
> - minimize syscalls using write/writev
> - avoid blocking by using buffered reads and O_NONBLOCK sockets
> There isn't a lot more one can do on the client end that I can think of,
> unless you have multiple identical clients on the same box talking to the
> same server. Then you have the opportunity for batching etc., but that
> gets ... complicated ... very quickly.
>
> On the server side, the batching may be more natural, but admittedly I
> haven't given a lot of thought to that.
>
> Co-routine switch/thread-switch on partial reads on the server may be
> quite interesting to get performance (when combined with epoll/select or
> similar mechanisms).
>
>
All very interesting ideas. I'm left somewhat confused, though, why these
things are happening in a satellite project instead of as normal patches on
the Thrift JIRA?

-Todd


>
>
> > Hi Mayan,
> >
> > I haven't had time to look at your cthrift stuff yet, but wanted to give
> > my
> > quick input:
> >
> > For the vast majority of use cases where I've seen Thrift in use, the
> > overhead of Thrift itself is little to none compared to the overhead of
> > network latency and the operation on the other end of the connection. Of
> > course performance is important, but winning performance at the expense
> of
> > anything else (complexity, wire version compatibility, etc) is probably
> > not
> > on the top of anyone's lists.
> >
> > Like I said, I haven't looked at cthrift, but wanted to let you know
> where
> > I
> > think most users' priorities are.
> >
> > -Todd
> >
> > On Mon, Mar 15, 2010 at 9:25 AM, Mayan Moudgill <ma...@bestweb.net>
> wrote:
> >
> >> I'd appreciate some feedback from users about where they want
> >> improvements
> >> in the Thrift model. This may help me focus where I want to go with
> >> cthrift.
> >>
> >> I've been thinking about where to go with cthrift. Clearly, at some
> >> point
> >> soon, cthrift will be able to generate fairly efficient client-side
> >> code.
> >> Thrift2c will allow use of Thrift IDL, or perhaps assist in the
> >> migration to
> >> C based stub-specification.
> >>
> >> On the server side, I _think_ that the current approach of splitting an
> >> RPC call into 3 actions:
> >> 1. receive the message
> >> 2. call the RPC specified in the message
> >> 3. transmit the RPC response, if any
> >> combined with the ability to add queues between these actions should
> >> allow for the implementation of almost any kind of server model.
> >>
> >> Clearly, one possible future direction is to implement multiple
> >> skeletons for different kinds of servers, so that it becomes more
> >> cut-and-paste to implement a server. The target would be to implement
> >> an epoll- or select-based multi-socket kind of server, possibly with
> >> the receive and transmit being implemented as co-routines (or threads)
> >> that switch if blocked on a partial message.
> >>
> >> Another possible direction is to have cthrift also generate the code to
> >> export the stubs to Python/Java/etc. as built-in modules/JNI/etc. The
> >> flip
> >> side, of course, would be to embed Python instead of extending it. Would
> >> anyone have a preference?
> >>
> >> Hmmm... if followed through to its ultimate conclusion, we'll have the
> >> ability to go from Thrift IDL to a particular language interface, either
> >> in
> >> Thrift or in cthrift. The difference will be that in Thrift the
> >> generated
> >> code will be native (i.e. Python for --gen python etc.) whilst in
> >> cthrift
> >> the generated code will be in C with enough glue to allow calls from the
> >> desired upper layer language.
> >>
> >> Another difference, of course, will be that in the case of cthrift, the
> >> transport/protocol choices will be more hardwired than in Thrift.
> >>
> >> Another question: what would be the second most important transport?
> >> Is it TFileTransport?
> >>
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Server performance and thrift

Posted by ma...@bestweb.net.
I quite understand that the overhead of network latency may be the most
important thing performance-wise. However, is there any proof of this?
Will this also hold when we go to 10G interfaces with TCP offload?

Wire version compatibility is an absolute requirement.

Complexity - I'm not sure that cthrift will (eventually, especially when
combined with thrift2c) make the higher layers any more complex; in fact,
on the client side, I am 90% certain that we can keep the same interfaces.

 Currently, cthrift tries to do the following:
- minimize syscalls using write/writev
- avoid blocking by using buffered reads and O_NONBLOCK sockets
There isn't a lot more one can do on the client end that I can think of,
unless you have multiple identical clients on the same box talking to the
same server. Then you have the opportunity for batching etc., but that
gets ... complicated ... very quickly.
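
To illustrate the "minimize syscalls using write/writev" point, here is a small sketch of sending a length prefix and a payload with one writev() call instead of two write() calls. It is not cthrift's actual code: send_framed is a hypothetical helper and the 4-byte big-endian prefix is an assumed framing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: gather a 4-byte big-endian length prefix and the
 * payload into a single writev() call, so the kernel is entered once.
 * Returns the number of bytes written, or -1 on error. On a non-blocking
 * socket the call may write fewer bytes than requested; a real client
 * would have to resume from the returned offset. */
static ssize_t send_framed(int fd, const void *payload, uint32_t len) {
    unsigned char hdr[4];
    struct iovec iov[2];

    assert(payload != NULL || len == 0);

    hdr[0] = (unsigned char)((len >> 24) & 0xff);
    hdr[1] = (unsigned char)((len >> 16) & 0xff);
    hdr[2] = (unsigned char)((len >> 8) & 0xff);
    hdr[3] = (unsigned char)(len & 0xff);

    iov[0].iov_base = hdr;
    iov[0].iov_len  = sizeof hdr;
    iov[1].iov_base = (void *)payload;   /* struct iovec is not const-aware */
    iov[1].iov_len  = len;

    return writev(fd, iov, 2);
}
```

The alternative of copying header and payload into one buffer and calling write() once also costs a single syscall, but adds a memcpy of the payload; writev() avoids that copy.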

On the server side, the batching may be more natural, but admittedly I
haven't given a lot of thought to that.

Co-routine switch/thread-switch on partial reads on the server may be
quite interesting to get performance (when combined with epoll/select or
similar mechanisms).



> Hi Mayan,
>
> I haven't had time to look at your cthrift stuff yet, but wanted to give
> my
> quick input:
>
> For the vast majority of use cases where I've seen Thrift in use, the
> overhead of Thrift itself is little to none compared to the overhead of
> network latency and the operation on the other end of the connection. Of
> course performance is important, but winning performance at the expense of
> anything else (complexity, wire version compatibility, etc) is probably
> not
> on the top of anyone's lists.
>
> Like I said, I haven't looked at cthrift, but wanted to let you know where
> I
> think most users' priorities are.
>
> -Todd
>
> On Mon, Mar 15, 2010 at 9:25 AM, Mayan Moudgill <ma...@bestweb.net> wrote:
>
>> I'd appreciate some feedback from users about where they want
>> improvements
>> in the Thrift model. This may help me focus where I want to go with
>> cthrift.
>>
>> I've been thinking about where to go with cthrift. Clearly, at some
>> point
>> soon, cthrift will be able to generate fairly efficient client-side
>> code.
>> Thrift2c will allow use of Thrift IDL, or perhaps assist in the
>> migration to
>> C based stub-specification.
>>
>> On the server side, I _think_ that the current approach of splitting an
>> RPC call into 3 actions:
>> 1. receive the message
>> 2. call the RPC specified in the message
>> 3. transmit the RPC response, if any
>> combined with the ability to add queues between these actions should
>> allow for the implementation of almost any kind of server model.
>>
>> Clearly, one possible future direction is to implement multiple
>> skeletons for different kinds of servers, so that it becomes more
>> cut-and-paste to implement a server. The target would be to implement
>> an epoll- or select-based multi-socket kind of server, possibly with
>> the receive and transmit being implemented as co-routines (or threads)
>> that switch if blocked on a partial message.
>>
>> Another possible direction is to have cthrift also generate the code to
>> export the stubs to Python/Java/etc. as built-in modules/JNI/etc. The
>> flip
>> side, of course, would be to embed Python instead of extending it. Would
>> anyone have a preference?
>>
>> Hmmm... if followed through to its ultimate conclusion, we'll have the
>> ability to go from Thrift IDL to a particular language interface, either
>> in
>> Thrift or in cthrift. The difference will be that in Thrift the
>> generated
>> code will be native (i.e. Python for --gen python etc.) whilst in
>> cthrift
>> the generated code will be in C with enough glue to allow calls from the
>> desired upper layer language.
>>
>> Another difference, of course, will be that in the case of cthrift, the
>> transport/protocol choices will be more hardwired than in Thrift.
>>
>> Another question: what would be the second most important transport?
>> Is it TFileTransport?
>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



Re: Server performance and thrift

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Mayan,

I haven't had time to look at your cthrift stuff yet, but wanted to give my
quick input:

For the vast majority of use cases where I've seen Thrift in use, the
overhead of Thrift itself is little to none compared to the overhead of
network latency and the operation on the other end of the connection. Of
course performance is important, but winning performance at the expense of
anything else (complexity, wire version compatibility, etc) is probably not
on the top of anyone's lists.

Like I said, I haven't looked at cthrift, but wanted to let you know where I
think most users' priorities are.

-Todd

On Mon, Mar 15, 2010 at 9:25 AM, Mayan Moudgill <ma...@bestweb.net> wrote:

> I'd appreciate some feedback from users about where they want improvements
> in the Thrift model. This may help me focus where I want to go with cthrift.
>
> I've been thinking about where to go with cthrift. Clearly, at some point
> soon, cthrift will be able to generate fairly efficient client-side code.
> Thrift2c will allow use of Thrift IDL, or perhaps assist in the migration to
> C based stub-specification.
>
> On the server side, I _think_ that the current approach of splitting an RPC
> call into 3 actions:
> 1. receive the message
> 2. call the RPC specified in the message
> 3. transmit the RPC response, if any
> combined with the ability to add queues between these actions should allow
> for the implementation of almost any kind of server model.
>
> Clearly, one possible future direction is to implement multiple skeletons
> for different kinds of servers, so that it becomes more cut-and-paste to
> implement a server. The target would be to implement an epoll- or
> select-based multi-socket kind of server, possibly with the receive and
> transmit being implemented as co-routines (or threads) that switch if
> blocked on a partial message.
>
> Another possible direction is to have cthrift also generate the code to
> export the stubs to Python/Java/etc. as built-in modules/JNI/etc. The flip
> side, of course, would be to embed Python instead of extending it. Would
> anyone have a preference?
>
> Hmmm... if followed through to its ultimate conclusion, we'll have the
> ability to go from Thrift IDL to a particular language interface, either in
> Thrift or in cthrift. The difference will be that in Thrift the generated
> code will be native (i.e. Python for --gen python etc.) whilst in cthrift
> the generated code will be in C with enough glue to allow calls from the
> desired upper layer language.
>
> Another difference, of course, will be that in the case of cthrift, the
> transport/protocol choices will be more hardwired than in Thrift.
>
> Another question: what would be the second most important transport? Is it
> TFileTransport?
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
