Posted to user@thrift.apache.org by Johan Stuyts <j....@zybber.nl> on 2008/06/19 20:34:47 UTC

Multiplexed, poolable transport

Hi,

Is anyone interested in writing a specification for a multiplexed,  
poolable transport?

My use case is this: I separate my applications into a server tier  
containing all the business logic, validation and security, and one or  
more presentation tiers. All functionality of the application is exposed  
using multiple Thrift services. The current implementation requires  
opening a new server socket for each service. When the number of services  
grows this becomes very cumbersome. Also a connection to the server can be  
used for only one invocation which makes invocations expensive because the  
TCP handshake has to be performed each time.

What I am proposing is similar to database connections. The client pools a  
number of connections to the server. When a request has to be made, a  
connection is retrieved from the pool, a function of an arbitrary service  
is invoked and the connection is returned to the pool.

The advantages are:
- more friendly to firewalls. Only one socket per application needs to be  
opened.
- better performance because connections are kept open indefinitely so the  
expensive TCP handshake can be omitted.

I have no experience writing protocols so I will need help to write the  
specification.

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by Johan Stuyts <j....@zybber.nl>.
> Would you mind making this a JIRA issue with the "new feature" type
> and attaching your patches to that?

Done: https://issues.apache.org/jira/browse/THRIFT-66

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by David Reiss <dr...@facebook.com>.
Would you mind making this a JIRA issue with the "new feature" type
and attaching your patches to that?

--David

Re: Multiplexed, poolable transport

Posted by Johan Stuyts <j....@zybber.nl>.
> I attached the source files of the implementation.

The files didn't get through. They were probably blocked by the mailing  
list. If you want to have a look at them, send me an e-mail and I will  
send you the files.

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by Johan Stuyts <j....@zybber.nl>.
> For #2, I'm of the opinion that this should be handled above the Thrift
> level because it adds significant complexity to multiple components of
> Thrift, it is not easy to add on a language-by-language basis, and I
> don't think it can be done in a way that will be "right" for all users.

I have implemented a very simple client and a server in Java that wrap  
'CALL' messages with a message specifying the name of the service. The  
latter message has a new message type: SERVICE_SELECTION, and the message  
name is an arbitrary identifier which is used to look up the processor in a  
map on the server. On the client side I have to wrap an instance of the  
static inner class 'Client' with a dynamic proxy that will wrap the 'CALL'  
message for each invocation of a Thrift function. The responses are not  
modified or wrapped.
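
A minimal sketch of the wrapping described above, using plain java.io
streams instead of Thrift so that it runs on its own. The class name, the
byte layout and the BiFunction "processors" are assumptions made for
illustration; only the SERVICE_SELECTION value and the idea of looking up
a processor by service name come from this message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for the wrapping described above; not Thrift's wire format.
final class ServiceSelectionSketch {
    static final byte CALL = 1;              // same value as TMessageType.CALL
    static final byte SERVICE_SELECTION = 4; // the new type proposed in this message

    // Client side: put a SERVICE_SELECTION header in front of an ordinary call.
    static byte[] wrapCall(String service, String function, String argument) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(SERVICE_SELECTION);
            out.writeUTF(service);   // message name carries the service identifier
            out.writeByte(CALL);     // the wrapped call follows unchanged
            out.writeUTF(function);
            out.writeUTF(argument);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Server side: read the header, look up the processor in a map, hand over the call.
    static String dispatch(byte[] message,
                           Map<String, BiFunction<String, String, String>> processors) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            if (in.readByte() != SERVICE_SELECTION) {
                throw new IllegalArgumentException("expected SERVICE_SELECTION");
            }
            String service = in.readUTF();
            if (in.readByte() != CALL) {
                throw new IllegalArgumentException("expected wrapped CALL");
            }
            String function = in.readUTF();
            String argument = in.readUTF();
            BiFunction<String, String, String> processor = processors.get(service);
            if (processor == null) {
                throw new IllegalArgumentException("unknown service: " + service);
            }
            return processor.apply(function, argument);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, BiFunction<String, String, String>> processors = new HashMap<>();
        processors.put("Echo", (function, argument) -> argument);
        processors.put("Calculator", (function, argument) -> function + "(" + argument + ")");
        System.out.println(dispatch(wrapCall("Echo", "say", "hello"), processors));
        System.out.println(dispatch(wrapCall("Calculator", "add", "1,2"), processors));
    }
}
```

The response path is left out on purpose, since the responses are not
wrapped in this approach.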

I can do mixed synchronous and asynchronous calls of different functions  
of different services over a single connection. If a request is made for a  
service that does not exist, an exception is returned and the connection  
is still usable for following calls because the 'CALL' message and the  
struct for the parameters are skipped.
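
The recovery behaviour can be illustrated with a small in-memory
simulation. It assumes a length-prefixed payload so that an unwanted call
can be discarded in one read; the implementation described here instead
skips the 'CALL' message and its parameter struct through the protocol.
All names and the request layout are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class SkipUnknownServiceSketch {
    // Hypothetical request layout: service name, payload length, payload bytes.
    static void writeRequest(DataOutputStream out, String service, String call)
            throws IOException {
        byte[] payload = call.getBytes(StandardCharsets.UTF_8);
        out.writeUTF(service);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Always consumes exactly one request, so the stream stays aligned for the next one.
    static String handleNext(DataInputStream in, Map<String, UnaryOperator<String>> processors)
            throws IOException {
        String service = in.readUTF();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // read the call even if nobody will process it
        UnaryOperator<String> processor = processors.get(service);
        if (processor == null) {
            return "EXCEPTION: unknown service " + service; // payload is discarded
        }
        return processor.apply(new String(payload, StandardCharsets.UTF_8));
    }

    // Sends three requests over one in-memory "connection"; the second names a missing service.
    static List<String> demo() {
        Map<String, UnaryOperator<String>> processors = new HashMap<>();
        processors.put("Echo", call -> "REPLY: " + call);
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            writeRequest(out, "Echo", "first");
            writeRequest(out, "Missing", "ignored");
            writeRequest(out, "Echo", "third");
            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            List<String> replies = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                replies.add(handleNext(in, processors));
            }
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The third request is still answered normally because the rejected request
was consumed in full before the exception reply was produced.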

A minor disadvantage of this approach is that the output gets flushed  
twice: by class 'Client' to flush the 'CALL' message and by the proxy to  
flush the end of the 'SERVICE_SELECTION' message. Depending on the  
protocol and transport used this may affect performance by causing two  
instead of one network packet to be sent.
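
One possible way to avoid the second flush, sketched with java.io streams
rather than Thrift transports: give the wrapped client an output that
ignores flush(), and let the proxy trigger the single real flush once the
envelope is complete. This is an assumption about how it could be done,
not part of the implementation described here; the class names and the
string payloads are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FlushSketch {
    // Counts how often flush() reaches the underlying stream (a stand-in for the socket).
    static final class CountingStream extends FilterOutputStream {
        int flushes;
        CountingStream(OutputStream out) { super(out); }
        @Override public void flush() throws IOException {
            flushes++;
            super.flush();
        }
    }

    // Swallows flush() calls from the wrapped client until the proxy asks for a real one.
    static final class DeferredFlushStream extends FilterOutputStream {
        DeferredFlushStream(OutputStream out) { super(out); }
        @Override public void flush() { /* ignored on purpose */ }
        void flushNow() throws IOException { out.flush(); }
    }

    // Replays the write/flush sequence of one wrapped call and reports the real flush count.
    static int flushesFor(boolean deferred) {
        try {
            CountingStream socket = new CountingStream(new ByteArrayOutputStream());
            DeferredFlushStream deferredOut = new DeferredFlushStream(socket);
            OutputStream clientOut = deferred ? deferredOut : socket;

            clientOut.write("begin SERVICE_SELECTION".getBytes(StandardCharsets.UTF_8));
            clientOut.write("CALL".getBytes(StandardCharsets.UTF_8));
            clientOut.flush(); // the generated client flushes after its CALL message
            clientOut.write("end SERVICE_SELECTION".getBytes(StandardCharsets.UTF_8));
            if (deferred) {
                deferredOut.flushNow(); // the proxy performs the only real flush
            } else {
                clientOut.flush();      // the proxy flushes a second time
            }
            return socket.flushes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + flushesFor(false) + " flushes");
        System.out.println("deferred: " + flushesFor(true) + " flushes");
    }
}
```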

What do you think of this approach?

> For #3, I would recommend just setting a send/recv timeout in the client
> transport (C++ and PHP support this for sure).  If the request times out,
> an exception will be thrown.
>
> For #4, we depend on the TCP checksum to detect corruption at this point,
> though a checksumming framed transport would be a nice feature.

I have not implemented any data corruption or timeout handling yet. I am  
thinking about just dropping the connection on the side that detects the  
error. This way I do not have to invent and implement a way for the client  
and the server to negotiate a reset when one of them encounters an error.

Dropping a connection and re-establishing a new one is expensive, but I think  
a custom reset would probably require a sequence of 'magic' bytes, which  
requires the escaping and unescaping of those sequences in the normal  
data. This would add extra processing to all normal operations for the  
unlikely event of a communication error. I also think that negotiating a  
custom reset will probably be at least as expensive as the TCP traffic  
needed for closing the current and opening a new connection.

There is no session or something similar associated with a connection, so  
when the client reconnects it can simply continue where it left off (after  
refreshing its state).
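
The drop-and-reconnect strategy could look roughly like this on the client
side. The Connection interface and the simulated failure are hypothetical;
a real client would wrap a Thrift transport, and whether a failed request
is retried is left to the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ReconnectSketch {
    // Hypothetical connection abstraction; not a Thrift type.
    interface Connection {
        String call(String request) throws IOException;
        void close();
    }

    private final Supplier<Connection> connector;
    private Connection current;

    ReconnectSketch(Supplier<Connection> connector) {
        this.connector = connector;
    }

    String call(String request) throws IOException {
        if (current == null) {
            current = connector.get(); // lazily (re)connect
        }
        try {
            return current.call(request);
        } catch (IOException e) {
            current.close(); // no in-band reset: just drop the connection
            current = null;  // the next call opens a fresh one
            throw e;
        }
    }

    // Simulation: the first connection breaks on its second request.
    static List<String> demo() {
        int[] opened = {0};
        ReconnectSketch client = new ReconnectSketch(() -> {
            int id = ++opened[0];
            int[] calls = {0};
            return new Connection() {
                public String call(String request) throws IOException {
                    if (id == 1 && ++calls[0] == 2) {
                        throw new IOException("corrupt stream");
                    }
                    return request + " via connection " + id;
                }
                public void close() { }
            };
        });
        List<String> log = new ArrayList<>();
        for (String request : new String[] {"a", "b", "b"}) {
            try {
                log.add(client.call(request));
            } catch (IOException e) {
                log.add("failed: " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```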

Is this a viable strategy for handling communication errors?


I attached the source files of the implementation. To try them out you  
have to include the generated JavaBean classes of the tutorial in the  
class path, and add the following to 'TMessageType':
   public static final byte SERVICE_SELECTION  = 4;

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by David Reiss <dr...@facebook.com>.
Sorry for the late response.  There are four separate issues here.

1/ Pooling and reusing connections.
2/ "virtual hosting" with service names prefixed to the message names.
3/ Timeout handling.
4/ Corruption detection and management.

With regard to #1, this is no different from any other code you would
use to maintain a pool of resources.  Take one out when you need it,
put it back when you don't.  I'll try to throw some code into contrib
that demonstrates this.
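
For illustration only (this is not the contrib code mentioned above), a
minimal fixed-size pool along those lines might look like this in Java.
SimplePool and its methods are hypothetical names; the pooled resource
could be any client object that holds an open connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: take a resource out, use it, put it back.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free, runs the work, and always returns the resource.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            // A real pool would replace a connection that failed instead of reusing it.
            idle.add(resource);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

A caller would write something like
`pool.withResource(client -> client.someCall())`, where `someCall` stands
for whatever method the pooled client exposes.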

For #2, I'm of the opinion that this should be handled above the Thrift
level because it adds significant complexity to multiple components of
Thrift, it is not easy to add on a language-by-language basis, and I
don't think it can be done in a way that will be "right" for all users.

For #3, I would recommend just setting a send/recv timeout in the client
transport (C++ and PHP support this for sure).  If the request times out,
an exception will be thrown.
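
The same idea on a plain java.net.Socket, as a self-contained sketch: a
receive timeout turns a silent peer into an exception. It uses only the
JDK, not a Thrift transport; the class and method names are made up, and
the "server" is a local listening socket that never replies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutSketch {
    // Returns true if reading from a silent local server timed out after timeoutMillis.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // receive timeout for blocking reads
            try {
                client.getInputStream().read(); // the server never sends anything
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the request "timed out"; the caller decides what to do next
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```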

For #4, we depend on the TCP checksum to detect corruption at this point,
though a checksumming framed transport would be a nice feature.
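
A checksumming frame could be as simple as the following sketch. The
layout (4-byte length, payload, 4-byte CRC-32) and the class name are
assumptions for illustration; this is not an existing Thrift transport.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class ChecksummedFrameSketch {
    // Assumed frame layout: 4-byte length, payload, 4-byte CRC-32 of the payload.
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length + 4)
                .putInt(payload.length)
                .put(payload)
                .putInt(checksum(payload))
                .array();
    }

    // Returns the payload, or throws if the frame is malformed or the checksum differs.
    static byte[] decode(byte[] frame) {
        if (frame.length < 8) {
            throw new IllegalArgumentException("frame too short");
        }
        ByteBuffer buffer = ByteBuffer.wrap(frame);
        int length = buffer.getInt();
        if (length != frame.length - 8) {
            throw new IllegalArgumentException("bad frame length");
        }
        byte[] payload = new byte[length];
        buffer.get(payload);
        if (checksum(payload) != buffer.getInt()) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }

    private static int checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return (int) crc.getValue(); // low 32 bits hold the CRC-32 value
    }
}
```

On a mismatch the receiver cannot trust anything that follows in the
stream, so in practice it would have to abandon the connection.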

--David

Re: Multiplexed, poolable transport

Posted by Johan Stuyts <j....@zybber.nl>.
> It seems to me that this is something that can be done completely
> outside of Thrift (and we have done it at least once at Facebook).

Could you write a wiki page about how you did it? Thanks.

> Currently, it is not possible to use the same connection for multiple
> simultaneous requests, but the concept of a client pool should still

I do not want to have simultaneous requests over one connection, because I  
assume building a transport that only supports serialized requests is much  
easier. But as I said before I have no experience in writing protocols, so  
I could be wrong.

> work.  If this is done in a general way, I'd be happy to include it
> in the base Thrift distribution.

I am currently thinking in the direction of adding a TProtocol wrapper  
that will add a service identifier prefix to each message name. On the  
server side the message name will be split so the service and the function  
can be looked up.
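
The name handling for such a wrapper is small. A sketch, assuming ':' as
the separator (the separator, class and method names are hypothetical; a
real wrapper would apply this inside the protocol's message-begin
handling):

```java
final class PrefixedNameSketch {
    static final char SEPARATOR = ':'; // assumed separator, not defined by Thrift here

    // Client side: "Calculator" + "add" becomes "Calculator:add".
    static String prefix(String service, String function) {
        if (service.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("service name must not contain " + SEPARATOR);
        }
        return service + SEPARATOR + function;
    }

    // Server side: split at the first separator into { service, function }.
    static String[] split(String messageName) {
        int index = messageName.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("no service prefix in: " + messageName);
        }
        return new String[] {
            messageName.substring(0, index),
            messageName.substring(index + 1)
        };
    }
}
```

Splitting at the first separator means function names may contain the
separator but service names may not, which is why prefix() rejects it.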

I would like to have some input about what the best way would be to handle  
error conditions: timeouts, corrupt streams, etc.

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by David Reiss <dr...@facebook.com>.
It seems to me that this is something that can be done completely
outside of Thrift (and we have done it at least once at Facebook).
Currently, it is not possible to use the same connection for multiple
simultaneous requests, but the concept of a client pool should still
work.  If this is done in a general way, I'd be happy to include it
in the base Thrift distribution.

--David

Re: Multiplexed, poolable transport

Posted by Johan Stuyts <j....@zybber.nl>.
> Services can inherit from one another. Would it make sense for you to  
> have one meta-service inherit from all the separate services, and just  
> serve the meta-service?

No, this would result in a service containing hundreds of functions, or  
even thousands for large applications.

> By the way, multiple invocations can happen in the same connection. You  
> don't need to create a new connection every time, though there is risk  
> of fairness issues if you keep persistent connections. This is something  
> we've solved with NonblockingServer in Ruby, and will be solving  
> similarly in Java soon.

I see how you can do multiple invocations now; I skimmed over the code too  
quickly before.

I read THRIFT-5 and understand the fairness issue, but I see this as an  
implementation detail of the transport I am proposing.

-- 
Kind regards,

Johan Stuyts

Re: Multiplexed, poolable transport

Posted by Bryan Duxbury <br...@rapleaf.com>.
Services can inherit from one another. Would it make sense for you to  
have one meta-service inherit from all the separate services, and  
just serve the meta-service?

By the way, multiple invocations can happen in the same connection.  
You don't need to create a new connection every time, though there is  
risk of fairness issues if you keep persistent connections. This is  
something we've solved with NonblockingServer in Ruby, and will be  
solving similarly in Java soon.

-Bryan
