Posted to user@thrift.apache.org by Johan Stuyts <j....@zybber.nl> on 2008/06/19 20:34:47 UTC
Multiplexed, poolable transport
Hi,
Is anyone interested in writing a specification for a multiplexed,
poolable transport?
My use case is this: I separate my applications into a server tier
containing all the business logic, validation and security, and one or
more presentation tiers. All functionality of the application is exposed
through multiple Thrift services. The current implementation requires
opening a new server socket for each service, which becomes very
cumbersome as the number of services grows. Also, a connection to the
server can be used for only one invocation, which makes invocations
expensive because the TCP handshake has to be performed each time.
What I am proposing is similar to database connections. The client pools a
number of connections to the server. When a request has to be made, a
connection is retrieved from the pool, a function of an arbitrary service
is invoked and the connection is returned to the pool.
The advantages are:
- more friendly to firewalls. Only one socket per application needs to be
opened.
- better performance because connections are kept open indefinitely so the
expensive TCP handshake can be omitted.
I have no experience writing protocols so I will need help to write the
specification.
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by Johan Stuyts <j....@zybber.nl>.
> Would you mind making this a JIRA issue with the "new feature" type
> and attaching your patches to that?
Done: https://issues.apache.org/jira/browse/THRIFT-66
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by David Reiss <dr...@facebook.com>.
Would you mind making this a JIRA issue with the "new feature" type
and attaching your patches to that?
--David
Re: Multiplexed, poolable transport
Posted by Johan Stuyts <j....@zybber.nl>.
> I attached the source files of the implementation.
The files didn't get through. They were probably blocked by the mailing
list. If you want to have a look at them, send me an e-mail and I will
send you the files.
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by Johan Stuyts <j....@zybber.nl>.
> For #2, I'm of the opinion that this should be handled above the Thrift
> level because it adds significant complexity to multiple components of
> Thrift, it is not easy to add on a language-by-language basis, and I
> don't think it can be done in a way that will be "right" for all users.
I have implemented a very simple client and server in Java that wrap
'CALL' messages in a message specifying the name of the service. The
wrapping message has a new message type, SERVICE_SELECTION, and its message
name is an arbitrary identifier that is used to look up the processor in a
map on the server. On the client side I have to wrap an instance of the
static inner class 'Client' in a dynamic proxy that wraps the 'CALL'
message for each invocation of a Thrift function. The responses are not
modified or wrapped.
I can do mixed synchronous and asynchronous calls of different functions
of different services over a single connection. If a request is made for a
service that does not exist, an exception is returned and the connection
is still usable for following calls because the 'CALL' message and the
struct for the parameters are skipped.
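The server-side lookup described above can be sketched roughly as follows. This is a simplified model that does not use the Thrift library: the class ServiceSelector, its methods, and the idea of representing a processor as a plain function are all assumptions made for illustration, and the attached implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model: a "processor" is reduced to a function
// from a function name to a result string. Real Thrift processors read
// from and write to protocols instead.
final class ServiceSelector {
    private final Map<String, Function<String, String>> processors = new HashMap<>();

    void register(String serviceName, Function<String, String> processor) {
        processors.put(serviceName, processor);
    }

    // serviceName stands for the name carried by the wrapping
    // SERVICE_SELECTION message, functionName for the wrapped CALL message.
    String dispatch(String serviceName, String functionName) {
        Function<String, String> processor = processors.get(serviceName);
        if (processor == null) {
            // A real server would skip the CALL message and its argument
            // struct at this point so the connection stays usable.
            return "EXCEPTION: unknown service '" + serviceName + "'";
        }
        return processor.apply(functionName);
    }
}
```

The point of the sketch is that an unknown service produces an error reply rather than a broken connection, so a later call to a registered service on the same selector still succeeds.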
A minor disadvantage of this approach is that the output gets flushed
twice: by class 'Client' to flush the 'CALL' message and by the proxy to
flush the end of the 'SERVICE_SELECTION' message. Depending on the
protocol and transport used this may affect performance by causing two
instead of one network packet to be sent.
What do you think of this approach?
> For #3, I would recommend just setting a send/recv timeout in the client
> transport (C++ and PHP support this for sure). If the request times out,
> an exception will be thrown.
>
> For #4, we depend on the TCP checksum to detect corruption at this point,
> though a checksumming framed transport would be a nice feature.
I have not implemented any data corruption or timeout handling yet. I am
thinking about just dropping the connection on the side that detects the
error. This way I do not have to invent and implement a way for the client
and the server to negotiate a reset when one of them encounters an error.
Dropping a connection and re-establishing a new one is expensive, but I think
a custom reset would probably require a sequence of 'magic' bytes, which
requires the escaping and unescaping of those sequences in the normal
data. This would add extra processing to all normal operations for the
unlikely event of a communication error. I also think that negotiating a
custom reset will probably be at least as expensive as the TCP traffic
needed for closing the current and opening a new connection.
There is no session or something similar associated with a connection, so
when the client reconnects it can simply continue where it left off (after
refreshing its state).
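A minimal sketch of the drop-on-error strategy, under stated assumptions: DropOnErrorClient is a hypothetical wrapper, communication errors are modelled as UncheckedIOException, and the connection type is any Closeable. It is not taken from the attached implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper: on any communication error the connection is
// closed and forgotten; the next call opens a fresh one. No reset is
// negotiated with the peer.
final class DropOnErrorClient<C extends Closeable> {
    private final Supplier<C> connector; // opens a new connection
    private C conn;                      // null when there is no live connection

    DropOnErrorClient(Supplier<C> connector) {
        this.connector = connector;
    }

    <R> R invoke(Function<C, R> call) {
        if (conn == null) {
            conn = connector.get(); // (re)connect lazily
        }
        try {
            return call.apply(conn);
        } catch (UncheckedIOException e) {
            drop();
            throw e; // the caller decides whether retrying is safe
        }
    }

    private void drop() {
        try {
            conn.close();
        } catch (IOException ignored) {
            // the connection is being discarded anyway
        }
        conn = null;
    }
}
```

The sketch deliberately does not retry the failed call itself, because the server may already have executed a request whose response was lost.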
Is this a viable strategy for handling communication errors?
I attached the source files of the implementation. To try them out you
have to include the generated JavaBean classes of the tutorial in the
class path, and add the following to 'TMessageType':
public static final byte SERVICE_SELECTION = 4;
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by David Reiss <dr...@facebook.com>.
Sorry for the late response. There are four separate issues here.
1/ Pooling and reusing connections.
2/ "virtual hosting" with service names prefixed to the message names.
3/ Timeout handling.
4/ Corruption detection and management.
With regard to #1, this is no different from any other code you would
use to maintain a pool of resources. Take one out when you need it,
put it back when you don't. I'll try to throw some code into contrib
that demonstrates this.
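As an illustration of "take one out, put it back" (this is not the contrib code mentioned above), a generic pool could look roughly like the sketch below. SimplePool and its methods are hypothetical names; the type parameter C stands for whatever represents an open connection, such as a Thrift client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; C would be an open client or transport.
final class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open every connection up front
        }
    }

    // Take a connection out, run one call on it, and always put it back.
    <R> R withConnection(Function<C, R> call) {
        C conn;
        try {
            conn = idle.take(); // blocks while all connections are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        try {
            return call.apply(conn);
        } finally {
            idle.add(conn); // return it even if the call threw
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

A production pool would also replace connections that failed instead of returning them; that is left out here to keep the sketch short.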
For #2, I'm of the opinion that this should be handled above the Thrift
level because it adds significant complexity to multiple components of
Thrift, it is not easy to add on a language-by-language basis, and I
don't think it can be done in a way that will be "right" for all users.
For #3, I would recommend just setting a send/recv timeout in the client
transport (C++ and PHP support this for sure). If the request times out,
an exception will be thrown.
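To show the effect of a receive timeout in Java, here is a small sketch that uses plain java.net sockets rather than a Thrift transport. ReadTimeoutDemo is a hypothetical name, and the silent local server exists only to make the read block; Socket.setSoTimeout covers the receive side only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {
    // Returns true if the read timed out instead of blocking forever.
    static boolean readTimesOut(int timeoutMillis) {
        // A local server socket that completes the TCP handshake but
        // never sends a reply (accept() is intentionally not called).
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // limit for blocking reads
            try {
                client.getInputStream().read();
                return false; // data or end of stream arrived in time
            } catch (SocketTimeoutException e) {
                return true; // a client would report this as a failed request
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a timeout the stream position is unknown, so the safest follow-up is usually to discard the connection rather than reuse it.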
For #4, we depend on the TCP checksum to detect corruption at this point,
though a checksumming framed transport would be a nice feature.
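A checksumming frame could be sketched as below. The layout (4-byte length, payload, 4-byte CRC-32) and the class ChecksummedFrame are assumptions chosen for illustration; Thrift does not define this format in the thread.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical frame layout (an assumption):
//   4-byte payload length | payload bytes | 4-byte CRC-32 of the payload
final class ChecksummedFrame {
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length + 4)
                .putInt(payload.length)
                .put(payload)
                .putInt(checksum(payload))
                .array();
    }

    // Returns the payload, or throws if the frame is malformed or corrupt.
    static byte[] decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int length = buf.getInt();
        if (length < 0 || length != frame.length - 8) {
            throw new IllegalStateException("bad frame length");
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        if (checksum(payload) != buf.getInt()) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }

    private static int checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue(); // keep the low 32 bits
    }
}
```

A receiver that gets a checksum mismatch can no longer trust the stream, which fits the idea of simply dropping the connection on error.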
--David
Re: Multiplexed, poolable transport
Posted by Johan Stuyts <j....@zybber.nl>.
> It seems to me that this is something that can be done completely
> outside of Thrift (and we have done it at least once at Facebook).
Could you write a wiki page about how you did it? Thanks.
> Currently, it is not possible to use the same connection for multiple
> simultaneous requests, but the concept of a client pool should still
I do not want to have simultaneous requests over one connection, because I
assume building a transport that only supports serialized requests is much
easier. But as I said before I have no experience in writing protocols, so
I could be wrong.
> work. If this is done in a general way, I'd be happy to include it
> in the base Thrift distribution.
I am currently thinking in the direction of adding a TProtocol wrapper
that will add a service identifier prefix to each message name. On the
server side the message name will be split so the service and the function
can be looked up.
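The prefixing and splitting of message names might look like the following sketch. ServicePrefix and the ':' separator are assumptions for illustration; a real TProtocol wrapper would apply wrap() when writing the message header and the server would apply split() after reading it.

```java
// Hypothetical helper; the ':' separator is an assumption.
final class ServicePrefix {
    static final String SEPARATOR = ":";

    // Client side: prefix the message name with the service identifier.
    static String wrap(String service, String function) {
        if (service.contains(SEPARATOR)) {
            throw new IllegalArgumentException("service name must not contain " + SEPARATOR);
        }
        return service + SEPARATOR + function;
    }

    // Server side: split into { service, function } so both can be looked up.
    static String[] split(String messageName) {
        int i = messageName.indexOf(SEPARATOR);
        if (i < 0) {
            throw new IllegalArgumentException("no service prefix: " + messageName);
        }
        return new String[] {
            messageName.substring(0, i),
            messageName.substring(i + SEPARATOR.length())
        };
    }
}
```

Splitting on the first separator means function names may contain the separator but service identifiers may not, which is why wrap() rejects such identifiers.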
I would like to have some input about what the best way would be to handle
error conditions: timeouts, corrupt streams, etc.
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by David Reiss <dr...@facebook.com>.
It seems to me that this is something that can be done completely
outside of Thrift (and we have done it at least once at Facebook).
Currently, it is not possible to use the same connection for multiple
simultaneous requests, but the concept of a client pool should still
work. If this is done in a general way, I'd be happy to include it
in the base Thrift distribution.
--David
Re: Multiplexed, poolable transport
Posted by Johan Stuyts <j....@zybber.nl>.
> Services can inherit from one another. Would it make sense for you to
> have one meta-service inherit from all the separate services, and just
> serve the meta-service?
No, this would result in a service containing hundreds of functions, or
even thousands for large applications.
> By the way, multiple invocations can happen in the same connection. You
> don't need to create a new connection every time, though there is risk
> of fairness issues if you keep persistent connections. This is something
> we've solved with NonblockingServer in Ruby, and will be solving
> similarly in Java soon.
I see how you can do multiple invocations now; I skimmed over the code too
quickly before.
I read THRIFT-5 and understand the fairness issue, but I see this as an
implementation detail of the transport I am proposing.
--
Kind regards,
Johan Stuyts
Re: Multiplexed, poolable transport
Posted by Bryan Duxbury <br...@rapleaf.com>.
Services can inherit from one another. Would it make sense for you to
have one meta-service inherit from all the separate services, and
just serve the meta-service?
By the way, multiple invocations can happen in the same connection.
You don't need to create a new connection every time, though there is
risk of fairness issues if you keep persistent connections. This is
something we've solved with NonblockingServer in Ruby, and will be
solving similarly in Java soon.
-Bryan