Posted to user@thrift.apache.org by Akshat Aranya <aa...@gmail.com> on 2012/08/08 21:00:07 UTC

Using multi-threaded clients with Thrift

Hi,

I'm trying to implement a Thrift client and server where both are
multi-threaded.  Things are working fine on the server side, but I'm
getting "out of sequence" errors on the client.  I looked through the
code a little bit, and I realized that a synchronous client can only
have one outstanding message, so it is not possible to have multiple
clients make simultaneous calls.  This problem happens even when each
thread has its own Client, but they share a Protocol.  Is my
assessment correct?  If so, is the only way to make it work in a
multi-threaded environment to use an independent connection (i.e.,
a new Transport) per thread?  That seems kind of wasteful and
inefficient.

Thanks,
Akshat

RE: Using multi-threaded clients with Thrift

Posted by Mark Slee <ms...@fb.com>.

>> sometimes having more threads than the number of cores is
>> desirable when you have a thread pool with worker threads doing
>> network/disk I/O

Yep, totally agreed, and this is certainly an intuitive way of programming.

But in all of these systems, either each worker thread has exclusive ownership over the file descriptors/sockets it uses, or you have to introduce locking around those shared resources.

The latter is typically a much larger performance drain, so that's why Thrift doesn't do it by default (or at all - it is up to the application layer to lock appropriately).

>> I suppose it is possible to keep most of the code in a worker thread model
>> and just use a queue of requests at the lowest level where I need to
>> make Thrift RPC calls

Yep, exactly. Each worker thread can have a semaphore. You do all your application logic in worker threads. When a worker thread needs a network thing to happen, it places the request in a queue for the network thread, along with its semaphore. The network thread runs an infinite loop, pulling items off the queue and signaling the semaphore when they are complete. Your worker thread code can still *look* synchronous and not have any callbacks.

Here is really rough pseudocode:

network_queue;  // shared queue of pending requests

worker_thread {
  semaphore s;  // initial count 0, so wait() blocks until signaled
  run() {
    // do application logic
    request = thrift_request();
    request.semaphore = s;
    lock(network_queue) {
      network_queue.enqueue(request);
    }

    // block until a networker signals
    s.wait();

    // request is now populated with our response data
    // more application logic
    return whatever;
  }
}

network_thread {
  while (true) {
    lock(network_queue) {
      // if the queue is empty, release the lock and wait for an enqueue
      request = network_queue.dequeue();
    }
    request.process();           // perform the RPC on this thread's transport
    request.semaphore.signal();  // wake the worker that owns the request
  }
}

You can have multiple network threads all working on this queue, each owning its own transport. It typically makes sense to have about as many networker threads as you have cores.
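
For anyone who wants to try the pattern, here is a minimal runnable sketch in Python. It is only an illustration under assumptions: FakeClient is a hypothetical stand-in for a generated Thrift client, and Request, network_thread and remote_call are names made up for this example, not Thrift APIs.

```python
import queue
import threading


class Request:
    """One pending call plus the semaphore its worker blocks on."""

    def __init__(self, call, *args):
        self.call = call
        self.args = args
        self.result = None
        self.error = None
        self.done = threading.Semaphore(0)  # starts at 0, so acquire() blocks


def network_thread(requests, make_client):
    # Each network thread owns its own client (and so its own transport).
    client = make_client()
    while True:
        request = requests.get()  # blocks while the queue is empty
        if request is None:       # sentinel value: shut down
            break
        try:
            request.result = request.call(client, *request.args)
        except Exception as exc:
            request.error = exc
        request.done.release()    # wake the waiting worker


def remote_call(requests, call, *args):
    """Looks synchronous to the worker: enqueue, then block until signaled."""
    request = Request(call, *args)
    requests.put(request)
    request.done.acquire()
    if request.error is not None:
        raise request.error
    return request.result


class FakeClient:
    """Hypothetical stand-in for a generated Thrift client."""

    def add(self, a, b):
        return a + b


def demo():
    requests = queue.Queue()  # queue.Queue does its own locking
    networker = threading.Thread(target=network_thread,
                                 args=(requests, FakeClient))
    networker.start()
    results = {}

    def worker(n):
        results[n] = remote_call(requests, lambda c, a, b: c.add(a, b), n, 10)

    workers = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)
    networker.join()
    return results
```

Calling demo() runs four workers against one network thread. To run several network threads, start more of them on the same queue and put one None sentinel per thread at shutdown.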

Cheers,
mcslee

Re: Using multi-threaded clients with Thrift

Posted by Matthew Chambers <mc...@wetafx.co.nz>.


On 09/08/12 09:18, Akshat Aranya wrote:
> That still doesn't allow me to make a synchronous remote call from my
> application logic thread, which is much easier to program than asking
> another thread to make the remote call.  For now, I've implemented
> ThreadLocal Clients for my application logic threads, each with their
> own Transports, which means each thread has its own connection.  I
> don't know yet if that'll be a scalability issue, especially on the
> server that multiple multi-threaded clients will connect to.
>
> Cheers,
> Akshat

I don't see why not.  Your application thread is just another thread that
can do something like:

service = thriftPool.checkOutService()
try:
    do_stuff(service)
finally:
    thriftPool.checkIn(service)

Or, give your application thread its own instance of the service at 
start up.
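
A rough sketch of such a pool in Python, for illustration only. ServicePool and FakeService are hypothetical names; FakeService stands in for a Thrift client that has already been connected over its own transport.

```python
import queue
from contextlib import contextmanager


class ServicePool:
    """Fixed-size pool; a service is used by one thread at a time."""

    def __init__(self, make_service, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_service())

    def check_out(self, timeout=None):
        # Blocks until a service is free (raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def check_in(self, service):
        self._idle.put(service)

    @contextmanager
    def service(self, timeout=None):
        # Same try/finally shape as above, packaged as a with-block.
        svc = self.check_out(timeout)
        try:
            yield svc
        finally:
            self.check_in(svc)


class FakeService:
    """Hypothetical stand-in for a connected Thrift client."""

    def echo(self, text):
        return text


pool = ServicePool(FakeService, size=2)
with pool.service() as svc:
    reply = svc.echo("hello")  # an ordinary synchronous call
```

The application thread still makes a plain blocking call; the pool only guarantees that no two threads hold the same connection at once.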

Sockets are not thread safe.  That is how it's always been.  Sharing a
resource like a socket between threads slows it down; it doesn't speed
it up.  I doubt you'll see scalability issues.  I don't know how many
clients you plan to have, but I work in the 10-30k range
(admittedly, not that many).  You might have to increase the default
number of allowed open file handles on your server depending on how many
nodes you have, but with Thrift you will usually saturate your network
connection long before you saturate the CPUs.
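
If the server process happens to be written in Python on a Unix-like system, the open-file limit can be inspected with the standard resource module. This is only a sketch: can_hold, raise_soft_limit and the reserve of 64 descriptors are assumptions made for illustration, and servers in other languages would use ulimit or their own platform calls instead.

```python
import resource


def can_hold(expected_connections, reserve=64):
    """True if the soft open-file limit leaves room for the expected sockets."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    # reserve covers log files, listening sockets and so on (a guess)
    return expected_connections + reserve <= soft


def raise_soft_limit(target):
    """Raise the soft limit toward target, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft
```

Raising the hard limit itself normally needs administrator privileges and is done outside the process.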

-Matt

Re: Using multi-threaded clients with Thrift

Posted by Akshat Aranya <aa...@gmail.com>.

On Wed, Aug 8, 2012 at 5:10 PM, Matthew Chambers <mc...@wetafx.co.nz> wrote:
> If that is what you're doing, then you might want to create a pool of thrift
> connections your threads can check out, work with, and then check back in.
> In this case, it's often helpful to have a super lightweight service method
> that can be used to test the connection before giving it to the thread, so
> you can reconnect if necessary.
>
> -Matt

That still doesn't allow me to make a synchronous remote call from my
application logic thread, which is much easier to program than asking
another thread to make the remote call.  For now, I've implemented
ThreadLocal Clients for my application logic threads, each with their
own Transports, which means each thread has its own connection.  I
don't know yet if that'll be a scalability issue, especially on the
server that multiple multi-threaded clients will connect to.
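
The thread-local approach can be sketched in Python roughly as follows. This is not the actual implementation described above: FakeClient is a hypothetical stand-in for a generated client built on its own transport, and client() is a helper name invented for the example.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for a Thrift client with its own transport."""

    opened = 0
    _count_lock = threading.Lock()

    def __init__(self):
        # Count constructions so the demo can show one client per thread.
        with FakeClient._count_lock:
            FakeClient.opened += 1

    def ping(self):
        return "pong"


_local = threading.local()


def client():
    """Return this thread's client, creating it on first use."""
    c = getattr(_local, "client", None)
    if c is None:
        c = _local.client = FakeClient()
    return c


def demo(n_threads=3):
    before = FakeClient.opened
    same = []

    def work():
        first, second = client(), client()
        same.append(first is second)  # a thread always gets its own client back

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return same, FakeClient.opened - before
```

Each thread opens exactly one client, so the number of connections equals the number of application threads that make remote calls.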

Cheers,
Akshat

Re: Using multi-threaded clients with Thrift

Posted by Matthew Chambers <mc...@wetafx.co.nz>.

> Thanks for the information, Mark.  This is not a criticism of Thrift,
> but sometimes having more threads than the number of cores is
> desirable when you have a thread pool with worker threads doing
> network/disk I/O.  I have programmed with both a message passing model
> and a worker thread model, and in my opinion, the latter results in
> more readable code and is easier to program with, especially with
> languages such as C++ that require explicit memory management.  I
> suppose it is possible to keep most of the code in a worker thread model
> and just use a queue of requests at the lowest level where I need to
> make Thrift RPC calls.  I was hoping that Thrift would do this for me
> out of the box. :-D
>
> Cheers,
> Akshat

If that is what you're doing, then you might want to create a pool of
thrift connections your threads can check out, work with, and then check
back in.  In this case, it's often helpful to have a super lightweight
service method that can be used to test the connection before giving it
to the thread, so you can reconnect if necessary.
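
The test-before-handing-out step might look something like this in Python. It is a sketch under assumptions: FakeConnection, ping() and reconnect() are hypothetical names, and with a real Thrift binding you would catch whatever transport exception that binding raises rather than OSError.

```python
class FakeConnection:
    """Hypothetical stand-in for a Thrift client plus its transport."""

    def __init__(self):
        self.alive = True
        self.reconnects = 0

    def ping(self):
        # Stands in for a cheap no-op service method; fails on a dead socket.
        if not self.alive:
            raise ConnectionError("connection lost")

    def reconnect(self):
        self.alive = True
        self.reconnects += 1


def ensure_usable(conn):
    """Ping before handing the connection out; reconnect once if that fails."""
    try:
        conn.ping()
    except OSError:      # ConnectionError is a subclass of OSError
        conn.reconnect()
        conn.ping()      # a second failure propagates to the caller
    return conn
```

A pool would call ensure_usable() inside its check-out method, so a thread never receives a connection that was dropped while sitting idle.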

-Matt

Re: Using multi-threaded clients with Thrift

Posted by Akshat Aranya <aa...@gmail.com>.

On Wed, Aug 8, 2012 at 4:55 PM, Mark Slee <ms...@fb.com> wrote:
> The Thrift transport layer is not thread-safe. It is essentially a wrapper on a socket.
>
> You can't interleave writing things to a single socket from multiple threads without locking. You also don't know what order the responses will come back in. Each thread is effectively calling read(). To have this work in a multi-threaded environment would require another layer of abstraction that parceled out responses on the socket and determined which data should go to which thread. This would be less efficient in the common case of a single transport per thread.
>
> You certainly could build this functionality on top of the Thrift abstractions, but the base layers are designed to be very lightweight and pretty closely mimic raw sockets.
>
>>> If so, is the only way to make it work in a
>>> multi-threaded environment to use an independent connection (i.e.,
>>> a new Transport) per thread?  That seems kind of wasteful and
>>> inefficient.
>
> In practice, assuming your number of threads is on the order of your number of cores, this is not inefficient and additional sockets aren't very expensive. Having each thread own its own socket obviates the need for locking around all accesses to the shared socket resource, which tends to be much more costly.
>
> Another common design in a multi-threaded environment is to have a single networker thread (or a low fixed number of them). This thread owns a transport, and the clients put in requests to this thread to perform an operation and then block, waiting to receive a callback when the operation they requested is complete.
>
> Cheers,
> mcslee
> ________________________________________

Thanks for the information, Mark.  This is not a criticism of Thrift,
but sometimes having more threads than the number of cores is
desirable when you have a thread pool with worker threads doing
network/disk I/O.  I have programmed with both a message passing model
and a worker thread model, and in my opinion, the latter results in
more readable code and is easier to program with, especially with
languages such as C++ that require explicit memory management.  I
suppose it is possible to keep most of the code in a worker thread model
and just use a queue of requests at the lowest level where I need to
make Thrift RPC calls.  I was hoping that Thrift would do this for me
out of the box. :-D

Cheers,
Akshat

RE: Using multi-threaded clients with Thrift

Posted by Mark Slee <ms...@fb.com>.

The Thrift transport layer is not thread-safe. It is essentially a wrapper on a socket.

You can't interleave writing things to a single socket from multiple threads without locking. You also don't know what order the responses will come back in. Each thread is effectively calling read(). To have this work in a multi-threaded environment would require another layer of abstraction that parceled out responses on the socket and determined which data should go to which thread. This would be less efficient in the common case of a single transport per thread.

You certainly could build this functionality on top of the Thrift abstractions, but the base layers are designed to be very lightweight and pretty closely mimic raw sockets.
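
To make that extra layer concrete, here is a rough Python sketch of a demultiplexer that matches responses to callers by sequence id. Nothing here is part of Thrift: Demultiplexer, write_frame and on_response are names assumed for illustration, and the wire is simulated with a list instead of a socket.

```python
import itertools
import threading
from concurrent.futures import Future


class Demultiplexer:
    """Hands each response to the caller that sent the matching request."""

    def __init__(self, write_frame):
        self._write_frame = write_frame  # callable(seqid, payload)
        self._lock = threading.Lock()
        self._seqids = itertools.count(1)
        self._pending = {}               # seqid -> Future of the waiting caller

    def call(self, payload):
        future = Future()
        with self._lock:  # one writer at a time, so frames never interleave
            seqid = next(self._seqids)
            self._pending[seqid] = future
            self._write_frame(seqid, payload)
        return future

    def on_response(self, seqid, result):
        """Called by the single reader thread for each frame it reads."""
        with self._lock:
            future = self._pending.pop(seqid)
        future.set_result(result)


# Simulated wire: requests are recorded, then answered in reverse order.
wire = []
demux = Demultiplexer(lambda seqid, payload: wire.append((seqid, payload)))
futures = [demux.call(text) for text in ("a", "b", "c")]
for seqid, payload in reversed(wire):
    demux.on_response(seqid, payload.upper())
results = [f.result(timeout=1) for f in futures]
```

Every call pays for the lock, the dictionary and the future, which is the overhead a one-transport-per-thread design avoids.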

>> If so, is the only way to make it work in a
>> multi-threaded environment to use an independent connection (i.e.,
>> a new Transport) per thread?  That seems kind of wasteful and
>> inefficient.

In practice, assuming your number of threads is on the order of your number of cores, this is not inefficient and additional sockets aren't very expensive. Having each thread own its own socket obviates the need for locking around all accesses to the shared socket resource, which tends to be much more costly.

Another common design in a multi-threaded environment is to have a single networker thread (or a low fixed number of them). This thread owns a transport, and the clients put in requests to this thread to perform an operation and then block, waiting to receive a callback when the operation they requested is complete.

Cheers,
mcslee

Re: Using multi-threaded clients with Thrift

Posted by Matthew Chambers <mc...@wetafx.co.nz>.

All you really need to do is wrap the TSocket, Transport, and Protocol 
into your own MyConnection class and have 1 per thread.  There is 
nothing inefficient about it, except maybe a bit more memory.  Or you 
can put a lock around your service and make it so only 1 thread can talk 
at any given time.
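
The lock option could be sketched in Python as a thin proxy around the client. LockedClient and FakeClient are hypothetical names used only for this illustration; FakeClient counts overlapping calls so the effect of the lock is visible.

```python
import threading
import time


class LockedClient:
    """Serializes calls on the wrapped client: one thread talks at a time."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def locked(*args, **kwargs):
            # Hold the lock across the whole request/response round trip.
            with self._lock:
                return target(*args, **kwargs)

        return locked


class FakeClient:
    """Hypothetical stand-in for a Thrift client; counts overlapping calls."""

    def __init__(self):
        self.in_call = False
        self.overlaps = 0
        self.calls = 0

    def add(self, a, b):
        if self.in_call:
            self.overlaps += 1
        self.in_call = True
        time.sleep(0.001)  # pretend to wait on the network
        self.in_call = False
        self.calls += 1
        return a + b


def demo(n_threads=8):
    raw = FakeClient()
    shared = LockedClient(raw)
    threads = [threading.Thread(target=shared.add, args=(i, 1))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return raw.calls, raw.overlaps
```

This keeps a single connection, but every caller queues behind the lock for the full duration of each remote call.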

-Matt
