Posted to user@thrift.apache.org by Debacker <de...@gmail.com> on 2009/04/23 13:47:55 UTC

multi-thread request handling

Hi,

By reading the Java library, I noticed that each connection (transport) gets
its own thread (in the thread-pool version), so a given thread will process
requests of a single connection. In the single-thread version, you can only
serve a single connection at a time. Why not accept connections as they
arrive, initiate a read on all of them, put them in something like an epoll
(in the Linux world), and let a pool of k threads take care of the next
requests on whichever connections have responded? When a thread is done with
a request, it can put the connection back in the epoll and pick up another
one. That's how event-based servers such as Nginx or Cherokee work.
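
To make the idea concrete: in Java, the epoll-style design described here
maps onto NIO's Selector plus a worker pool. A minimal sketch, with names
that are purely illustrative (this is not code from the Thrift library):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SelectorServerSketch {
        public static void main(String[] args) throws IOException {
            ExecutorService workers = Executors.newFixedThreadPool(8); // pool of k threads
            Selector selector = Selector.open(); // backed by epoll on Linux JVMs
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress(9090));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select(); // wait until some connection is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    final SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        // Accept connections as they arrive and initiate a read.
                        SocketChannel conn = server.accept();
                        conn.configureBlocking(false);
                        conn.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        key.interestOps(0); // remove from the "epoll" while a worker owns it
                        workers.submit(new Runnable() {
                            public void run() {
                                handleRequest((SocketChannel) key.channel());
                                key.interestOps(SelectionKey.OP_READ); // put it back when done
                                key.selector().wakeup();
                            }
                        });
                    }
                }
            }
        }

        static void handleRequest(SocketChannel conn) { /* read, dispatch, write response */ }
    }

(A production server would typically queue interest-ops changes for the
selector thread rather than mutate them from workers, but the shape of the
design is the same: k threads can serve far more than k idle connections.)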

Could you tell me why the current design was selected? I think the best way
to use the current server is to create a connection, make a call or two,
then disconnect. But socket creation takes time, which is why it would be
nicer to keep some connections open and reuse them when needed (think of a
database connection pool, but with Thrift instead of a database).

I have noticed your non-blocking server based on
http://java.sun.com/javase/6/docs/api/java/nio/channels/spi/SelectorProvider.html,
but you use the framed transport to read the whole packet completely before
calling the implementation. But this is not needed; as I said in the first
paragraph, we could just read the first chunk of bytes (e.g. 4096) of each
connection. This would require one read buffer per connection, but that is
not a problem, since the objective would be to reuse the connections.
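
A per-connection read buffer of this kind is straightforward to express
with NIO by attaching the buffer to the connection's SelectionKey. A rough
sketch, assuming hypothetical names:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.SocketChannel;

    class PerConnectionBuffer {
        // Called by the select() loop whenever this connection is readable.
        // The 4096-byte buffer was attached at register() time, e.g.:
        //   conn.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
        static void onReadable(SelectionKey key) throws IOException {
            ByteBuffer buf = (ByteBuffer) key.attachment();
            int n = ((SocketChannel) key.channel()).read(buf); // non-blocking append
            if (n < 0) {
                key.cancel(); // peer closed; release the key (and its buffer)
            }
            // buf accumulates the request across reads, so an idle ("sleeping")
            // connection costs one buffer, not one blocked thread.
        }
    }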

Thanks,
Laurent Debacker.

Re: multi-thread request handling

Posted by Ted Dunning <te...@gmail.com>.
The tide in the ongoing debate between thread-per-connection and state
machine approaches has turned a bit recently.  Having a whale of a lot of
threads is not nearly as expensive as it once was.  In fact, as a
state-machine implementation gets more complex, the amount of state that has
to be maintained for each pending transaction leads to an implementation
that is essentially building a thread scheduler without hardware support.  On
the other side, general thread schedulers have been getting dramatically
better, with the result that thread-per-transaction is becoming a very
viable model.
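
For contrast, the thread-per-connection model delegates all of that
state-keeping to the OS scheduler, which is why it stays so simple. A
minimal sketch (illustrative names, not Thrift's TThreadPoolServer):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class ThreadPerConnectionSketch {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(9090);
            while (true) {
                final Socket conn = server.accept();
                // One thread per connection: the OS scheduler (with hardware
                // support) decides which blocked request runs next, i.e. the
                // very scheduler a complex state machine re-implements in
                // user space.
                new Thread(new Runnable() {
                    public void run() { handleConnection(conn); }
                }).start();
            }
        }

        static void handleConnection(Socket conn) { /* blocking read, dispatch, write */ }
    }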

On Fri, Apr 24, 2009 at 6:26 AM, Debacker <de...@gmail.com> wrote:

> However, if NIO is slower,
> it is likely Java's fault; in native C, the fastest web servers include
> nginx and Cherokee, which are based on async I/O and event programming.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: multi-thread request handling

Posted by Debacker <de...@gmail.com>.
THsHaServer uses the framed transport, which I don't want because it
requires an unnecessary copy of the data on the sender side to measure the
length of the packet. I know I'm picky, but Thrift is all about maximum
efficiency, as I understand it :p
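
For reference, a frame in Thrift's framed transport is just a 4-byte
big-endian length prefix followed by the serialized message, which is
exactly why the sender must buffer the message before writing it: the
length has to go out first. A hand-rolled sketch of the sender side (not
the actual TFramedTransport code, which as I understand it does this
buffering internally):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    class FramedWriteSketch {
        // The sender serializes the message into a memory buffer first (the
        // copy objected to above), because the frame's 4-byte big-endian
        // length prefix must be written before any payload bytes.
        static void writeFrame(OutputStream socketOut, ByteArrayOutputStream message)
                throws IOException {
            DataOutputStream out = new DataOutputStream(socketOut);
            out.writeInt(message.size()); // length, known only after buffering
            message.writeTo(out);         // then the buffered payload
            out.flush();
        }
    }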

Thanks for the link, btw. It may very well explain why the designers of the
Java lib for Thrift selected this implementation. However, if NIO is slower,
it is likely Java's fault; in native C, the fastest web servers include
nginx and Cherokee, which are based on async I/O and event programming.

If I have enough time, I could implement my idea, benchmark it, and
contribute it back if it is faster or uses less memory.

Laurent Debacker.

On Thu, Apr 23, 2009 at 11:13 PM, Joel Meyer <jo...@gmail.com> wrote:

> On Thu, Apr 23, 2009 at 11:38 AM, Debacker <de...@gmail.com> wrote:
>
> > Thanks for your answer. I should have specified that I would use
> > non-blocking asynchronous I/O for the initial read.
> > Hence, I can do an async read of max 4096 bytes on 1000 connections, put
> > them in a poll set, and make a thread wait for a completion event on any
> > of the 1000 connections.
> > Once a completion event fires, the thread picks one of the ready sockets
> > and passes it to a worker thread. The worker thread will begin reading
> > the request; if it has not been received completely, we can read it
> > synchronously until it is complete, then make the call to the interface,
> > and then write the response. After that, the worker thread gives the
> > connection back to the master thread and waits for a new job from the
> > master thread. The objective here is to support "sleeping" connections,
> > which don't send any request for a few seconds or minutes. This, in
> > turn, makes it possible to support connection pools on the client side,
> > to reduce useless connection creation/destruction and the latency it
> > causes.
> >
> > With the current thread-pooled server, 1000 connections will use 1000
> > threads, even if connections are only used 10% of the time. With my
> > system, I would use between 100 and 1000 threads instead.
> >
> > Another possibility would be to have one (or a few) thread(s) reading
> > requests from all connections at the same time (using async I/O and
> > epoll), using one state machine per connection. Once the state machine
> > indicates that the parsing of the request is complete, we can pass the
> > request to a thread of the thread pool, get the response,
>
> I may be mistaken, but I believe this is what the THsHaServer does. The
> incoming requests are handled asynchronously using select(), and when a
> complete request is received (framing is required to detect this), it is
> handed off to a thread in the invoker pool.
>
> I believe the THsHaServer is actually very close to what you desire. Also,
> if you're running on a system with a modern thread implementation, having
> a high number of threads isn't as bad as it once was*. You may be better
> off starting with something that's easily available and optimizing if
> necessary.
>
> *
> http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html
>
> Cheers,
> Joel
>
> > and pass the response to another thread
> > which will write all responses using async I/O, concurrently again. That
> > way, we do async I/O all the time, and only the implementation will be
> > blocking. But I think it is not possible to implement this using the
> > current architecture, that is to say, using the current Protocol
> > classes. And the only advantage over my first proposal would be more
> > efficiency for connections that are very slow to transmit a request,
> > which is not that important.
> >
> > Laurent Debacker.
> >
> > On Thu, Apr 23, 2009 at 7:29 PM, David Reiss <dr...@facebook.com> wrote:
> >
> > > > I have noticed your non-blocking server based on
> > > > http://java.sun.com/javase/6/docs/api/java/nio/channels/spi/SelectorProvider.html,
> > > > but you use the framed transport to read the whole packet completely
> > > > before calling the implementation. But this is not needed; as I said
> > > > in the first paragraph, we could just read the first chunk of bytes
> > > > (e.g. 4096) of each connection. This would require one read buffer
> > > > per connection, but that is not a problem, since the objective would
> > > > be to reuse the connections.
> > >
> > > You have to read the entire request before handing off to the
> > > processor code.  This is because the processor will attempt to
> > > deserialize the entire request.  If it is deserializing from a buffer,
> > > it will reach the end and throw an exception, wasting valuable CPU
> > > time reading a request that is not complete.  If it is deserializing
> > > directly from the socket (or a TBufferedTransport on the socket), it
> > > will block until the rest of the message arrives, which makes that
> > > thread useless.  Does this make sense?
> > >
> > > --David
> > >
> >
>

Re: multi-thread request handling

Posted by Joel Meyer <jo...@gmail.com>.
On Thu, Apr 23, 2009 at 11:38 AM, Debacker <de...@gmail.com> wrote:

> Thanks for your answer. I should have specified that I would use
> non-blocking asynchronous I/O for the initial read.
> Hence, I can do an async read of max 4096 bytes on 1000 connections, put
> them in a poll set, and make a thread wait for a completion event on any of
> the 1000 connections.
> Once a completion event fires, the thread picks one of the ready sockets
> and passes it to a worker thread. The worker thread will begin reading the
> request; if it has not been received completely, we can read it
> synchronously until it is complete, then make the call to the interface,
> and then write the response. After that, the worker thread gives the
> connection back to the master thread and waits for a new job from the
> master thread. The objective here is to support "sleeping" connections,
> which don't send any request for a few seconds or minutes. This, in turn,
> makes it possible to support connection pools on the client side, to reduce
> useless connection creation/destruction and the latency it causes.
>
> With the current thread-pooled server, 1000 connections will use 1000
> threads, even if connections are only used 10% of the time. With my
> system, I would use between 100 and 1000 threads instead.
>
> Another possibility would be to have one (or a few) thread(s) reading
> requests from all connections at the same time (using async I/O and epoll),
> using one state machine per connection. Once the state machine indicates
> that the parsing of the request is complete, we can pass the request to a
> thread of the thread pool, get the response,


I may be mistaken, but I believe this is what the THsHaServer does. The
incoming requests are handled asynchronously using select(), and when a
complete request is received (framing is required to detect this), it is
handed off to a thread in the invoker pool.

I believe the THsHaServer is actually very close to what you desire. Also,
if you're running on a system with a modern thread implementation, having a
high number of threads isn't as bad as it once was*. You may be better off
starting with something that's easily available and optimizing if necessary.

*
http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html
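
Roughly, the handoff looks like the sketch below. This is a paraphrase with
made-up names, not THsHaServer's actual source:

    import java.nio.ByteBuffer;
    import java.util.concurrent.ExecutorService;

    class HalfSyncHalfAsyncSketch {
        ExecutorService invokerPool; // the "half-sync" worker threads

        // Called from the select() loop each time a connection is readable.
        void onReadable(final ByteBuffer frame, int expectedLength) {
            // Framing turns "is the request complete?" into simple arithmetic.
            if (frame.position() >= expectedLength) {
                invokerPool.submit(new Runnable() {
                    public void run() {
                        process(frame); // deserialize and invoke off the select() thread
                    }
                });
            }
            // Otherwise leave the connection registered and wait for more bytes.
        }

        void process(ByteBuffer completeRequest) { /* processor.process(in, out) */ }
    }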

Cheers,
Joel

> and pass the response to another thread
> which will write all responses using async I/O, concurrently again. That
> way, we do async I/O all the time, and only the implementation will be
> blocking. But I think it is not possible to implement this using the
> current architecture, that is to say, using the current Protocol classes.
> And the only advantage over my first proposal would be more efficiency for
> connections that are very slow to transmit a request, which is not that
> important.
>
> Laurent Debacker.
>
> On Thu, Apr 23, 2009 at 7:29 PM, David Reiss <dr...@facebook.com> wrote:
>
> > > I have noticed your non-blocking server based on
> > > http://java.sun.com/javase/6/docs/api/java/nio/channels/spi/SelectorProvider.html,
> > > but you use the framed transport to read the whole packet completely
> > > before calling the implementation. But this is not needed; as I said in
> > > the first paragraph, we could just read the first chunk of bytes (e.g.
> > > 4096) of each connection. This would require one read buffer per
> > > connection, but that is not a problem, since the objective would be to
> > > reuse the connections.
> >
> > You have to read the entire request before handing off to the processor
> > code.  This is because the processor will attempt to deserialize the
> > entire request.  If it is deserializing from a buffer, it will reach the
> > end and throw an exception, wasting valuable CPU time reading a request
> > that is not complete.  If it is deserializing directly from the socket
> > (or a TBufferedTransport on the socket), it will block until the rest of
> > the message arrives, which makes that thread useless.  Does this make
> > sense?
> >
> > --David
> >
>

Re: multi-thread request handling

Posted by Debacker <de...@gmail.com>.
Thanks for your answer. I should have specified that I would use
non-blocking asynchronous I/O for the initial read.
Hence, I can do an async read of max 4096 bytes on 1000 connections, put
them in a poll set, and make a thread wait for a completion event on any of
the 1000 connections.
Once a completion event fires, the thread picks one of the ready sockets and
passes it to a worker thread. The worker thread will begin reading the
request; if it has not been received completely, we can read it synchronously
until it is complete, then make the call to the interface, and then write the
response. After that, the worker thread gives the connection back to the
master thread and waits for a new job from the master thread. The objective
here is to support "sleeping" connections, which don't send any request for a
few seconds or minutes. This, in turn, makes it possible to support
connection pools on the client side, to reduce useless connection
creation/destruction and the latency it causes.

With the current thread-pooled server, 1000 connections will use 1000
threads, even if connections are only used 10% of the time. With my
system, I would use between 100 and 1000 threads instead.

Another possibility would be to have one (or a few) thread(s) reading
requests from all connections at the same time (using async I/O and epoll),
using one state machine per connection. Once the state machine indicates that
the parsing of the request is complete, we can pass the request to a thread
of the thread pool, get the response, and pass the response to another thread
which will write all responses using async I/O, concurrently again. That way,
we do async I/O all the time, and only the implementation will be blocking.
But I think it is not possible to implement this using the current
architecture, that is to say, using the current Protocol classes. And the
only advantage over my first proposal would be more efficiency for
connections that are very slow to transmit a request, which is not that
important.
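
The per-connection state machine sketched above could be as simple as the
following. All names are hypothetical, just to pin the idea down:

    import java.nio.ByteBuffer;

    class ConnectionStateMachine {
        enum State { READING_REQUEST, PROCESSING, WRITING_RESPONSE }

        State state = State.READING_REQUEST;
        ByteBuffer inBuf = ByteBuffer.allocate(4096);
        ByteBuffer outBuf; // filled by the worker thread with the response

        // Driven by the I/O thread on every readiness event for this connection.
        void onEvent() {
            switch (state) {
                case READING_REQUEST:
                    // Append available bytes to inBuf; once parsing says the
                    // request is complete, hand inBuf to the thread pool and
                    // move to PROCESSING.
                    break;
                case PROCESSING:
                    // A worker thread is running the handler; nothing to do here.
                    break;
                case WRITING_RESPONSE:
                    // Write outBuf without blocking; when fully drained, reset
                    // to READING_REQUEST so the connection can be reused.
                    break;
            }
        }
    }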

Laurent Debacker.

On Thu, Apr 23, 2009 at 7:29 PM, David Reiss <dr...@facebook.com> wrote:

> > I have noticed your non-blocking server based on
> > http://java.sun.com/javase/6/docs/api/java/nio/channels/spi/SelectorProvider.html,
> > but you use the framed transport to read the whole packet completely
> > before calling the implementation. But this is not needed; as I said in
> > the first paragraph, we could just read the first chunk of bytes (e.g.
> > 4096) of each connection. This would require one read buffer per
> > connection, but that is not a problem, since the objective would be to
> > reuse the connections.
>
> You have to read the entire request before handing off to the processor
> code.  This is because the processor will attempt to deserialize the
> entire request.  If it is deserializing from a buffer, it will reach the
> end and throw an exception, wasting valuable CPU time reading a request
> that is not complete.  If it is deserializing directly from the socket
> (or a TBufferedTransport on the socket), it will block until the rest of
> the message arrives, which makes that thread useless.  Does this make
> sense?
>
> --David
>

Re: multi-thread request handling

Posted by David Reiss <dr...@facebook.com>.
> I have noticed your non-blocking server based on
> http://java.sun.com/javase/6/docs/api/java/nio/channels/spi/SelectorProvider.html,
> but you use the framed transport to read the whole packet completely before
> calling the implementation. But this is not needed; as I said in the first
> paragraph, we could just read the first chunk of bytes (e.g. 4096) of each
> connection. This would require one read buffer per connection, but that is
> not a problem, since the objective would be to reuse the connections.

You have to read the entire request before handing off to the processor
code.  This is because the processor will attempt to deserialize the
entire request.  If it is deserializing from a buffer, it will reach the
end and throw an exception, wasting valuable CPU time reading a request
that is not complete.  If it is deserializing directly from the socket
(or a TBufferedTransport on the socket), it will block until the rest of
the message arrives, which makes that thread useless.  Does this make
sense?
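
In other words, the frame's length prefix is what makes "is this request
complete?" a cheap arithmetic check instead of a failed deserialization. A
minimal sketch of that check, assuming a buffer that fills from position 0
(names are mine):

    import java.nio.ByteBuffer;

    class FrameCompletenessCheck {
        // True only when every byte of the request has arrived, so the
        // processor can deserialize from the buffer without hitting a
        // premature end (an exception) and without blocking on the socket.
        static boolean requestComplete(ByteBuffer buf) {
            if (buf.position() < 4) {
                return false; // not even the length prefix yet
            }
            int frameLength = buf.getInt(0); // 4-byte big-endian length prefix
            return buf.position() >= 4 + frameLength;
        }
    }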

--David