Posted to user@thrift.apache.org by Curtis Spencer <th...@gmail.com> on 2009/05/07 00:01:45 UTC

Help Debugging Periodic Error

Hi,

First message to the thrift list.  Thanks for releasing such a great
technology.

I have mainly been using the C++ versions of the client and server, but
I occasionally run into a problem that I don't have much data to help
diagnose.  I wanted to see if this is something others have encountered
and whether I am doing something silly.

The problem I see is that my application servers (which are the
clients to a thrift service) periodically hang.  I can restart the
thrift service and the application servers (without restart) are happy
again.

The only output I see is at the thrift service level and what I get is this:

TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May  6 09:30:07 2009 TSocket::write() send() <Host:  Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May  6 09:30:07 2009 TSocket::write() send() <Host:  Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May  6 09:30:07 2009 TSocket::write() send() <Host:  Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May  6 09:30:07 2009 TSocket::write() send() <Host:  Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May  6 09:30:07 2009 TSocket::write() send() <Host:  Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
TSimpleServer client died: No more data to read.

I now have both of them running with debug symbols, so the next time I
see it happen I will attach to them with gdb, see exactly where they are
stuck, and add more information to this thread.

As for the client/server code, my server looks pretty cookie cutter:

int main(int argc, char **argv) {
  int port = 9092;
  shared_ptr<MyHandler> handler(new MyHandler());
  shared_ptr<TProcessor> processor(new MyProcessor(handler));
  shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
  shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
  shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());

  TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);
  server.serve();
  return 0;
}

My client that does the connection is:

  shared_ptr<TSocket> socket(new TSocket(host_name, port));
  socket->setConnTimeout(THRIFT_TIMEOUT);  // THRIFT_TIMEOUT equals 10000 (ms)
  socket->setRecvTimeout(THRIFT_TIMEOUT);
  shared_ptr<TTransport> transport(new TBufferedTransport(socket));
  shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
  MyClient client(protocol);
  try {
    transport->open();
    client.mycall(results);
    transport->close();
  } catch (TException &tx) {
    cerr << "Caught Exception" << endl;
  }
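
One thing I may try is printing the exception detail, since TException's
what() should include the underlying error message; something like this
(same code as above, just a different catch body) would at least show
whether the failures are connect errors or timeouts:

  try {
    transport->open();
    client.mycall(results);
    transport->close();
  } catch (TException &tx) {
    // tx.what() carries the underlying error string from the transport.
    cerr << "Caught Exception: " << tx.what() << endl;
  }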


Anything look fishy there?

Thanks in advance,
Curtis

Re: Help Debugging Periodic Error

Posted by Ted Dunning <te...@gmail.com>.
On Wed, May 6, 2009 at 5:18 PM, Curtis Spencer <th...@gmail.com> wrote:

> Wouldn't that be opening the transport anew each time?  So it
> shouldn't be keeping any connections open, correct?
>

Looks like it.


>  I have found the performance to be fine, without worrying about a
> connection pooling strategy, although that is a question for another
> day.
>

I have found the same results.  The cost of opening a connection each time
is too small to measure for most of our services compared to the cost of
the service call itself.



-- 
Ted Dunning, CTO
DeepDyve

Re: Help Debugging Periodic Error

Posted by Curtis Spencer <th...@gmail.com>.
On Wed, May 6, 2009 at 4:28 PM, Ted Dunning <te...@gmail.com> wrote:
> This sounds like it is the opposite problem.  Your connection is probably
> being closed for you somehow after being idle.
>
> Depending on your speed requirements, you can re-open the transport for each
> burst of activity or you can use a pooling strategy to close idle
> connections and re-open them when you need to do more work.

Ted,

I am going through this execution path on every request in my app server:

 shared_ptr<TSocket> socket(new TSocket(host_name, port));
 socket->setConnTimeout(THRIFT_TIMEOUT);  // THRIFT_TIMEOUT equals 10000 (ms)
 socket->setRecvTimeout(THRIFT_TIMEOUT);
 shared_ptr<TTransport> transport(new TBufferedTransport(socket));
 shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
 MyClient client(protocol);
 try {
   transport->open();
   client.mycall(results);
   transport->close();
 } catch (TException &tx) {
   cerr << "Caught Exception" << endl;
 }

Wouldn't that be opening the transport anew each time?  So it
shouldn't be keeping any connections open, correct?

I have found the performance to be fine, without worrying about a
connection pooling strategy, although that is a question for another
day.

-Curtis

Re: Help Debugging Periodic Error

Posted by Ted Dunning <te...@gmail.com>.
This sounds like it is the opposite problem.  Your connection is probably
being closed for you somehow after being idle.

Depending on your speed requirements, you can re-open the transport for each
burst of activity or you can use a pooling strategy to close idle
connections and re-open them when you need to do more work.
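
For the reuse option, here is a rough, untested sketch of what I mean,
borrowing MyClient, THRIFT_TIMEOUT, host_name, and so on from your
snippet purely as placeholders:

class ClientHolder {
 public:
  ClientHolder(const std::string& host, int port) : host_(host), port_(port) {}

  // Returns a connected client, opening the transport on first use or
  // after invalidate().
  MyClient& get() {
    if (!transport_ || !transport_->isOpen()) {
      shared_ptr<TSocket> socket(new TSocket(host_, port_));
      socket->setConnTimeout(THRIFT_TIMEOUT);
      socket->setRecvTimeout(THRIFT_TIMEOUT);
      transport_.reset(new TBufferedTransport(socket));
      shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport_));
      client_.reset(new MyClient(protocol));
      transport_->open();
    }
    return *client_;
  }

  // Call this when a request throws, so the next get() reconnects
  // instead of reusing a connection that was closed while idle.
  void invalidate() {
    try {
      if (transport_) transport_->close();
    } catch (const TException&) {
      // Ignore errors while tearing down an already-broken connection.
    }
    transport_.reset();
    client_.reset();
  }

 private:
  std::string host_;
  int port_;
  shared_ptr<TTransport> transport_;
  shared_ptr<MyClient> client_;
};

Each request would then do holder.get().mycall(results) inside a
try/catch, calling holder.invalidate() on TException before retrying or
giving up.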

On Wed, May 6, 2009 at 4:16 PM, Curtis Spencer <th...@gmail.com> wrote:

> Also, one other thing I ought to mention: I have never seen it happen
> when the service is under heavy load.  It is usually the first request
> after a lull period or after a deploy.
>

Re: Help Debugging Periodic Error

Posted by Curtis Spencer <th...@gmail.com>.
On Wed, May 6, 2009 at 10:10 PM, Krzysztof Godlewski
<kr...@dajerade.pl> wrote:
>
> On 2009-05-07, at 00:01, Curtis Spencer wrote:
>
>> Anything look fishy there?
>
> Blind guess: file descriptor limit reached?
>

@David

I will just watch the problem and, the next time it shows up, see where
it hangs.  It doesn't happen all that often, just at the most opportune
times, such as tech demos ;-).

The send timeout may be the ticket.

@Krzysztof

I have ulimit set to unlimited, so I doubt that is the problem.  Is
there any way to ascertain whether or not it is actually hitting a
file descriptor limit?
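
I suppose I could also have the service log its own descriptor usage;
an untested, Linux-specific sketch using getrlimit() and /proc/self/fd
would look something like:

#include <sys/resource.h>
#include <dirent.h>
#include <cstdio>

// Log the soft fd limit and roughly how many descriptors are open now.
void logFdUsage() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
    fprintf(stderr, "fd soft limit: %lu\n", (unsigned long)rl.rlim_cur);
  }

  int count = 0;
  if (DIR *dir = opendir("/proc/self/fd")) {
    while (readdir(dir) != NULL) {
      ++count;
    }
    closedir(dir);
  }
  // Subtract '.', '..', and the descriptor opendir() itself holds.
  fprintf(stderr, "open fds (approx): %d\n", count - 3);
}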

Also, one other thing I ought to mention: I have never seen it happen
when the service is under heavy load.  It is usually the first request
after a lull period or after a deploy.

-Curtis

Re: Help Debugging Periodic Error

Posted by Krzysztof Godlewski <kr...@dajerade.pl>.
On 2009-05-07, at 00:01, Curtis Spencer wrote:

> Anything look fishy there?

Blind guess: file descriptor limit reached?

Re: Help Debugging Periodic Error

Posted by David Reiss <dr...@facebook.com>.
Looks pretty standard.  I'd be interested to see where the client is
hanging.  It doesn't seem likely, but maybe it is during the send?
You could try setting a send timeout.
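
Something along these lines in the client setup, assuming you want to
reuse the same THRIFT_TIMEOUT constant you already pass to the other two
setters:

  shared_ptr<TSocket> socket(new TSocket(host_name, port));
  socket->setConnTimeout(THRIFT_TIMEOUT);
  socket->setRecvTimeout(THRIFT_TIMEOUT);
  socket->setSendTimeout(THRIFT_TIMEOUT);  // bound how long send() can block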

--David

Curtis Spencer wrote:
> Anything look fishy there?