Posted to user@thrift.apache.org by Curtis Spencer <th...@gmail.com> on 2009/05/07 00:01:45 UTC
Help Debugging Periodic Error
Hi,
First message to the thrift list. Thanks for releasing such a great
technology.
I have mainly been using the C++ versions of the client and server, but I
occasionally run into a problem for which I don't have much data to go on.
I wanted to see whether others have encountered this, or whether I am
doing something silly.
The problem I see is that my application servers (which are the
clients to a thrift service) periodically hang. I can restart the
thrift service and the application servers (without restart) are happy
again.
The only output I see is at the Thrift service level, and it looks like this:
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May 6 09:30:07 2009 TSocket::write() send() <Host: Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May 6 09:30:07 2009 TSocket::write() send() <Host: Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May 6 09:30:07 2009 TSocket::write() send() <Host: Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May 6 09:30:07 2009 TSocket::write() send() <Host: Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
Thrift: Wed May 6 09:30:07 2009 TSocket::write() send() <Host: Port:
0>Broken pipe
TSimpleServer client died: write() send(): Broken pipe
TServerTransport died on accept: Called write on non-open socket
TSimpleServer client died: No more data to read.
Both processes are now running with debug symbols, so the next time it
happens I will attach with gdb, see exactly where they are, and add more
information to this thread.
As for the client/server code, my server looks pretty cookie cutter:
int main(int argc, char **argv) {
  int port = 9092;
  shared_ptr<MyHandler> handler(new MyHandler());
  shared_ptr<TProcessor> processor(new MyProcessor(handler));
  shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
  shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
  shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());

  TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);
  server.serve();
  return 0;
}
My client that does the connection is:
shared_ptr<TSocket> socket(new TSocket(host_name, port));
socket->setConnTimeout(THRIFT_TIMEOUT); // which equals 10000
socket->setRecvTimeout(THRIFT_TIMEOUT);
shared_ptr<TTransport> transport(new TBufferedTransport(socket));
shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
MyClient client(protocol);
try {
  transport->open();
  client.mycall(results);
  transport->close();
} catch (TException &tx) {
  cerr << "Caught Exception" << endl;
}
Anything look fishy there?
Thanks in advance,
Curtis
Re: Help Debugging Periodic Error
Posted by Ted Dunning <te...@gmail.com>.
On Wed, May 6, 2009 at 5:18 PM, Curtis Spencer <th...@gmail.com> wrote:
> Wouldn't that open the transport anew each time? So it shouldn't be
> keeping any connections open, correct?
>
Looks like it.
> I have found the performance to be fine, without worrying about a
> connection pooling strategy, although that is a question for another
> day.
>
I have seen the same. The cost of opening a connection each time is too
small to measure for most of our services compared to the cost of the
service call itself.
--
Ted Dunning, CTO
DeepDyve
Re: Help Debugging Periodic Error
Posted by Curtis Spencer <th...@gmail.com>.
On Wed, May 6, 2009 at 4:28 PM, Ted Dunning <te...@gmail.com> wrote:
> This sounds like it is the opposite problem. Your connection is probably
> being closed for you somehow after being idle.
>
> Depending on your speed requirements, you can re-open the transport for each
> burst of activity or you can use a pooling strategy to close idle
> connections and re-open them when you need to do more work.
Ted,
I am going through this execution path on every request in my app server:
shared_ptr<TSocket> socket(new TSocket(host_name, port));
socket->setConnTimeout(THRIFT_TIMEOUT); // which equals 10000
socket->setRecvTimeout(THRIFT_TIMEOUT);
shared_ptr<TTransport> transport(new TBufferedTransport(socket));
shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
MyClient client(protocol);
try {
  transport->open();
  client.mycall(results);
  transport->close();
} catch (TException &tx) {
  cerr << "Caught Exception" << endl;
}
Wouldn't that open the transport anew each time? So it shouldn't be
keeping any connections open, correct?
I have found the performance to be fine, without worrying about a
connection pooling strategy, although that is a question for another
day.
-Curtis
Re: Help Debugging Periodic Error
Posted by Ted Dunning <te...@gmail.com>.
This sounds like it is the opposite problem. Your connection is probably
being closed for you somehow after being idle.
Depending on your speed requirements, you can re-open the transport for each
burst of activity or you can use a pooling strategy to close idle
connections and re-open them when you need to do more work.
On Wed, May 6, 2009 at 4:16 PM, Curtis Spencer <th...@gmail.com> wrote:
> Also, one other thing I ought to mention: I have never seen it happen
> when the service is under heavy load. It is usually on the first
> request after a lull or after a deploy.
>
Re: Help Debugging Periodic Error
Posted by Curtis Spencer <th...@gmail.com>.
On Wed, May 6, 2009 at 10:10 PM, Krzysztof Godlewski
<kr...@dajerade.pl> wrote:
>
> On 2009-05-07, at 00:01, Curtis Spencer wrote:
>
>> Anything look fishy there?
>
> Blind guess: file descriptor limit reached?
>
@David
I will watch for the problem and, the next time it shows up, see exactly
where it hangs. It doesn't happen all that often, just at the most
opportune times such as tech demos ;-).
The send timeout may be the ticket.
@Krzysztof
I have ulimit set to unlimited, so I doubt that is the problem. Is
there any way to ascertain whether it is hitting a file descriptor
limit?
Also, one other thing I ought to mention: I have never seen it happen
when the service is under heavy load. It is usually on the first
request after a lull or after a deploy.
-Curtis
Re: Help Debugging Periodic Error
Posted by Krzysztof Godlewski <kr...@dajerade.pl>.
On 2009-05-07, at 00:01, Curtis Spencer wrote:
> Anything look fishy there?
Blind guess: file descriptor limit reached?
Re: Help Debugging Periodic Error
Posted by David Reiss <dr...@facebook.com>.
Looks pretty standard. I'd be interested to see where the client is
hanging. It doesn't seem likely, but maybe it is during the send?
You could try setting a send timeout.
--David
Curtis Spencer wrote:
> [...]
>
> Anything look fishy there?
>
> Thanks in advance,
> Curtis