Posted to dev@thrift.apache.org by "David Fankhauser (JIRA)" <ji...@apache.org> on 2017/02/09 10:32:41 UTC
[jira] [Created] (THRIFT-4080) Unix sockets can get stuck forever
David Fankhauser created THRIFT-4080:
----------------------------------------
Summary: Unix sockets can get stuck forever
Key: THRIFT-4080
URL: https://issues.apache.org/jira/browse/THRIFT-4080
Project: Thrift
Issue Type: Bug
Components: Python - Library
Affects Versions: 0.10.0
Environment: Ubuntu 14.04
Reporter: David Fankhauser
Priority: Critical
When the network connection is really bad, the server sometimes stops accepting new connections. "Really bad" means that a simple ping sent via Thrift can take 15 seconds.
After having this issue for nearly 2 years, I finally figured it out:
There is no timeout when the socket receives data. After a connection is established and the socket object is created, the connection can drop, which leads to the socket blocking forever.
I added a timeout in the TSocket accept function, which makes the socket raise a "Resource temporarily unavailable" error after 30 seconds:
    def accept(self):
        client, addr = self.handle.accept()
        # added timeout of 30.0 seconds (requires "import struct" at module level);
        # 'LL' packs a struct timeval as (seconds, microseconds)
        client.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, struct.pack('LL', 30, 0))
        result = TSocket()
        result.setHandle(client)
        return result
When a connection drops, this gives the following error:
    buff = self.handle.recv(sz)
    error: [Errno 11] Resource temporarily unavailable
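For anyone who wants to reproduce the timeout behaviour outside of Thrift, here is a minimal standalone sketch. It is not part of the Thrift library: the helper names set_recv_timeout and recv_or_none are made up for illustration, it is written for Python 3, and it assumes a Linux-like platform where struct timeval is two C longs. It uses socket.socketpair() in place of a real client connection.

```python
import errno
import socket
import struct


def set_recv_timeout(sock, seconds):
    """Set SO_RCVTIMEO on a blocking socket.

    Assumption: struct timeval is two C longs (seconds, microseconds),
    which holds on Linux but is not guaranteed on every platform.
    """
    whole = int(seconds)
    micros = int((seconds - whole) * 1000000)
    timeval = struct.pack('ll', whole, micros)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)


def recv_or_none(sock, size):
    """Return received bytes, or None if the receive timeout expired."""
    try:
        return sock.recv(size)
    except OSError as exc:
        # An expired SO_RCVTIMEO surfaces as EAGAIN/EWOULDBLOCK (errno 11 on Linux).
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    set_recv_timeout(server_side, 0.5)

    # Nothing has been sent yet, so recv() gives up after about 0.5 seconds.
    print(recv_or_none(server_side, 16))

    client_side.sendall(b"ping")
    print(recv_or_none(server_side, 16))

    server_side.close()
    client_side.close()
```

In a server, the None case would be the place to close the client socket so the worker thread is freed instead of blocking forever.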
I also tried using the Python socket's settimeout() function, which did not work. Only setting the receive timeout (SO_RCVTIMEO) makes dropped connections time out.
This bug does not appear on stable connections. However, I have 4 devices that are connected via WiFi, and my ThreadedServer gets stuck about 4-5 times a day. The ThreadedServer has 5 threads, so all 5 sockets end up stuck again and again...
FYI here is the strace of the stuck socket:
[pid 2698] futex(0x7faf50000d80, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 2697] read(4, <unfinished ...>
[pid 2693] accept(7, {sa_family=AF_INET6, sin6_port=htons(39911), inet_pton(AF_INET6, "::ffff:46.125.249.41", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 6
[pid 2693] recvfrom(6,
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)