Posted to issues@trafficserver.apache.org by "Susan Hinrichs (JIRA)" <ji...@apache.org> on 2016/04/21 18:06:26 UTC

[jira] [Created] (TS-4372) Traffic server heart beat fails with 6.1

Susan Hinrichs created TS-4372:
----------------------------------

             Summary: Traffic server heart beat fails with 6.1
                 Key: TS-4372
                 URL: https://issues.apache.org/jira/browse/TS-4372
             Project: Traffic Server
          Issue Type: Bug
          Components: Cop, Manager
            Reporter: Susan Hinrichs


When running 6.1 in a loaded production environment, traffic_server runs for a while (30 minutes or so), and then server heartbeats start failing intermittently.  Eventually two fail in a row, causing traffic_cop to restart traffic_server (or traffic_manager and then traffic_server; I'm still a bit unclear on which).

{code}
traffic_cop[18078]: (test) read failed [104 'Connection reset by peer']
{code}

There are no particular resource limitations on the production machine in this state.  The number of open sockets is around 50-60K, which is consistent with its 5.3.x peer.  The memory usage is nowhere near the limit.  The CPU usage is high but, again, not near the limit (perhaps half of the machine's total capacity).

If we look at the packets exchanged on the loopback interface during an interval of failing heartbeats, we see some interesting things.  I'll attach an example pcap file.  The interesting traffic is on ports 8084 and 8083.  Traffic_cop sends a GET http://127.0.0.1:8083/synthetic.txt request to traffic_server over port 8084.  Traffic_server should proxy the request and send GET /synthetic.txt to traffic_manager, which is listening on port 8083.  Traffic_manager returns a 200 response with some data.  Traffic_server relays that response to traffic_cop.
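
To make the exchange above concrete, here is a minimal sketch of a heartbeat-style probe that could be run by hand against a test box.  It is not traffic_cop's actual code.  The function names, the HTTP/1.0 request line, the timeout, and the "200 means healthy" check are assumptions made for illustration; only the ports and the /synthetic.txt URL come from the description above.

```python
import socket


def build_heartbeat_request(manager_host="127.0.0.1", manager_port=8083,
                            path="/synthetic.txt"):
    """Build a proxy-style GET with an absolute URL, as described above.

    HTTP/1.0 is an assumption: it makes the peer close the connection
    after the response, so the reader can simply read until EOF.
    """
    url = f"http://{manager_host}:{manager_port}{path}"
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")


def probe_heartbeat(proxy_host="127.0.0.1", proxy_port=8084, timeout=5.0):
    """Send one heartbeat-style request through the proxy port.

    Returns (ok, detail). ok is True only if the status line carries 200.
    On a socket error (for example ECONNRESET, errno 104 on Linux),
    detail describes the exception instead.
    """
    try:
        with socket.create_connection((proxy_host, proxy_port),
                                      timeout=timeout) as sock:
            sock.sendall(build_heartbeat_request())
            response = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # peer closed the connection cleanly
                    break
                response += chunk
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"

    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    fields = status_line.split()
    ok = len(fields) >= 2 and fields[1] == "200"
    return ok, status_line


if __name__ == "__main__":
    print(probe_heartbeat())
```

Running this in a loop while capturing on the loopback interface should make it easier to line up an individual failed probe with the packets on ports 8084 and 8083.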

However, in the failure cases, traffic_cop sends the request and traffic_manager sends a RESET after the connection has been established and the request has been sent to it.  I'm guessing that there is logic in traffic_server that closes the socket before reading the GET request, causing the reset to be sent.
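
The guess above relies on a general TCP behavior: when a socket is closed while unread data is still queued in its receive buffer, the stack typically sends RST instead of a normal FIN (Linux does this).  The sketch below reproduces that on loopback with a throwaway listener.  It says nothing about where such a close would happen in traffic_server; the function name and the returned strings are made up for illustration.

```python
import socket
import threading


def close_without_reading_demo(timeout=5.0):
    """Show what a client sees when the peer closes with unread data.

    Returns "reset" if the client's read fails with ECONNRESET, "eof" if
    it sees a clean close, "data" if bytes arrive, or "timeout".
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port, loopback only
    listener.listen(1)
    port = listener.getsockname()[1]
    request_sent = threading.Event()

    def server():
        conn, _ = listener.accept()
        # Wait until the client has written its request, then close
        # without ever calling recv(): the request stays unread.
        request_sent.wait(timeout)
        conn.close()

    worker = threading.Thread(target=server, daemon=True)
    worker.start()
    try:
        with socket.create_connection(("127.0.0.1", port),
                                      timeout=timeout) as client:
            client.sendall(b"GET /synthetic.txt HTTP/1.0\r\n\r\n")
            request_sent.set()
            try:
                data = client.recv(4096)
            except ConnectionResetError:
                return "reset"  # same errno 104 as in the traffic_cop log
            except socket.timeout:
                return "timeout"
            return "eof" if data == b"" else "data"
    finally:
        worker.join(timeout)
        listener.close()


if __name__ == "__main__":
    print(close_without_reading_demo())
```

On a Linux host this is expected to report a reset, which would match the 'Connection reset by peer' read failure logged by traffic_cop; other TCP stacks may behave differently.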



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)