Posted to issues@trafficserver.apache.org by "Gancho Tenev (JIRA)" <ji...@apache.org> on 2016/01/29 00:26:39 UTC

[jira] [Created] (TS-4161) ProcessManager prone to stack-overflow

Gancho Tenev created TS-4161:
--------------------------------

             Summary: ProcessManager prone to stack-overflow
                 Key: TS-4161
                 URL: https://issues.apache.org/jira/browse/TS-4161
             Project: Traffic Server
          Issue Type: Bug
          Components: Manager
            Reporter: Gancho Tenev


ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a large number of messages in a row from the same socket.

Since alloca() is used to allocate a buffer on the stack for each message read from the socket, and those buffers are not released until the function returns, getting "stuck" in the loop can lead to a stack overflow. FWIW, the same could happen if a single message is big enough (accidentally or on purpose).

It can be reproduced easily by setting up:
            proxy.config.lm.pserver_timeout_secs: 0
            proxy.config.lm.pserver_timeout_msecs: 0
in records.config and running ./bin/traffic_manager. 

ATS crashes with a segfault in an unexpected place (while trying to allocate with malloc()). If you inspect the core you will see that it got "stuck" in the loop before it crashed by overflowing the stack (it kept allocating buffers on the stack with alloca() until it crashed).

It is worth considering replacing alloca() with a VLA (which "releases" its memory when it goes out of scope on each iteration of the loop) or with ats_malloc(), which is supposedly less time-efficient but would handle bigger messages without the risk of a stack overflow.
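To illustrate the heap-based alternative together with a size check, here is a minimal sketch, not the actual ProcessManager code. It assumes a hypothetical wire format (a 4-byte length in host byte order followed by the body) and reads from an in-memory byte vector instead of a socket. The names Reader, read_message and kMaxMessageSize are made up for this example, and std::vector stands in for an ats_malloc()'d buffer. The point is that one heap buffer is reused across iterations, so stack usage stays constant no matter how many messages arrive, and an oversized length is rejected before anything is allocated.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cap on a single message body; not an ATS constant.
constexpr std::size_t kMaxMessageSize = 64 * 1024;

// Stand-in for the socket: reads sequentially from an in-memory byte stream.
struct Reader {
  const std::vector<std::uint8_t> &in;
  std::size_t pos;

  explicit Reader(const std::vector<std::uint8_t> &bytes) : in(bytes), pos(0) {}

  // Copies exactly n bytes into dst, or returns false if fewer are available.
  bool read_exact(void *dst, std::size_t n) {
    if (in.size() - pos < n) {
      return false;
    }
    std::memcpy(dst, in.data() + pos, n);
    pos += n;
    return true;
  }
};

enum class ReadResult { Ok, NoData, TooLarge };

// Reads one length-prefixed message into buf. The caller passes the same
// buf on every iteration, so its heap storage is reused rather than a new
// stack buffer being carved out per message.
ReadResult read_message(Reader &r, std::vector<std::uint8_t> &buf) {
  std::uint32_t len = 0;
  if (!r.read_exact(&len, sizeof(len))) {
    return ReadResult::NoData;
  }
  // Reject oversized lengths before allocating; a real caller would
  // probably close the connection at this point.
  if (len > kMaxMessageSize) {
    return ReadResult::TooLarge;
  }
  buf.resize(len);
  if (len > 0 && !r.read_exact(buf.data(), len)) {
    return ReadResult::NoData;
  }
  return ReadResult::Ok;
}
```

The limit value is arbitrary here; whatever the real maximum is, checking it before the allocation is what matters.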

IMO adding a message size limit check is good practice, especially with the current implementation.

If the code gets "stuck" in the while loop while reading a large number of messages in a row from the same socket, then the port configured by proxy.config.process_manager.mgmt_port becomes unavailable (connection refused). Adding a limit on the number of messages that can be processed in a row would be a good idea.
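The per-call message limit could look roughly like the sketch below. It is an illustration only: the pending queue stands in for messages waiting on the socket, and drain_bounded and kMaxMessagesPerPoll are hypothetical names and values, not existing ATS symbols. The loop stops after a fixed number of messages and returns, so the caller gets a chance to service other work (such as the management port) before polling again.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical upper bound on messages handled per poll call.
constexpr std::size_t kMaxMessagesPerPoll = 16;

// Handles at most kMaxMessagesPerPoll queued messages and returns how many
// were handled. Anything left in the queue waits for the next call.
template <typename Handler>
std::size_t drain_bounded(std::deque<std::string> &pending, Handler handle) {
  std::size_t handled = 0;
  while (!pending.empty() && handled < kMaxMessagesPerPoll) {
    handle(pending.front());
    pending.pop_front();
    ++handled;
  }
  return handled;
}
```

With 40 queued messages, three successive calls would handle 16, 16 and 8 of them, returning control to the caller between batches.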

I stumbled upon this while running TSQA regression tests, where TSQA kept complaining that the management port was not available and ATS kept crashing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)