Posted to issues@trafficserver.apache.org by "Alan M. Carroll (JIRA)" <ji...@apache.org> on 2012/10/01 22:19:07 UTC

[jira] [Commented] (TS-1487) the ordering of plugin_init and init_HttpProxyServer cause crashed TS to core endlessly

    [ https://issues.apache.org/jira/browse/TS-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467149#comment-13467149 ] 

Alan M. Carroll commented on TS-1487:
-------------------------------------

I spent some time looking at the code for this. I think it's feasible to implement a useful solution.

We have 5 events:

A) Internal plugins are initialized
B) Sockets are opened
C) External (client) plugins are initialized
D) Accept threads are started and listen on the sockets
E) Cache is started

A key part of the problem is that B and D are conflated, although there is no fundamental reason for this. In fact, there is already code that (to some extent) handles them separately, because the sockets can be opened in traffic_manager. The HttpProxyPort data structure already tracks the socket open state for this reason. Separating B and D might actually simplify the code, because as of now there are basically duplicate chunks of code that set all of the socket parameters.

Pushing D past E may be a little trickier. I'll take a look at that.
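
As a rough illustration of treating B and D as separate steps, here is a minimal C++ sketch. It is not Traffic Server code: PortSketch, StartupSketch and run_startup_sketch are hypothetical names, the socket calls are elided, and running E before D is only the ordering being considered above, not something that is known to work yet.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a proxy port entry; not the real HttpProxyPort.
struct PortSketch {
  int port = 0;
  bool socket_open = false; // set by step B, or earlier if the socket was inherited
  bool accepting = false;   // set by step D
};

struct StartupSketch {
  std::vector<PortSketch> ports;
  std::string order; // records the steps in the order they ran

  void init_internal_plugins() { order += 'A'; }

  // Step B: open and configure sockets in one place, skipping any already open.
  void open_sockets() {
    for (PortSketch &p : ports) {
      if (!p.socket_open) {
        p.socket_open = true; // a real implementation would open and configure the socket here
      }
    }
    order += 'B';
  }

  void init_external_plugins() { order += 'C'; }
  void start_cache() { order += 'E'; }

  // Step D: only starts accepting; it relies on step B having run already.
  void start_accept_threads() {
    for (PortSketch &p : ports) {
      assert(p.socket_open);
      p.accepting = true;
    }
    order += 'D';
  }
};

// Runs the five steps with B and D separated and D pushed past E (an assumption).
inline std::string run_startup_sketch(StartupSketch &s) {
  s.init_internal_plugins();
  s.open_sockets();
  s.init_external_plugins();
  s.start_cache();
  s.start_accept_threads();
  return s.order;
}
```

The point of the sketch is that the per-port open state lets step B skip sockets that were opened elsewhere, so the socket setup lives in one place and step D does nothing but start accepting.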
                
> the ordering of plugin_init and init_HttpProxyServer cause crashed TS to core endlessly
> ---------------------------------------------------------------------------------------
>
>                 Key: TS-1487
>                 URL: https://issues.apache.org/jira/browse/TS-1487
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.2.0
>         Environment: Linux RHEL6.2
>            Reporter: Aidan McGurn
>            Assignee: Alan M. Carroll
>            Priority: Critical
>         Attachments: INTD-529-RespawnCrash.patch, INTD-529-RespawnCrash.patch
>
>
> We've had a serious issue whereby TS, when it crashes, re-spawns and cores continuously when it tries to restart under load. I traced the issue to the SNMP research library (a third party lib). They use select(), and what happens is that the file descriptor numbers spike under load after the crash, as all the sockets get opened at once. This causes a buffer overflow in the select code (which their library is full of), because the fd passed to FD_SET is much bigger than the FD_SETSIZE of 1024 (which was a bitch to track down, as the stack was corrupted and gdb therefore useless).
> Tracing why this happened on 3.2.0 and not 3.0.2, I find that the sequence of plugin_init has changed. On 3.0.2 the sequence was in effect 1. plugin_init and then 2. init_HttpProxyServer, whereas this has mysteriously been reversed on 3.2.0. In order to get our system to work in this crash case, I've patched ATS to flip them around as in 3.0.2.
> I'll attach the patch we propose to use to get around this.
> Is this actually a bug waiting to happen in other systems, or was there a reason to change this sequence?
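
For reference, the overflow described in the report comes from FD_SET writing past the end of an fd_set when the descriptor number is FD_SETSIZE or larger. A minimal C++ sketch of a range check follows. The helper names fd_fits_in_fd_set and safe_fd_set are hypothetical, and this is not taken from the third party library or from the attached patch.

```cpp
#include <sys/select.h>
#include <cassert>

// True only when fd can be stored in an fd_set without writing out of bounds.
inline bool fd_fits_in_fd_set(int fd) {
  return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical helper: adds fd to the set only when it is in range.
// Returns false (and leaves the set untouched) for descriptors select() cannot handle.
inline bool safe_fd_set(int fd, fd_set *set) {
  if (!fd_fits_in_fd_set(fd)) {
    return false;
  }
  FD_SET(fd, set);
  return true;
}
```

A check like this only turns the stack corruption into a detectable failure. It does not make select() usable for high descriptor numbers, which is why the order in which plugins and sockets are initialized still matters here.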
