Posted to dev@trafficserver.apache.org by Jan-Frode Myklebust <ja...@tanso.net> on 2012/09/16 22:52:53 UTC

3.2.0 segfaulting

I'm struggling to get core dumps for some crashes we're seeing
with ATS 3.2.0 (+TS-1392 patch). What we're seeing is this logged in
"dmesg":

[ET_SSL 5][29708]: segfault at 0 ip 00002b7b94f14425 sp
00002b7ba4eadb88 error 4 in libssl.so.1.0.0[2b7b94edf000+53000]

and traffic.out saying:

[Sep 16 17:17:23.316] Manager {0x7f84e91a37e0} ERROR:
[LocalManager::pollMgmtProcessServer] Server Process terminated due to
Sig 11: Segmentation fault
[Sep 16 17:17:23.321] Manager {0x7f84e91a37e0} ERROR:  (last system
error 2: No such file or directory)
[Sep 16 17:17:23.321] Manager {0x7f84e91a37e0} ERROR:
[Alarms::signalAlarm] Server Process was reset
[Sep 16 17:17:23.321] Manager {0x7f84e91a37e0} ERROR:  (last system
error 2: No such file or directory)
[Sep 16 17:17:24.331] Manager {0x7f84e91a37e0} NOTE:
[LocalManager::startProxy] Launching ts process
[TrafficServer] using root directory '/usr'
[Sep 16 17:17:24.352] Manager {0x7f84e91a37e0} NOTE:
[LocalManager::pollMgmtProcessServer] New process connecting fd '14'
[Sep 16 17:17:24.352] Manager {0x7f84e91a37e0} NOTE:
[Alarms::signalAlarm] Server Process born
[Sep 16 17:17:25.367] {0x2b77790c0d00} STATUS: opened
/var/log/trafficserver/diags.log
[Sep 16 17:17:25.367] {0x2b77790c0d00} NOTE: updated diags config
[Sep 16 17:17:25.371] Server {0x2b77790c0d00} NOTE: cache clustering disabled
[Sep 16 17:17:25.395] Server {0x2b77790c0d00} NOTE: cache clustering disabled
[Sep 16 17:17:25.430] Server {0x2b77790c0d00} ERROR: Cannot insert duplicate!
[Sep 16 17:17:25.431] Server {0x2b77790c0d00} ERROR: Cannot insert duplicate!
[Sep 16 17:17:25.431] Server {0x2b77790c0d00} ERROR: Cannot insert duplicate!
[Sep 16 17:17:25.483] Server {0x2b77790c0d00} NOTE: logging
initialized[15], logging_mode = 3
[Sep 16 17:17:25.536] Server {0x2b77790c0d00} NOTE: traffic server running
[Sep 16 17:17:25.861] Server {0x2b7779e38700} NOTE: cache enabled


I've tried enabling core dumps the same way I got them working earlier,
but no core dumps appear:

    sysctl -w fs.suid_dumpable=1
    sysctl -w kernel.core_pattern=/tmp/core.%e.%p
    records.config: CONFIG proxy.config.stack_dump_enabled INT 0
    /etc/profile: ulimit -c unlimited >/dev/null 2>&1
    /etc/sysconfig/init: DAEMON_COREFILE_LIMIT='unlimited'

Does the crash look familiar to anybody? Or could someone please help
me with any other settings needed to enable core dumps on RHEL 6.3, so
that I can provide a better bug report?


  -jf
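
When the settings above look right but no core file appears, it can help to check what the affected process itself sees, since a `ulimit` line in /etc/profile only applies to login shells. The following is a minimal, Linux-only sketch; `core_status` and `read_core_status` are hypothetical names used for illustration and are not part of Traffic Server.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Hypothetical helper type: the two per-process values that decide
 * whether the kernel will write a core file for this process. */
typedef struct {
    int limit_nonzero; /* 1 if the soft RLIMIT_CORE is not 0, else 0 */
    int dumpable;      /* value reported by PR_GET_DUMPABLE */
} core_status;

/* Fills *out with the calling process's core-dump settings.
 * Returns 0 on success, -1 if either query fails. */
int read_core_status(core_status *out)
{
    struct rlimit rl;
    int dumpable;

    assert(out != NULL);

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    dumpable = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
    if (dumpable < 0)
        return -1;

    out->limit_nonzero = (rl.rlim_cur != 0);
    out->dumpable = dumpable;
    return 0;
}
```

If `limit_nonzero` is 0 or `dumpable` is 0 in the process that crashes, the kernel will not write a core file for it, whatever kernel.core_pattern says.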

Re: Log.cc and Free-/Open-BSD

Posted by Phil Sorber <so...@apache.org>.

On Mon, Sep 17, 2012 at 5:52 AM, Daniel Gruno <ru...@cord.dk> wrote:
> Hello, happy people,
>
> Lately, I've been wrapping my head around Traffic Server 3.2/3.3 not
> running well on FreeBSD. The exact issue is described in TS-993 as well:
> 1) When starting TS, it runs up a hefty CPU bill (100% cpu used at all
> times), even when idling.
> 2) It crashes and burns when compiled with --enable-debug, complaining:
>
>    FATAL: ../../lib/ts/ink_thread.h:267: failed assert
>    `pthread_cond_wait(cp, mp) == 0`
>
> After giving up on doing a git bisect (my computer is simply too slow
> for all those recompiles), I tried running it through callgrind to
> analyze the function calls being made, and discovered that
> LogObjectManager::flush_buffers() was being called about 11 million
> times during the first few minutes, which is not good. So I opened up
> Log.cc, and discovered, to my surprise, that, apart from flushing
> buffers in a loop there, we are calling ink_cond_wait without any
> apparent locking of the flush_mutex we are supposed to release while
> waiting for the condition. On FreeBSD at least, this results in an EPERM
> error (the caller does not own the mutex being released), which in turn
> means that there will be no waiting; it's just one big CPU sink.
>
> The addition of "ink_mutex_try_acquire(&flush_mutex);" before the
> ink_cond_wait, seems to have fixed this problem, and TS starts fine,
> doesn't use 100% while idling, and doesn't complain when running in
> debug mode, i.e. an apparent win-win situation for my FreeBSD machines.
>
> However - and because Igor told me to - since this doesn't seem to be an
> issue on Linux, I was wondering...does the mutex in question lock
> somewhere else that I am unaware of, or did we simply forget to lock it
> and are lucky that Linux somehow takes care of this blunder for us?
>
> In any case, I don't think adding ink_mutex_try_acquire could hurt
> anything, and since it does seem to fix the FreeBSD/OpenBSD issue at
> hand, I am mostly interested in any comments you lot would have about it
> before I go and commit the fix (if it is a fix, that's what I'm asking ;).
>
> With regards,
> Daniel.

From the pthread_cond_wait man page:

"The pthread_cond_timedwait() and pthread_cond_wait() functions shall
block on a condition variable. They shall be called with mutex locked
by the calling thread or undefined behavior results."

So I think you found out what "undefined behavior" means on FreeBSD.
From what I can see, that mutex is used nowhere else, so it seems it's
a happy coincidence that it works on Linux. I say +1 for adding the
fix. Also, good work on tracking that down!
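
To illustrate the rule quoted from the man page, here is a minimal sketch of a condition wait with the mutex held by the caller. It uses plain pthreads rather than the ink_* wrappers, and `flush_signal` and its functions are hypothetical names, not the code in Log.cc. It takes the lock with a blocking pthread_mutex_lock(), on the assumption that a failed try-lock would leave the mutex unheld when the wait begins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a flush thread's wake-up state. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            work_ready; /* predicate protected by mutex */
} flush_signal;

/* Initializes the mutex, condition variable, and predicate. */
int flush_signal_init(flush_signal *s)
{
    int rc;

    assert(s != NULL);
    s->work_ready = false;
    rc = pthread_mutex_init(&s->mutex, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_init(&s->cond, NULL);
}

/* Blocks until work_ready is set, then clears it.
 * The mutex is locked before pthread_cond_wait() is called, as POSIX
 * requires; the wait releases it while blocked and re-acquires it
 * before returning. */
int flush_signal_wait(flush_signal *s)
{
    int rc = pthread_mutex_lock(&s->mutex);
    if (rc != 0)
        return rc;

    /* Loop on the predicate to tolerate spurious wakeups. */
    while (rc == 0 && !s->work_ready)
        rc = pthread_cond_wait(&s->cond, &s->mutex);

    if (rc == 0)
        s->work_ready = false;

    pthread_mutex_unlock(&s->mutex);
    return rc;
}

/* Sets the predicate under the same mutex and wakes one waiter. */
int flush_signal_notify(flush_signal *s)
{
    int rc = pthread_mutex_lock(&s->mutex);
    if (rc != 0)
        return rc;

    s->work_ready = true;
    pthread_cond_signal(&s->cond);
    return pthread_mutex_unlock(&s->mutex);
}
```

A caller would run flush_signal_wait() in the flushing thread and call flush_signal_notify() from whichever thread has work for it.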