Posted to dev@trafficserver.apache.org by Brian Geffon <br...@gmail.com> on 2011/08/17 04:17:19 UTC

SEGFAULT on MUTEX_TRY_LOCK_FOR

Hi, for some reason I'm getting a segfault originating in
UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
I've been digging and I can't see exactly where the SEGFAULT is coming
from, and I was hoping someone might have an idea of how/where to look.
Attached is a gdb dump with variable printouts.

Thanks

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff65fd700 (LWP 25872)]
0x00000000006fc663 in EThread::process_event (this=0x7ffff6a02010,
e=0x1cdae00, calling_code=1) at UnixEThread.cc:130
130	  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install
expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6.x86_64
keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6.x86_64
libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64
libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64
openssl-1.0.0-10.el6.x86_64 pcre-7.8-3.1.el6.x86_64
tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb)

(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff65fd700 (LWP 25872))]#0
0x00000000006fc663 in EThread::process_event (this=0x7ffff6a02010,
e=0x1cdae00, calling_code=1) at UnixEThread.cc:130
130	  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) bt full
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff6a02010,
e=0x1cdae00, calling_code=1) at UnixEThread.cc:130
        lock = {m = {m_ptr = 0x7ffff65fcd20}, lock_acquired = 202}
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff6a02010) at
UnixEThread.cc:232
        done_one = false
        e = 0x1cdae00
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head =
0xfc76b0}, tail = 0xfc76b0}
        next_time = 1313544853901824000
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e50) at Thread.cc:88
        p = 0xfb7e50
#3  0x00000036198077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000036190e68ed in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) p lock
$1 = {m = {m_ptr = 0x7ffff65fcd20}, lock_acquired = 202}
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1e433c8,
mutex = {m_ptr = 0x7fffe01f7fc0}, cancelled = 1}, ethread =
0x7ffff6a02010, in_the_prot_queue = 0, in_the_priority_queue = 0,
immediate = 1,
  globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at
= 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0},
prev = 0x0}}
(gdb) p *this
$3 = {<Thread> = {_vptr.Thread = 0x777cd0, tid = 140737326864128,
mutex = 0xfa1de0, static cur_time = 1313544915111050000, static
thread_data_key = 0, mutex_ptr = {m_ptr = 0xfa1de0}}, generator = {mt
= {
      6515880703608544270, 16986482079684378148, 15202063160176352569,
4205329379000379692, 12731830085304294028, 5480258858423381456,
10229258432381328244, 972562981554976717, 18395500132552756687,
      4487932761539272140, 12691884673430599292, 7316910863069287584,
5065951866562550114, 4154042561459159904, 15584312626940693380,
9624614458510014996, 9968563369847389685, 12824461920448898420,
      14462382457103121812, 5636903841279708677, 108456372588063946,
10914735368454364476, 1391653386474464686, 339527281604892216,
17583222426572638386, 12714948226675727764, 10384533536666676727,
      12742920906068567642, 7414539615627972288, 13110098895981418225,
7379926632990832018, 6099087619172791125, 2536224650590723892,
382683522953382117, 10231673541280025321, 7107430660860578011,
      7886268976247870260, 7997954524699562408, 18082362942149937173,
15688175327275391936, 15472646731392263073, 277724836093898845,
12487796429353579606, 5944536777161237626, 5665876839336191090,
      15430707508795400154, 16331261647529815279, 5423540733316273306,
17394333482952066578, 11366497846600751298, 16305007425437665835,
713384774098653973, 12297870710583917641, 1896349530920381225,
      3277987015364241249, 9643670098096554062, 8119984560785158182,
16175911704439711675, 3219742715358945838, 4816082870626235158,
14594451917324419702, 7888330045337888144, 5057913160333338563,
      6169348468080883231, 9778178952945849350, 987533948906304956,
16317159449386203166, 3171921364428955140, 10208586364912296467,
7809465257794849055, 1680111128059269845, 16136986776683096294,
      18015668002715527213, 13278377935189106989,
11743452794607383806, 2746730627084060956, 7415684522480326270,
17835167989601050650, 13644349150946908870, 4160024052101881266,
17539766658046938652,
      10407400190329824903, 7304369908628622292, 13352358256700488186,
16302724239278405024, 14811336521658776772, 6785080592012219523,
13704545704488431336, 5974178841673754942, 9373950295534483023,
      1491414981955262570, 4428493940909807087, 10697507311517498481,
7676279848033042950, 8795520090147876568, 14793309434905446269,
17349151737314203601, 318363486776602984, 3252866848761985710,
      14602838786230255279, 6663384150002802509, 13207584062093788442,
3602332261895214748, 12115374842051596721, 17804887703809024225,
15008780001901498735, 8466905236227018594, 15141783284958673310,
      13020813280758784108, 5072890343878616143, 8693479320212297354,
1520327718240798357, 8946077030960316612, 16820020637526772615,
15608438939460737942, 17292348150903688740, 13414047553988954714,
      18179855376969248328, 11967110319132545417, 9932774245514928620,
2652732629280510588, 9475133953596034132, 3187127687516369034,
11950312940000105974, 3745335166691698042, 11697900474902081289,
      15429980149052262194, 15844964295117371452, 9181893980464487301,
2059420496139012809, 970805747506534381, 7714758131813798827,
2551251103176024462, 12652308573540052877, 16631049133880952832,
      8518564418451246754, 3543615080091126111, 10840129230388916602,
13172938275746849944, 11806130369180195425, 12219367846851704235,
10200980404628099541, 3302930645268845505, 11606523947547391428,
      8011850156032096051, 16579373652728601192, 11062673499630823352,
12335033106792541513, 15383019066827270747, 17884405326580452832,
11585644554336583434, 637317886765188468, 11261595716411488360,
      7040759458192158495, 12831465890722188517, 12359754017083295446,
603744962171142352, 8332425911816727819, 10935078845256490889,
5691336731509894171, 13101302067510145880, 16757750617541072598,
      2867325553860021997, 7227754026105151115, 5912367076540545274,
14217371220906922367, 11930007479151925417, 8530725699722547034,
2277108522706323630, 6559767378131734501, 6115368527864630318,
      12369511977458252801, 2678458862493460619, 2403469501489209356,
4082678348210816369, 1232372919321020093, 11037390841346358853,
1666299614233116994, 3932209724916249508, 13499766582553278534,
      7727607699891515274, 16480932790657985627, 1934971005052974066,
13299634884122044899, 6377612005863298617, 18195543869581048595,
13242414943208633671, 17445101213740487654, 12239509476691163607,
      15830603585384378060, 2171274271980641087, 3774577349533238205,
12761267904385570139, 11505287270624992788, 5679565669140047857,
11067143442802170563, 10807939108817954932, 8592623689054871778,
      274254701680346328, 12015802538928499170...}, mti = 126},
eventAllocator = {allocated = 4, freelist = 0x7fffe01d5b90},
netVCAllocator = {allocated = 14, freelist = 0x7fffe401e560},
sslNetVCAllocator = {
    allocated = 0, freelist = 0x0}, inkioNetVCAllocator = {allocated =
0, freelist = 0x0}, httpClientSessionAllocator = {allocated = 0,
freelist = 0x0}, httpServerSessionAllocator = {allocated = 0,
    freelist = 0x0}, cacheVConnectionAllocator = {allocated = 371,
freelist = 0x7fffdc0da8d0}, newCacheVConnectionAllocator = {allocated
= 0, freelist = 0x0}, openDirEntryAllocator = {allocated = 370,
    freelist = 0x7fffc80d2800}, ramCacheCLFUSEntryAllocator =
{allocated = 0, freelist = 0x0}, ramCacheLRUEntryAllocator =
{allocated = 0, freelist = 0x0}, evacuationBlockAllocator = {allocated
= 0,
    freelist = 0x0}, ioDataAllocator = {allocated = 0, freelist =
0x0}, ioBlockAllocator = {allocated = 0, freelist = 0x0},
ioBufAllocator = {{allocated = 0, freelist = 0x0} <repeats 15 times>},
  thread_private = "4\310", '\000' <repeats 30 times>, "j\352 *",
'\000' <repeats 28 times>, "5\025\017,", '\000' <repeats 164
times>"\245, [\001", '\000' <repeats 29 times>"\357, q", '\000'
<repeats 30 times>, "xܕ", '\000' <repeats 29 times>, "xܕ", '\000'
<repeats 29 times>, "\003l", '\000' <repeats 702 times>, "Hl", '\000'
<repeats 54 times>, "|D\000\000\000\000\000\000|D", '\000' <repeats 22
times>"\301, \376\377\377\377\377\377\377\067\212", '\000' <repeats 54
times>, "!6\000\000\000\000\000\000!6", '\000' <repeats 598 times>,
"\022\066\000\000\000\000\000\000\022\066", '\000' <repeats 246
times>, "|D", '\000' <repeats 30 times>, "|D", '\000' <repeats 542
times>..., diskHandler = 0x0, aio_ops = {<DLL<Continuation,
Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
EventQueueExternal = {al = {head = {data = -4413809109769533759},
      name = 0x777abb "ProtectedQueue", offset = 72}, lock = {__data =
{__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 39 times>, __align = 0},
might_have_data = {__data = {__lock = 0, __futex = 6, __total_seq = 3,
__wakeup_seq = 3, __woken_seq = 3, __mutex = 0x7ffff6b02bf8,
__nwaiters = 0,
        __broadcast_seq = 0},
      __size = "\000\000\000\000\006\000\000\000\003\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\370+\260\366\377\177\000\000\000\000\000\000\000\000\000",
      __align = 25769803776}, localQueue = {<DLL<Event,
Event::Link_link>> = {head = 0x7fffe01d9f70}, tail = 0x1cdd680}},
EventQueue = {after = {{<DLL<Event, Event::Link_link>> = {head = 0x0},
tail = 0x0},
      {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
{<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
{<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
      {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
{<DLL<Event, Event::Link_link>> = {head = 0xfc7650}, tail = 0xfc7650},
{<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
      {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
{<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0},
{<DLL<Event, Event::Link_link>> = {head = 0xfc6cf0}, tail =
0x1cde3a0}},
    last_check_time = 1313544915111050000, last_check_buckets =
715977966}, ethreads_to_be_signalled = 0xfbd340,
n_ethreads_to_be_signalled = 0, accept_event = {0x0 <repeats 20
times>}, main_accept_index = -1,
  id = 2, event_types = 1, signal_hook = 0x6d33b8
<net_signal_hook_function(EThread*)>, evfd = 12, ep = 0xfd9a50, tt =
REGULAR, oneevent = 0x0, eventsem = 0x0}
(gdb)
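
For readers unfamiliar with the step at the faulting line, here is a rough
C++ sketch of what a try-lock dispatch generally looks like. It is not the
ATS source: ToyMutex, ToyEvent, Dispatch and this process_event are
hypothetical names made up for illustration, and the real
MUTEX_TRY_LOCK_FOR macro and EThread::process_event differ in detail. The
point is only that the try-lock is the first place the event's mutex
pointer is read through, so a pointer to an object that is no longer valid
would tend to fault there. Whether that is what happened in this dump is
not established in this thread.

```cpp
#include <atomic>

// Hypothetical toy lock with a non-blocking try_lock (not ATS's mutex type).
struct ToyMutex {
  std::atomic<bool> held{false};
  // Returns true only if the lock was free and is now held by the caller.
  bool try_lock() { return !held.exchange(true, std::memory_order_acquire); }
  void unlock() { held.store(false, std::memory_order_release); }
};

// Hypothetical, simplified event (not the ATS Event class).
struct ToyEvent {
  ToyMutex *mutex = nullptr; // lock guarding the handler's state
  bool cancelled = false;    // set when the scheduled action was cancelled
  int handled = 0;           // counts handler invocations in this sketch
};

enum class Dispatch { Handled, Retry, Dropped, NoMutex };

// Try to take the event's lock without blocking, then either run the
// handler, drop a cancelled event, or ask the caller to retry later.
Dispatch process_event(ToyEvent &e) {
  if (e.mutex == nullptr) {
    return Dispatch::NoMutex;
  }
  // First read through the mutex pointer. A null check cannot detect a
  // pointer to memory that has already been freed.
  if (!e.mutex->try_lock()) {
    return Dispatch::Retry; // lock busy: the caller would reschedule
  }
  Dispatch result = Dispatch::Dropped;
  if (!e.cancelled) {
    ++e.handled; // stand-in for calling the continuation's handler
    result = Dispatch::Handled;
  }
  e.mutex->unlock();
  return result;
}
```

In this sketch a busy lock yields Dispatch::Retry and a cancelled event is
dropped only after the lock has been taken, so the mutex has to stay valid
for as long as the event can still be dispatched.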

Re: SEGFAULT on MUTEX_TRY_LOCK_FOR

Posted by Brian Geffon <br...@gmail.com>.
ATS 3.0.1 on RedHat Enterprise Linux 6.1

On Tue, Aug 16, 2011 at 7:41 PM, Leif Hedstrom <zw...@apache.org> wrote:
> On 08/16/2011 08:17 PM, Brian Geffon wrote:
>>
>> Hi, for some reason I'm getting a segfault originating in
>> UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
>> I've been digging and I can't exactly see where the SEGFAULT is coming
>> from and I was hoping someone might have an idea of how/where to look.
>> Attached is gdb dump with variable printouts
>
> Version?
>
> -- leif
>
>

Re: SEGFAULT on MUTEX_TRY_LOCK_FOR

Posted by John Plevyak <jp...@acm.org>.
Some Unix systems will hard spin on a delay of less than 10msec. For
cross-thread scheduling, the result could be that the thread communicated
'to' starts immediately and hits the lock held by the 'from' thread before
the 'from' thread is past posting the event. If the 'from' thread didn't
finish initializing some of the shared data until after posting the event
(on the assumption that there would be a delay), this might cause problems.
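
A minimal sketch of that ordering hazard, under stated assumptions:
ToyScheduler, SharedState, post_then_init and init_then_post are
hypothetical names, not ATS APIs, and the scheduler is a single-threaded
toy whose "immediate" mode stands in for a receiving thread that starts
before the poster has finished. It models only the order of operations,
not real cross-thread synchronization.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for data shared between the 'from' and 'to' sides.
struct SharedState {
  bool initialized = false;
  std::string payload;
};

// Toy scheduler (not the ATS event system). With run_immediately set, a
// posted callback runs at once, modelling a very short delay that lets the
// receiver start before the poster has finished its own work.
class ToyScheduler {
public:
  explicit ToyScheduler(bool run_immediately)
    : run_immediately_(run_immediately) {}

  void post(std::function<void()> callback) {
    if (run_immediately_) {
      callback();
    } else {
      pending_.push(std::move(callback));
    }
  }

  // Runs every callback that was queued by post().
  void run_pending() {
    while (!pending_.empty()) {
      std::function<void()> callback = std::move(pending_.front());
      pending_.pop();
      callback();
    }
  }

private:
  bool run_immediately_;
  std::queue<std::function<void()>> pending_;
};

// Risky ordering: post first, finish initialization afterwards.
// Returns whether the callback saw fully initialized state.
bool post_then_init(ToyScheduler &sched) {
  SharedState state;
  bool saw_initialized = false;
  sched.post([&] { saw_initialized = state.initialized; });
  state.payload = "ready";
  state.initialized = true;
  sched.run_pending();
  return saw_initialized;
}

// Safer ordering: finish initialization, then post.
bool init_then_post(ToyScheduler &sched) {
  SharedState state;
  bool saw_initialized = false;
  state.payload = "ready";
  state.initialized = true;
  sched.post([&] { saw_initialized = state.initialized; });
  sched.run_pending();
  return saw_initialized;
}
```

With a delayed scheduler both orderings see initialized state, which would
match a long timeout hiding the problem. With the immediate scheduler only
init_then_post does. In real cross-thread code the shared data would also
need to be protected by a lock or equivalent synchronization.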

just an idea.

john

On Mon, Aug 22, 2011 at 1:35 AM, Brian Geffon <br...@gmail.com> wrote:

> If anyone is following this thread: I have narrowed the problem
> down to TSContSchedule(). I can't figure out exactly why it's
> happening, but when the timeout is very small (0, or under roughly
> 10ms), MUTEX_TRY_LOCK_FOR segfaults; if I increase the timeout to
> 1000ms, it never happens. I've been digging through the code and
> cannot find anything that would indicate the cause. Does anyone have
> ideas of where I might look? This does not happen on OS X, only on
> Red Hat Enterprise Linux; I'll be checking another Linux
> distribution soon. Thanks in advance.
>
> Best,
> Brian
>
> On Tue, Aug 16, 2011 at 7:41 PM, Leif Hedstrom <zw...@apache.org> wrote:
> > On 08/16/2011 08:17 PM, Brian Geffon wrote:
> >>
> >> Hi, for some reason I'm getting a segfault originating in
> >> UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
> >> I've been digging and I can't exactly see where the SEGFAULT is coming
> >> from and I was hoping someone might have an idea of how/where to look.
> >> Attached is gdb dump with variable printouts
> >
> > Version?
> >
> > -- leif
> >
> >
>

Re: SEGFAULT on MUTEX_TRY_LOCK_FOR

Posted by Brian Geffon <br...@gmail.com>.
If anyone is following this thread: I have narrowed the problem
down to TSContSchedule(). I can't figure out exactly why it's
happening, but when the timeout is very small (0, or under roughly
10ms), MUTEX_TRY_LOCK_FOR segfaults; if I increase the timeout to
1000ms, it never happens. I've been digging through the code and
cannot find anything that would indicate the cause. Does anyone have
ideas of where I might look? This does not happen on OS X, only on
Red Hat Enterprise Linux; I'll be checking another Linux
distribution soon. Thanks in advance.

Best,
Brian

On Tue, Aug 16, 2011 at 7:41 PM, Leif Hedstrom <zw...@apache.org> wrote:
> On 08/16/2011 08:17 PM, Brian Geffon wrote:
>>
>> Hi, for some reason I'm getting a segfault originating in
>> UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
>> I've been digging and I can't exactly see where the SEGFAULT is coming
>> from and I was hoping someone might have an idea of how/where to look.
>> Attached is gdb dump with variable printouts
>
> Version?
>
> -- leif
>
>

Re: SEGFAULT on MUTEX_TRY_LOCK_FOR

Posted by Brian Geffon <br...@gmail.com>.
I should also mention that this never happens on OS X, it only appears
to happen on RHEL.

On Tue, Aug 16, 2011 at 7:41 PM, Leif Hedstrom <zw...@apache.org> wrote:
> On 08/16/2011 08:17 PM, Brian Geffon wrote:
>>
>> Hi, for some reason I'm getting a segfault originating in
>> UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
>> I've been digging and I can't exactly see where the SEGFAULT is coming
>> from and I was hoping someone might have an idea of how/where to look.
>> Attached is gdb dump with variable printouts
>
> Version?
>
> -- leif
>
>

Re: SEGFAULT on MUTEX_TRY_LOCK_FOR

Posted by Leif Hedstrom <zw...@apache.org>.
On 08/16/2011 08:17 PM, Brian Geffon wrote:
> Hi, for some reason I'm getting a segfault originating in
> UnixEThread.cc:130 from the macro expansion of MUTEX_TRY_LOCK_FOR.
> I've been digging and I can't exactly see where the SEGFAULT is coming
> from and I was hoping someone might have an idea of how/where to look.
> Attached is gdb dump with variable printouts

Version?

-- leif