Posted to issues@trafficserver.apache.org by "Jared Ocker (JIRA)" <ji...@apache.org> on 2014/12/16 00:03:15 UTC

[jira] [Comment Edited] (TS-1975) LocalManager may cause manager crash

    [ https://issues.apache.org/jira/browse/TS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247393#comment-14247393 ] 

Jared Ocker edited comment on TS-1975 at 12/15/14 11:02 PM:
------------------------------------------------------------

Can you provide any additional details on what was rewritten and in which version(s)?  I did a completely fresh compile of version 4.2.2 on RHEL 6.6 and it is still dumping core constantly.  Our Linux team installed the debuginfo packages, and the gdb backtrace is below:

{code}
(gdb) backtrace full
#0  0x0000003ae5832625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <value optimized out>
        selftid = 13147
#1  0x0000003ae5833e05 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0, sa_sigaction = 0}, sa_mask = {__val = {0, 0, 0, 0, 1416332663, 252958501720, 47570620698208, 7955998172648928882,
              4294967295, 47570613743360, 5, 2358760, 0, 0, 3, 47570613723136}}, sa_flags = -452927424, sa_restorer = 0x5}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00002b43e58ae6d8 in ink_die_die_die (retval=1) at ink_error.cc:43
No locals.
#3  0x00002b43e58ae7a5 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1,
    message_format=0x2b43e58bc748 "%s:%d: failed assert `%s`", ap=0x2b44002829f0) at ink_error.cc:65
        extended_format = "FATAL: %s:%d: failed assert `%s`", '\000' <repeats 3288 times>, "`'(\000D+\000\000\034\000\000\000\000\000\000\000@\a\363\345C+\000\000\244\360\234\a\000\000\000\000\"\234\000\345:", '\000' <repeats 11 times>"\302, s\036\000\000\000\000\000$\000\000\000\000\000\000\000xv\210\345C+\000\000\000\000\000\000\000\000\000\000\340((\000D+\000\000\bw\210\345C+\000\000\200ň\345C+", '\000' <repeats 50 times>"\360, V\210\345C+", '\000' <repeats 18 times>"\360, V\210\345C+\000\000\272Έ\345C+\000\000\310Ȉ\345C+\000\000\200ň\345C+\000\000\000\000\000\000\005\000\000\000E\003\000\000\001", '\000' <repeats 19 times>, "HZ\210\345C+\000\000\060"...
        message = "FATAL: MIME.cc:1544: failed assert `field->is_live()`", '\000' <repeats 4042 times>
#4  0x00002b43e58ae86e in ink_fatal (return_code=1, message_format=0x2b43e58bc748 "%s:%d: failed assert `%s`") at ink_error.cc:73
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x2b4400282ad0, reg_save_area = 0x2b4400282a10}}
#5  0x00002b43e58ad488 in _ink_assert (expression=0x73b402 "field->is_live()", file=0x73abff "MIME.cc", line=1544) at ink_assert.cc:37
No locals.
#6  0x000000000061003a in mime_hdr_field_detach (mh=0x2b43eef4b8c8, field=0x2b43eef4b978, detach_all_dups=false) at MIME.cc:1544
        next_dup = 0x0
#7  0x0000000000610314 in mime_hdr_field_delete (heap=0x2b43eef4b810, mh=0x2b43eef4b8c8, field=0x2b43eef4b978, delete_all_dups=false) at MIME.cc:1619
No locals.
#8  0x00000000004f55d8 in TSMimeHdrFieldDestroy (bufp=0x31125c0, mh_mloc=0x2b43eef4b898, field_mloc=0x31201f0) at InkAPI.cc:2793
        mh = 0x2b43eef4b8c8
        heap = 0x2b43eef4b810
        field_handle = 0x31201f0
#9  0x00002b43edd0c52c in fetch_resource (cont=0x310bbf0, event=TS_EVENT_IMMEDIATE, edata=0x2b43f0232260) at rfc5861.c:460
        state = 0x31124b0
        consume_cont = 0x310bb80
        connection_hdr_loc = 0x31201f0
        connection_hdr_dup_loc = 0x0
#10 0x00000000004f0c26 in INKContInternal::handle_event (this=0x310bbf0, event=1, edata=0x2b43f0232260) at InkAPI.cc:997
No locals.
#11 0x00000000004e7f64 in Continuation::handleEvent (this=0x310bbf0, event=1, data=0x2b43f0232260) at ../iocore/eventsystem/I_Continuation.h:146
No locals.
#12 0x00000000006edafe in EThread::process_event (this=0x2b43f64f7010, e=0x2b43f0232260, calling_code=1) at UnixEThread.cc:145
        c_temp = 0x310bbf0
        lock = {m = {m_ptr = 0x2b43f022b830}, lock_acquired = true}
#13 0x00000000006edd4e in EThread::execute (this=0x2b43f64f7010) at UnixEThread.cc:196
        done_one = false
        e = 0x2b43f0232260
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0}
        next_time = 1416332663161650883
#14 0x00000000006ed02b in spawn_thread_internal (a=0x3110ed0) at Thread.cc:88
        p = 0x3110ed0
#15 0x00002b43e5d169d1 in start_thread (arg=0x2b4400283700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x2b4400283700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {47571060406016, 312099895748216673, 140733589549360, 47571060406720, 0, 3, 5970873034414901089,
                5968121152373147489}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#16 0x0000003ae58e89dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.
{code}
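
For anyone reading the trace: frames #6 to #9 show `TSMimeHdrFieldDestroy`, called from `fetch_resource` in rfc5861.c, reaching `mime_hdr_field_detach`, where the `field->is_live()` check fails. The trace alone does not say why the field was no longer live. One possibility, offered only as a guess, is that the same field had already been detached or destroyed earlier. The toy model below illustrates that failure mode and a defensive guard. Every name in it (`Field`, `Header`, `destroy_once`) is hypothetical and written in Python for brevity; none of it is the Traffic Server API or the plugin's code.

```python
class Field:
    """Hypothetical stand-in for a MIME field slot (not the Traffic Server type)."""

    def __init__(self, name):
        self.name = name
        self.live = True


class Header:
    """Hypothetical header that owns a list of fields."""

    def __init__(self):
        self.fields = []

    def attach(self, name):
        field = Field(name)
        self.fields.append(field)
        return field

    def detach(self, field):
        # Same spirit as the failed check in the trace:
        # only a field that is still live may be detached.
        if not field.live:
            raise AssertionError("failed assert `field->is_live()`")
        self.fields.remove(field)
        field.live = False


def destroy_once(header, field):
    """Detach the field only if it is still live; return True if work was done."""
    if field is None or not field.live:
        return False
    header.detach(field)
    return True


if __name__ == "__main__":
    hdr = Header()
    conn = hdr.attach("Connection")
    print(destroy_once(hdr, conn))  # first destroy does the work
    print(destroy_once(hdr, conn))  # second destroy is skipped by the guard
```

In this model, calling `detach` twice on the same field raises the assertion, while `destroy_once` turns the second call into a no-op. Whether anything similar applies to the plugin would need to be confirmed against its source.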



> LocalManager may cause manager crash
> ------------------------------------
>
>                 Key: TS-1975
>                 URL: https://issues.apache.org/jira/browse/TS-1975
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Manager
>    Affects Versions: 3.3.4
>            Reporter: Zhao Yongming
>            Assignee: portl4t
>              Labels: Crash
>
> When something goes wrong with the LocalManager and it logs [LocalManager::pollMgmtProcessServer] Error in read (errno: 104), both the manager and the server restart.
> {code}
> Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
> Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL:  (last system error 104: Connection reset by peer)
> Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR:  (last system error 32: Broken pipe)
> Jun 17 17:40:07 cache163 traffic_cop[25652]: cop received child status signal [25654 2816]
> Jun 17 17:40:07 cache163 traffic_cop[25652]: traffic_manager not running, making sure traffic_server is dead
> Jun 17 17:40:07 cache163 traffic_cop[25652]: spawning traffic_manager
> Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: --- Manager Starting ---
> Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:06)
> Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: RLIMIT_NOFILE(7):cur(160000),max(160000)
> Jun 17 17:40:07 cache163 traffic_manager[10118]: {0x7f26fc24a7e0} STATUS: opened /var/log/trafficserver/manager.log
> Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: --- Server Starting ---
> Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:31)
> Jun 17 17:40:09 cache163 traffic_server[10131]: {0x2b167ded2280} STATUS: opened /var/log/trafficserver/diags.log
> {code}
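
The two "last system error" values in the log above are ordinary errno codes, and the log itself gives their messages: 104 is "Connection reset by peer" and 32 is "Broken pipe". A minimal sketch for looking up such codes with the Python standard library follows. The helper name `describe` is hypothetical, and because numeric errno values are platform-specific, the sketch starts from the symbolic names `ECONNRESET` and `EPIPE` instead of hard-coding 104 and 32.

```python
import errno
import os


def describe(code: int) -> str:
    """Return 'NAME (code): message' for a numeric errno value."""
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # The symbolic names for the two errors that appear in the log.
    for code in (errno.ECONNRESET, errno.EPIPE):
        print(describe(code))
```

Run on the affected host, this prints the symbolic name, the numeric value, and the system message for each code, which can be compared against the syslog lines.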


