Posted to issues@trafficserver.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/11 09:28:20 UTC

[jira] [Work logged] (TS-4915) Crash from hostdb in PriorityQueueLess

     [ https://issues.apache.org/jira/browse/TS-4915?focusedWorklogId=30355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30355 ]

ASF GitHub Bot logged work on TS-4915:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Oct/16 09:27
            Start Date: 11/Oct/16 09:27
    Worklog Time Spent: 10m 
      Work Description: GitHub user shinrich opened a pull request:

    https://github.com/apache/trafficserver/pull/1088

    TS-4915: Crash from hostdb in PriorityQueueLess

    These changes have been running on my production box since I left work Monday night.  I will keep an eye on it; the lower overnight traffic might not be stressing it sufficiently.
    
    The main change was in PriorityQueueLess<>::erase.  Assigning the end item to the erase point did not update that item's entry index, so the assumption that entry->index is less than _v.length() became invalid the next time around.  I think breaking the invariant that entry->index equals the entry's position in _v can also harm the bubble_sorting logic.  I think PriorityQueueLess<>::pop also has a problem, but my workload was not triggering that function, so I didn't dive in there.
    
    The other change was in RefCountCachePartition<C>::make_space_for.  There was an extra pop, which I believe removed a second time an entry that had already been removed in PriorityQueueLess::erase (called from RefCountCachePartition<C>::erase).
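
The erase problem described above can be illustrated with a minimal sketch of an indexed binary heap. This is not the Traffic Server code: Entry, IndexedMinHeap, and demo_erase are hypothetical names, and the container is a plain std::vector. It only shows the invariant being discussed, namely that when the last item is moved into the erased slot its stored index has to be updated too, and that the moved item may then need to move up or down.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue entry; not the Traffic Server type.
struct Entry {
  int key;
  std::size_t index; // position of this entry inside the heap's vector
};

// Illustrative indexed min-heap over Entry pointers.
class IndexedMinHeap {
public:
  void push(Entry *e) {
    e->index = v_.size();
    v_.push_back(e);
    bubble_up(e->index);
  }

  Entry *top() const { return v_.empty() ? nullptr : v_.front(); }
  std::size_t size() const { return v_.size(); }

  void erase(Entry *e) {
    std::size_t i = e->index;
    assert(i < v_.size() && v_[i] == e);
    Entry *last = v_.back();
    v_.pop_back();
    if (last != e) {
      v_[i] = last;
      last->index = i;          // keep entry->index equal to its slot
      bubble_up(i);             // the moved item may be smaller than its parent
      bubble_down(last->index); // or larger than one of its children
    }
  }

  // True if every stored index matches its slot and the heap order holds.
  bool consistent() const {
    for (std::size_t i = 0; i < v_.size(); ++i) {
      if (v_[i]->index != i) return false;
      if (i > 0 && v_[i]->key < v_[(i - 1) / 2]->key) return false;
    }
    return true;
  }

private:
  void swap_at(std::size_t a, std::size_t b) {
    std::swap(v_[a], v_[b]);
    v_[a]->index = a;
    v_[b]->index = b;
  }

  void bubble_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (!(v_[i]->key < v_[parent]->key)) break;
      swap_at(i, parent);
      i = parent;
    }
  }

  void bubble_down(std::size_t i) {
    for (;;) {
      std::size_t left = 2 * i + 1, right = left + 1, smallest = i;
      if (left < v_.size() && v_[left]->key < v_[smallest]->key) smallest = left;
      if (right < v_.size() && v_[right]->key < v_[smallest]->key) smallest = right;
      if (smallest == i) break;
      swap_at(i, smallest);
      i = smallest;
    }
  }

  std::vector<Entry *> v_;
};

// Small self-check: erase from the middle and from the top, then verify.
bool demo_erase() {
  Entry e[6] = {{5, 0}, {3, 0}, {8, 0}, {1, 0}, {9, 0}, {2, 0}};
  IndexedMinHeap h;
  for (Entry &x : e) h.push(&x);
  h.erase(&e[1]); // key 3, an interior slot
  if (!h.consistent()) return false;
  h.erase(&e[3]); // key 1, the current top
  return h.consistent() && h.size() == 4 && h.top()->key == 2;
}
```

If the `last->index = i` line is dropped, consistent() fails after the first erase, and a later erase of the moved entry would use a stale index that can point past the end of the vector.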

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shinrich/trafficserver ts-4915-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/1088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1088
    
----
commit 0898a59bc33d63d18997a66437c808acd2e7e073
Author: Susan Hinrichs <sh...@ieee.org>
Date:   2016-10-11T09:20:11Z

    TS-4915: Crash from hostdb in PriorityQueueLess

----


Issue Time Tracking
-------------------

            Worklog Id:     (was: 30355)
            Time Spent: 10m
    Remaining Estimate: 0h

> Crash from hostdb in PriorityQueueLess
> --------------------------------------
>
>                 Key: TS-4915
>                 URL: https://issues.apache.org/jira/browse/TS-4915
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HostDB
>            Reporter: Susan Hinrichs
>            Priority: Blocker
>             Fix For: 7.1.0
>
>         Attachments: ts-4915.diff, ts-4915.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x0000000000547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x000000000054988d in PriorityQueueLess<RefCountCacheHashEntry*>::operator() (this=0x2b78a9a2587b, a=@0x2b78f402af68, b=@0x2b78f402aa28)
>     at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x0000000000549785 in PriorityQueue<RefCountCacheHashEntry*, PriorityQueueLess<RefCountCacheHashEntry*> >::_bubble_up (this=0x1cb2990, 
>     index=2) at ../lib/ts/PriorityQueue.h:191
>         comp = {<No data fields>}
>         parent = 0
> #3  0x00000000006ecfcc in PriorityQueue<RefCountCacheHashEntry*, PriorityQueueLess<RefCountCacheHashEntry*> >::push (this=0x1cb2990, 
>     entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
>         len = 2
> #4  0x00000000006ec206 in RefCountCachePartition<HostDBInfo>::put (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
>     expire_time=1475202356) at ./P_RefCountCache.h:210
>         expiry_entry = 0x2b78f402af60
>         __func__ = "put"
>         val = 0x1cc0880
> #5  0x00000000006eb3de in RefCountCache<HostDBInfo>::put (this=0x18051e0, key=6912554662447498853, item=0x2b78aee04f00, size=16, 
>     expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x00000000006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, event=600, e=0x2b78ac009440) at HostDB.cc:1422
>         is_rr = false
>         old_rr_data = 0x0
>         first_record = 0x2b78ac0094f8
>         m = 0x1
>         failed = false
>         old_r = {m_ptr = 0x0}
>         af = 2 '\002'
>         s_size = 16
>         rrsize = 0
>         allocSize = 16
>         r = 0x2b78aee04f00
>         old_info = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>           key = 47797242059264, app = {allotment = {application1 = 5326300, application2 = 0}, http_data = {http_version = 4, 
>               pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
>             ip = {sa = {sa_family = 54488, sa_data = "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, sin_port = 94, 
>                 sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
>                 sin6_addr = {__in6_u = {__u6_addr8 = "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 11128, 
>                       0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
>             hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>           hostname_offset = 11128, ip_timestamp = 2845989456, ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>           round_robin_elt = 0}
>         valid_records = 0
>         tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
>                 __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
>             _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>               47797215710448}}}
>         ttl_seconds = 132
>         aname = 0x2b7938021000 "fbmm1.zenfs.com"
>         offset = 96
>         thread = 0x2b78a8101010
>         __func__ = "dnsEvent"
> #7  0x00000000005145dc in Continuation::handleEvent (this=0x2b7938020f00, event=600, data=0x2b78ac009440)
>     at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x00000000006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at DNS.cc:1269
>         __func__ = "postEvent"
> #9  0x00000000005145dc in Continuation::handleEvent (this=0x2b78f4028600, event=1, data=0x2aac954db040)
>     at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x00000000007bc9be in EThread::process_event (this=0x2b78a8101010, e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
>         c_temp = 0x2b78f4028600
>         lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
>         __func__ = "process_event"
> #11 0x00000000007bcc2d in EThread::execute (this=0x2b78a8101010) at UnixEThread.cc:197
>         done_one = false
>         e = 0x2aac954db040
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x18ce400}, tail = 0x18ce400}
>         next_time = 1475191803711988905
>         __func__ = "execute"
> #12 0x00000000007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
>         p = 0x17fb9a0
> #13 0x00002b78a2555aa1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #14 0x00000032310e893d in clone () from /lib64/libc.so.6
> No symbol table info available.
> core == ET_NET 13 and core == ET_NET 20
> {code}


