Posted to issues@trafficserver.apache.org by "Thomas Jackson (JIRA)" <ji...@apache.org> on 2016/11/12 01:02:34 UTC

[jira] [Assigned] (TS-5052) Segfault in HostDB sync if something fails while not holding the parent continuation mutex

     [ https://issues.apache.org/jira/browse/TS-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jackson reassigned TS-5052:
----------------------------------

    Assignee: Thomas Jackson

> Segfault in HostDB sync if something fails while not holding the parent continuation mutex
> ------------------------------------------------------------------------------------------
>
>                 Key: TS-5052
>                 URL: https://issues.apache.org/jira/browse/TS-5052
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Thomas Jackson
>            Assignee: Thomas Jackson
>
> What we noticed was the following in traffic.out:
> {code}
> Server {0x2af761e0d700} WARNING: <P_RefCountCache.h:510 (initialize_storage)> Unable to create temporary file /var/trafficserver/host.db.syncing, unable to persist hostdb: -13 error:Permission denied
> traffic_server: Segmentation fault (Address not mapped to object [0x28])
> traffic_server - STACK TRACE:
> {code}
> That led me to dig into it, and it turns out the issue is related to changes after the HostDB rewrite that moved syncing outside of the main NET threads. Previously, all calls to this syncer were made in a single net thread, wherever it was initially scheduled. Now we bounce between ET_TASK threads and ET_NET threads (to avoid switching, lock contention, etc.), but the error handlers weren't updated to handle this situation.
> To fix this, I've added "set_error" and "return_error" methods to the RefCountCacheSerializer that take this into consideration: the error is returned immediately if the serializer is already running on the calling thread; otherwise it reschedules itself onto that thread and *then* returns the error.
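
For readers unfamiliar with the pattern, here is a rough, self-contained sketch of the "return the error on the thread that asked for it" idea described above. It is not the Traffic Server code: ToyThread, ToySerializer, drain and run_demo are hypothetical names, the "threads" are simulated with plain queues so the example stays deterministic, and the signatures of set_error and return_error are guesses rather than the real RefCountCacheSerializer API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an event thread: a named queue of pending callbacks.
struct ToyThread {
  std::string name;
  std::deque<std::function<void()>> queue;
};

// Which toy thread is "running" right now (hypothetical bookkeeping).
static ToyThread *current_thread = nullptr;

// Run every queued callback on `t`, marking it as the current thread meanwhile.
void drain(ToyThread &t) {
  ToyThread *prev = current_thread;
  current_thread = &t;
  while (!t.queue.empty()) {
    auto job = std::move(t.queue.front());
    t.queue.pop_front();
    job();
  }
  current_thread = prev;
}

// Hypothetical serializer that remembers which thread its caller lives on.
class ToySerializer {
public:
  ToySerializer(ToyThread &origin, std::function<void(int)> on_error)
    : origin_(&origin), on_error_(std::move(on_error)) {}

  // Record the failure; nobody is notified yet.
  void set_error(int err) { error_ = err; }

  // Deliver the recorded error on the origin thread.
  // Returns true if delivered immediately, false if it was rescheduled.
  bool return_error() {
    assert(error_ != 0);
    if (current_thread == origin_) {
      on_error_(error_); // already on the caller's thread: safe to call back
      return true;
    }
    // Wrong thread: hop back to the origin thread first, *then* report.
    origin_->queue.push_back([this] { on_error_(error_); });
    return false;
  }

private:
  ToyThread *origin_;
  std::function<void(int)> on_error_;
  int error_ = 0;
};

// Simulate a sync step that fails on a task thread while the caller is on a net thread.
std::vector<std::string> run_demo() {
  std::vector<std::string> trace;
  ToyThread net{"net", {}};
  ToyThread task{"task", {}};
  ToySerializer sync(net, [&](int err) {
    trace.push_back("error " + std::to_string(err) + " delivered on " + current_thread->name);
  });
  task.queue.push_back([&] {
    sync.set_error(-13); // illustrative value, borrowed from the log above
    trace.push_back(sync.return_error() ? "immediate" : "rescheduled");
  });
  drain(task); // failure happens here, off the caller's thread
  drain(net);  // error callback runs here, on the caller's thread
  return trace;
}
```

In this toy model the failing step runs on the "task" queue, so return_error does not invoke the callback directly; it queues the callback on "net", and the caller only sees the error once that queue is drained. In the real server the point of the hop is that the parent continuation is only touched from a thread where its mutex is held.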



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)