Posted to issues@trafficcontrol.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/26 21:23:00 UTC

[jira] [Commented] (TC-401) Traffic Router Serves OFFLINE Caches

    [ https://issues.apache.org/jira/browse/TC-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063811#comment-16063811 ] 

ASF GitHub Bot commented on TC-401:
-----------------------------------

GitHub user elsloo opened a pull request:

    https://github.com/apache/incubator-trafficcontrol/pull/702

    [TC-401] Fixes a race condition related to lazy loading of CacheLocations on a NetworkNode when state changes occur from OFFLINE<->ONLINE within a CRConfig snapshot.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/elsloo/incubator-trafficcontrol 2.1.x_fix_network_node_lazy_loading_race

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafficcontrol/pull/702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #702
    
----
commit 37a978e24e2d79b0441a0d3bb3ca2fe9dc6fac66
Author: Jeff Elsloo <je...@cable.comcast.com>
Date:   2017-06-26T21:19:38Z

    [TC-401] Fixes a race condition related to lazy loading of CacheLocations on a NetworkNode when state changes occur from OFFLINE<->ONLINE within a CRConfig snapshot.

----


> Traffic Router Serves OFFLINE Caches
> ------------------------------------
>
>                 Key: TC-401
>                 URL: https://issues.apache.org/jira/browse/TC-401
>             Project: Traffic Control
>          Issue Type: Bug
>          Components: Traffic Router
>    Affects Versions: 2.0.0
>            Reporter: Jeff Elsloo
>             Fix For: 2.1.0
>
>
> We identified an issue that causes Traffic Router to serve up an {{OFFLINE}} cache indefinitely after a snapshot of the CRConfig. The bug also works in the inverse direction: a cache that was previously set to {{OFFLINE}} will never have traffic routed to it after it is set back to {{ONLINE}} or {{REPORTED}} (referenced only as {{ONLINE}} henceforth).
> The bug is caused by {{ConfigHandler.processConfig()}} clearing the cache locations from the {{NetworkNode}} prior to swapping out the instance of {{CacheRegister}}. When the cache locations have been cleared, but the prior {{CacheRegister}} is still in place, a race condition can occur where the {{CacheLocation}} for a given cache group from the prior config will be set on the recently cleared {{NetworkNode}}. When this happens, the {{List<Cache>}} contains the prior config's list for that cache group, which means that any host state change between {{ONLINE}} and {{OFFLINE}} will not be reflected. This is because a {{Cache}} drops out of the CRConfig when it transitions to {{OFFLINE}} and reappears when it is set back to {{ONLINE}}. Contrast this with a transition from {{ONLINE}} to {{ADMIN_DOWN}}: the {{Cache}} remains in the CRConfig, so we simply use the status to determine whether the cache is available, and the software works as designed.
> The race is possible because of the way we use lazy loading to associate network ranges within the coverage zone file (CZF) with {{CacheLocations}} within a given {{NetworkNode}} representing that section of the CZF. In {{TrafficRouter}}, during cache selection, if we have a hit in the CZF but the {{CacheLocation}} is uninitialized, we obtain the {{CacheLocation}} from {{CacheRegister}} and set it for that specific {{NetworkNode}}. If our {{NetworkNode}} is cleared but our {{CacheRegister}} has yet to be swapped, we will set the {{NetworkNode}} to the old {{CacheLocation}}, which, as mentioned, holds a reference to the prior {{List<Cache>}}. Because the {{NetworkNode}} is then no longer uninitialized, no later request gets the opportunity to populate it with the new {{CacheLocation}} and new {{List<Cache>}}.
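
The interleaving described above can be replayed in a small, single-threaded sketch. This is not the Traffic Router code: the classes below are simplified stand-ins that only borrow the names CacheLocation, CacheRegister and NetworkNode from the description, and Router, select(), registerOf(), clearThenSwap() and swapThenClear() are hypothetical names made up for illustration. The "request arriving mid-update" is simulated by calling select() between the two update steps rather than by using real threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyLoadRaceSketch {
    // Simplified stand-in: a cache group and the caches it held in one snapshot.
    static final class CacheLocation {
        final String id;
        final List<String> caches;
        CacheLocation(String id, List<String> caches) { this.id = id; this.caches = caches; }
    }

    // Simplified stand-in: the register built from a single CRConfig snapshot.
    static final class CacheRegister {
        private final Map<String, CacheLocation> locations = new HashMap<>();
        void put(CacheLocation loc) { locations.put(loc.id, loc); }
        CacheLocation get(String id) { return locations.get(id); }
    }

    // Simplified stand-in: one CZF network range with a lazily loaded location.
    static final class NetworkNode {
        final String locationId;
        private CacheLocation cacheLocation; // null means "not loaded yet"
        NetworkNode(String locationId) { this.locationId = locationId; }
        void clear() { cacheLocation = null; }
    }

    // Hypothetical router holding the current register and a single node.
    static final class Router {
        CacheRegister register;
        final NetworkNode node;
        Router(CacheRegister register, NetworkNode node) { this.register = register; this.node = node; }

        // Lazy load on a coverage zone hit: fill the node from whichever
        // register is current, then keep using the node's copy.
        List<String> select() {
            if (node.cacheLocation == null) {
                node.cacheLocation = register.get(node.locationId);
            }
            return node.cacheLocation.caches;
        }
    }

    static CacheRegister registerOf(String... caches) {
        CacheRegister r = new CacheRegister();
        r.put(new CacheLocation("group-a", new ArrayList<>(Arrays.asList(caches))));
        return r;
    }

    // Ordering from the description: clear the node, then swap the register.
    static List<String> clearThenSwap() {
        Router router = new Router(registerOf("edge-1", "edge-2"), new NetworkNode("group-a"));
        router.select();                            // node loaded from the old snapshot
        CacheRegister next = registerOf("edge-1");  // edge-2 went OFFLINE and dropped out
        router.node.clear();
        router.select();                            // mid-update request re-pins the old location
        router.register = next;
        return router.select();                     // node is non-null, so it is never refreshed
    }

    // Assumed alternative ordering: swap the register, then clear the node.
    static List<String> swapThenClear() {
        Router router = new Router(registerOf("edge-1", "edge-2"), new NetworkNode("group-a"));
        router.select();
        CacheRegister next = registerOf("edge-1");
        router.register = next;
        router.select();                            // mid-update request still sees the old node
        router.node.clear();
        return router.select();                     // reloads from the new register
    }

    public static void main(String[] args) {
        System.out.println("clear then swap: " + clearThenSwap());
        System.out.println("swap then clear: " + swapThenClear());
    }
}
```

In this sketch, clearThenSwap() keeps returning the prior snapshot's list, including the cache that went OFFLINE, while swapThenClear() picks up the new list. Swapping before clearing is only one plausible ordering and is not necessarily what the pull request does. With real threads, a request that read the old register just before the swap could still write the stale location after the clear, so an actual fix would also need to account for that window.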



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)