Posted to dev@trafficcontrol.apache.org by "John Shen (weifensh)" <we...@cisco.com> on 2017/03/29 13:02:15 UTC

Re: Backup Cache Group Selection

Hi Jeff,

I have just tried the getClosestCacheLocation() logic. The lat/long of the cache group matched in the CZF does come from the CZF, but the lat/long of each candidate “closest” Cache Group comes from the cache group configuration in Traffic Ops. So when the distance between the matched CG and a “closest” CG is calculated, the source lat/long comes from the CZF, while the destination lat/long comes from the CG settings in Traffic Ops rather than from the CZF. Is this expected behavior?

Thanks,
John


On 27/01/2017, 10:51 PM, "Jeff Elsloo" <je...@gmail.com> wrote:

    Steve: I don't think the patch is required, however, as Eric found,
    without the patch there could be some gaps depending on the scenario.
    That specific scenario revolved around the "next best cache group" not
    having a DS assigned, or a healthy cache with the DS assigned. In that
    case, despite the hits, you would still end up falling through to the
    geolocation provider. The patch addresses that.
    
    Eric: The rloc field is set via the Geolocation associated with the
    CacheLocation, which ultimately comes from the edgeLocations section
    of the CRConfig. When a CZF lookup is performed inside TR, a hit
    returns a CacheLocation. When caches aren't available within that
    CacheLocation, getClosestCacheLocation() is called, and that's why you
    see the lat/long of the "next best cache group" instead of the actual
    hit's lat/long.
    
    If we want to have granularity in this situation, we might need to 1)
    create a new ResultType, such as ResultType.CZ_NEXT (or something),
    and/or 2) massage the log format such that we either have the
    original lat/long, and new lat/long in the rloc field, or create a new
    field to save one or the other, such that we log both lat/longs.
    
    Thoughts? Whatever we decide should go into TC-90 so we can apply the
    proposed patch and improve the logging.
    --
    Thanks,
    Jeff
    
    
    On Fri, Jan 27, 2017 at 7:14 AM, Eric Friedrich (efriedri)
    <ef...@cisco.com> wrote:
    > The rloc field usually indicates the Geolocation IP of the client (short for request location)
    >
    > But here it looks like rloc is reflecting the location of the CG it ultimately redirected to (response location?).
    >
    > I would have expected the rloc field to either
    >    1) be blank (because we never did a lookup from geoprovider)
    >         or
    >    2)  to contain the coordinates of the cache group the CZF hit on (in this case us-ga-macon at 32.7261, -83.6547)
    >
    > —Eric
    >
    >> On Jan 27, 2017, at 8:28 AM, Steve Malenfant <sm...@gmail.com> wrote:
    >>
    >> Jeff,
    >>
    >> CZF properly installed: yes
    >> Network address or not: same behavior
    >>
    >> But you nailed the API one. There is no cache assigned to us-ga-macon,
    >> which is exactly what I'm testing.
    >>
    >> I added cache groups for my testing in the lab which I assigned a few
    >> caches to them :
    >>
    >> - us-ga-atlanta 34.0362 -84.3207
    >> - us-ok-oklahomacity 35.4777 -97.5545
    >> - us-va-nova 38.7922 -77.2136
    >> - us-ca-sandiego 32.7205 -117.0838
    >>
    >> API :
    >> {"locationByGeo":{"city":"Macon","countryCode":"US","latitude":"32.7288","postalCode":"31216","countryName":"United
    >> States","longitude":"-83.6865"},"locationByFederation":"not
    >> found","requestIp":"24.252.192.1","locationByCoverageZone":"not found"}
    >>
    >> Using the X-MM-Client-IP it returned the proper cache based on CZ, it
    >> correctly sent the request to the cache in us-ga-atlanta :
    >> 1485522786.423 qtype=HTTP chi=24.252.192.1 url="
    >> http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ
    >> rloc="34.03,-84.32" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.260 rurl="
    >> http://cdn1cdedge0007.cox-col-jitp2.cdn1.coxlab.net/" rh="-"
    >>
    >> I then changed the coordinate to match the us-ca-sandiego group in the CZF
    >> and now the request is sent to the us-ca-sandiego caches :
    >> 1485523546.345 qtype=HTTP chi=24.252.192.1 url="
    >> http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ
    >> rloc="32.72,-117.08" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.206 rurl="
    >> http://cdn1cdedge0001.cox-col-jitp2.cdn1.coxlab.net/" rh="-
    >>
    >> I'm using 1.6.1 + patch discussed in this email. Not sure if those are
    >> necessary but I'll need to try on unpatched version.
    >>
    >> Do we want to fix API to reflect CZF?
    >>
    >> Thanks for your help.
    >>
    >> Steve
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >> On Thu, Jan 26, 2017 at 4:47 PM, Jeff Elsloo <je...@gmail.com> wrote:
    >>
    >>> Dave just let me know that in this case you don't have any caches
    >>> assigned in us-ga-macon. I'm not sure how the API behaves at that
    >>> point – it likely won't follow the same "next best cache group" logic,
    >>> as it was designed as a simple lookup tool.
    >>>
    >>> Can you try simulating a request through Traffic Router directly using
    >>> the X-MM-Client-IP header, or fakeClientIpAddress query parameter
    >>> using the example IP of 24.252.192.0? After you do so, check the
    >>> coordinates in the log entry and see if the result is a CZ hit.
    >>> --
    >>> Thanks,
    >>> Jeff
    >>>
    >>>
    >>> On Thu, Jan 26, 2017 at 2:03 PM, Jeff Elsloo <je...@gmail.com>
    >>> wrote:
    >>>> Are you 100% sure that the Traffic Router has loaded the updated CZF?
    >>>> If so, what happens when you use an IP within the /20 instead of the
    >>>> network address (.0)? I tried using a network address of a /22 on a
    >>>> 1.8 TR and it hit the CZF as expected. Ultimately what you're seeing
    >>>> is a CZF miss, unrelated to the geo coordinates.
    >>>>
    >>>> The underlying feature with the coordinates is to select the next best
    >>>> cache group by proximity where healthy caches have a given delivery
    >>>> service assigned. In order to test that, you would need to have a CZF
    >>>> hit in a cache group which doesn't have that particular delivery
    >>>> service assigned to any caches, or have all caches within that cache
    >>>> group with that delivery service in an unhealthy state.
    >>>>
    >>>> Thanks,
    >>>> --
    >>>> Thanks,
    >>>> Jeff
    >>>>
    >>>>
    >>>> On Wed, Jan 25, 2017 at 1:33 PM, Steve Malenfant <sm...@gmail.com>
    >>> wrote:
    >>>>> Jeff,
    >>>>>
    >>>>> I've tried this coverage zone file coordinate overwrite... I might be
    >>>>> missing something.
    >>>>>
    >>>>> I defined the following :
    >>>>>
    >>>>>        "us-ga-macon": {
    >>>>>>            "coordinates": {
    >>>>>>                "latitude": "32.7261",
    >>>>>>                "longitude": "-83.6547"
    >>>>>>            },
    >>>>>>            "network": [
    >>>>>>                "24.252.192.0/20",
    >>>>>>                "68.1.20.0/22",
    >>>>>
    >>>>>
    >>>>> Then issued the following query :
    >>>>>
    >>>>>> curl http://traffic_router:3333/crs/stats/ip/24.252.192.0
    >>>>>>
    >>>>>> {"locationByGeo":{"city":"Macon","countryCode":"US","
    >>> latitude":"32.7288","postalCode":"31216","countryName":"United
    >>>>>> States","longitude":"-83.6865"},"locationByFederation":"not
    >>>>>> found","requestIp":"24.252.192.0","locationByCoverageZone":"not
    >>> found"}
    >>>>>>
    >>>>> I believe I'm expecting "locationByCoverageZone" to find something...
    >>>>>
    >>>>> I tried on 1.6.0 and 1.6.1 (patched with the pastebin above which I
    >>> wasn't
    >>>>> sure I was supposed to do).
    >>>>>
    >>>>> Would you mind giving me some light on this?
    >>>>>
    >>>>> Thanks,
    >>>>>
    >>>>> Steve
    >>>>>
    >>>>>
    >>>>> On Mon, Jan 23, 2017 at 3:05 PM, Jeff Elsloo <je...@gmail.com>
    >>> wrote:
    >>>>>
    >>>>>> Yes; the feature went into 1.5.x.
    >>>>>> --
    >>>>>> Thanks,
    >>>>>> Jeff
    >>>>>>
    >>>>>>
    >>>>>> On Thu, Jan 19, 2017 at 10:37 AM, Steve Malenfant <
    >>> smalenfant@gmail.com>
    >>>>>> wrote:
    >>>>>>> I didn't know about this which is good information. Does that work on
    >>>>>>> Traffic Router 1.6?
    >>>>>>>
    >>>>>>> On Mon, Jan 9, 2017 at 12:44 PM, Eric Friedrich (efriedri) <
    >>>>>>> efriedri@cisco.com> wrote:
    >>>>>>>
    >>>>>>>> Jeff and I had a quick Slack convo, so I’ll add a followup summary
    >>> here
    >>>>>> in
    >>>>>>>> case anyone else is interested.
    >>>>>>>>
    >>>>>>>> Cache Group location (lat/long) is configured in Traffic Ops today
    >>> (and
    >>>>>> is
    >>>>>>>> used for computing distance from Maxmind Geolocation).
    >>>>>>>>
    >>>>>>>> You can also configure the location (lat/long) for a Cache Group in
    >>> the
    >>>>>>>> CoverageZone file (example below).
    >>>>>>>>
    >>>>>>>> When this location is configured (and Jeff’s suggested logic fix
    >>> from
    >>>>>>>> below is applied) and all caches in the mapped cache group are
    >>>>>> unavailable,
    >>>>>>>> TR will send a client request to the cache group that is closest to
    >>> the
    >>>>>>>> original mapped group.
    >>>>>>>>
    >>>>>>>> Example CZF w/ cache location
    >>>>>>>> -----
    >>>>>>>> "coverageZones": {
    >>>>>>>>    "edge-cg-1": {
    >>>>>>>>      "network6": [
    >>>>>>>>        ...
    >>>>>>>>      ],
    >>>>>>>>      "network": [
    >>>>>>>>        ...
    >>>>>>>>      ],
    >>>>>>>>      "coordinates": {
    >>>>>>>>        "longitude": "-75.3342",
    >>>>>>>>        "latitude": "42.555"
    >>>>>>>>      }
    >>>>>>>>    },
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> —Eric
    >>>>>>>>
    >>>>>>>>
    >>>>>>>>> On Jan 5, 2017, at 12:06 PM, Jeff Elsloo <je...@gmail.com>
    >>>>>> wrote:
    >>>>>>>>>
    >>>>>>>>> If we applied the proposed change, given your scenario we should
    >>> fall
    >>>>>>>>> through to the return statement that calls
    >>> getClosestCacheLocation().
    >>>>>>>>> That method will order all cache groups based on their lat/long
    >>> and
    >>>>>>>>> the lat/long of the cache group we hit on in the CZF. Once the
    >>> list is
    >>>>>>>>> ordered, we iterate through the list until we find a cache group
    >>> that
    >>>>>>>>> has available caches for that DS.
    >>>>>>>>>
    >>>>>>>>> BTW, the stuff on line 536 is likely to produce the exact same
    >>> result
    >>>>>>>>> as the check that precedes it. networkNode.getLoc() will return
    >>> the
    >>>>>>>>> string name of the cache group, so when we find the
    >>> CacheLocation, it
    >>>>>>>>> will be the same as what we had just checked. We could probably
    >>> get
    >>>>>>>>> away with removing that part of the method as it's redundant.
    >>>>>>>>> --
    >>>>>>>>> Thanks,
    >>>>>>>>> Jeff
    >>>>>>>>>
    >>>>>>>>>
    >>>>>>>>> On Wed, Jan 4, 2017 at 11:54 AM, Eric Friedrich (efriedri)
    >>>>>>>>> <ef...@cisco.com> wrote:
    >>>>>>>>>> Where would TR look outside the assigned cache group to find the
    >>> next
    >>>>>>>> closest cache group?
    >>>>>>>>>>
    >>>>>>>>>>> On Jan 4, 2017, at 11:25 AM, Eric Friedrich (efriedri) <
    >>>>>>>> efriedri@cisco.com> wrote:
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> On Jan 3, 2017, at 5:20 PM, Jeff Elsloo <jeff.elsloo@gmail.com
    >>>>>> <mailto:
    >>>>>>>> jeff.elsloo@gmail.com>> wrote:
    >>>>>>>>>>>
    >>>>>>>>>>> Hey Eric,
    >>>>>>>>>>>
    >>>>>>>>>>> It sounds like the use case you're after is an RFC 1918 client
    >>>>>>>>>>> associated with a cache group whose caches are all unavailable
    >>> for
    >>>>>> one
    >>>>>>>>>>> reason or another. Is that correct?
    >>>>>>>>>>> Yes, exactly.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> I looked at the code a bit, and I think that we can make a minor
    >>>>>>>>>>> change to achieve the behavior you're looking for as long as
    >>> you're
    >>>>>>>>>>> able to put your RFC 1918 ranges in the CZF.
    >>>>>>>>>>> Yes, we would want those ranges in the CZF. I can’t think of any
    >>>>>> other
    >>>>>>>> place they would go.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> There's a small logic gap in the existing algorithm around cache
    >>>>>>>>>>> location selection and I think if we fix that (two line
    >>> change), we
    >>>>>>>>>>> should be better off all around. I think the only time we'd ever
    >>>>>> want
    >>>>>>>>>>> to go to the geolocation provider is in the event of a miss on
    >>> the
    >>>>>>>>>>> CZF, so as long as we have a hit there, we should find the cache
    >>>>>> group
    >>>>>>>>>>> closest to that hit location that has available caches. This
    >>> would
    >>>>>>>>>>> automatically provide the "backup" cache group concept, and has
    >>> the
    >>>>>>>>>>> added benefit of doing this selection dynamically based on the
    >>> state
    >>>>>>>>>>> of the CDN.
    >>>>>>>>>>> Wow, thanks for picking up on this solution. Sounds like a
    >>> strong
    >>>>>>>> possibility. I like that it can extend dynamically.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> See this to get an idea of what I mean:
    >>> http://apaste.info/u3PQo
    >>>>>>>>>>> https://github.com/apache/incubator-trafficcontrol/blob/
    >>>>>>>> 249bd7504eeb7cc43402126f3719017e2475ad33/traffic_router/
    >>>>>>>> core/src/main/java/com/comcast/cdn/traffic_control/
    >>>>>>>> traffic_router/core/router/TrafficRouter.java#L536
    >>>>>>>>>>> Does this line set cacheLocation to the closest cache group with
    >>>>>>>> active caches on that DS?
    >>>>>>>>>>>
    >>>>>>>>>>> What does networkNode.getLoc() actually return?
    >>>>>>>>>>>
    >>>>>>>>>>> —Eric
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> Obviously we'd need to test this to ensure we don't break other
    >>>>>>>> functionality.
    >>>>>>>>>>> --
    >>>>>>>>>>> Thanks,
    >>>>>>>>>>> Jeff
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> On Tue, Jan 3, 2017 at 10:07 AM, Eric Friedrich (efriedri)
    >>>>>>>>>>> <ef...@cisco.com>> wrote:
    >>>>>>>>>>> If all caches in the primary cache group are unavailable, our
    >>> goal
    >>>>>> is
    >>>>>>>> to provide a backup routing policy for RFC1918 clients.
    >>>>>>>>>>>
    >>>>>>>>>>> When client IP is an public Internet IP, the current backup
    >>> policy
    >>>>>> is
    >>>>>>>> to assign the client to the geographically closest cache (Distance =
    >>>>>>>> MaxMind Geo Lat/Long - configured CG lat/long).
    >>>>>>>>>>>
    >>>>>>>>>>> When client IP is an RFC1918 IP, the client would not have a
    >>> maxmind
    >>>>>>>> geo-loc, so would fall back to the DS geo-miss lat long. We’d prefer
    >>>>>> some
    >>>>>>>> more granular control over where these clients are routed to, rather
    >>>>>> than a
    >>>>>>>> per-DS setting.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> So with an RFC1918 client, the lookup process would be (step 3
    >>> is
    >>>>>> only
    >>>>>>>> addition)
    >>>>>>>>>>> 1) Check CZF for a subnet match (and find a match for existing
    >>> cache
    >>>>>>>> group). Assign client to CG
    >>>>>>>>>>> 2) Check CG for available (online and associated w/ DS)
    >>> servers. In
    >>>>>>>> this particular case, assume CG has no servers available to route
    >>> the
    >>>>>>>> client to
    >>>>>>>>>>> 3) Walk the CZF's list of backup CGs and perform the check from
    >>> #2
    >>>>>> for
    >>>>>>>> each CG. Use first server that is found
    >>>>>>>>>>> 4) Assuming no server is found in #3, perform geo-location and
    >>> find
    >>>>>>>> closest cache group. Use a server from the closest CG if one is
    >>> found
    >>>>>>>>>>> 4a) If geo-location returns null, use the DS’ default geo-miss
    >>>>>>>> location as the client location.
    >>>>>>>>>>>
    >>>>>>>>>>> —Eric
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> On Dec 26, 2016, at 10:01 AM, Jan van Doorn <jvd@knutsel.com
    >>>>>> <mailto:
    >>>>>>>> jvd@knutsel.com>> wrote:
    >>>>>>>>>>>
    >>>>>>>>>>> Hi Eric,
    >>>>>>>>>>>
    >>>>>>>>>>> How does the backup list relate to the RFC1918-is-not-in-geo
    >>>>>> problem?
    >>>>>>>>>>>
    >>>>>>>>>>> To get to a cachegroup you need to get a match in the coverage
    >>>>>> zone, I
    >>>>>>>> would think?
    >>>>>>>>>>>
    >>>>>>>>>>> Rgds,
    >>>>>>>>>>> JvD
    >>>>>>>>>>>
    >>>>>>>>>>> On Dec 22, 2016, at 12:28, Eric Friedrich (efriedri) <
    >>>>>>>> efriedri@cisco.com<ma...@cisco.com>> wrote:
    >>>>>>>>>>>
    >>>>>>>>>>> The current behavior of cache group selection works as follows
    >>>>>>>>>>> 1) Look for a subnet match in CZF
    >>>>>>>>>>> 2) Use MaxMind/Neustar for GeoLocation based on client IP.
    >>> Choose
    >>>>>>>> closest cache group.
    >>>>>>>>>>> 3) Use Delivery Service Geo-Miss Lat/Long. Choose closest cache
    >>>>>> group.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> For deployments where IP addressing is primarily private (say
    >>>>>> RFC-1918
    >>>>>>>> addresses), client IP Geo Location (#2) is not useful.
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>> We are considering adding another field to the Coverage Zone
    >>> File
    >>>>>> that
    >>>>>>>> configures an ordered list of backup cache groups to try if the
    >>> primary
    >>>>>>>> cache group does not have any available caches.
    >>>>>>>>>>>
    >>>>>>>>>>> Example:
    >>>>>>>>>>>
    >>>>>>>>>>> "coverageZones": {
    >>>>>>>>>>> "cache-group-01": {
    >>>>>>>>>>> "backupList": ["cache-group-02", "cache-group-03"],
    >>>>>>>>>>> "network6": [
    >>>>>>>>>>> "1234:5678::\/64",
    >>>>>>>>>>> "1234:5679::\/64"],
    >>>>>>>>>>> "network": [
    >>>>>>>>>>> "192.168.8.0\/24",
    >>>>>>>>>>> "192.168.9.0\/24"]
    >>>>>>>>>>> }
    >>>>>>>>>>>
    >>>>>>>>>>> This configuration could also be part of the per-cache group
    >>>>>>>> configuration, but that would give less control over which clients
    >>>>>>>> preferred which cache groups. For example, you may have cache
    >>> groups in
    >>>>>> LA,
    >>>>>>>> Chicago and NY. If the Chicago Cache group fails, you may want some
    >>> of
    >>>>>> the
    >>>>>>>> Chicago clients to go to LA and some to go to NY. If the backup CG
    >>>>>>>> configuration is per-cg, we would not be able to control where
    >>> clients
    >>>>>> are
    >>>>>>>> allocated.
    >>>>>>>>>>>
    >>>>>>>>>>> Looking for opinions and comments on the above proposal, this is
    >>>>>> still
    >>>>>>>> in idea stage.
    >>>>>>>>>>>
    >>>>>>>>>>> Thanks All!
    >>>>>>>>>>> Eric
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>>
    >>>>>>>>>>
    >>>>>>>>
    >>>>>>>>
    >>>>>>
    >>>
    >
    


Re: Backup Cache Group Selection

Posted by Jeff Elsloo <el...@apache.org>.
Yes, it's expected behavior. What you're describing sounds like a
cachegroup in the CZF without any corresponding configuration in
Traffic Ops, or a cachegroup with configuration in Traffic Ops, but
with no available caches (DS assignments, health, etc).

Presuming we have configured geolocation coordinates within the CZF,
we know the lat/long of the cachegroup within the CZF that contains
the source address. We can then order our list of cachegroups by
distance from that lat/long and select the "next best" cachegroup
that has available caches. That will be the actual cachegroup to serve
the request; this keeps a CZF hit from being treated like a miss, which
would normally be routed to the Geolocation service selected for the
DS.
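
To make the ordering concrete, here is a minimal Python sketch of the idea. It is not Traffic Router's actual Java implementation: the names haversine_km, next_best_cache_group, and has_available_caches are hypothetical, and using the haversine formula for distance is an assumption. The hit coordinates are the us-ga-macon values from the CZF shown earlier in this thread, and the candidates are the lab cache groups Steve listed.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def next_best_cache_group(hit_coords, cache_groups, has_available_caches):
    """Return the nearest cache group (by distance from the CZF hit) that
    has available caches, or None if no group qualifies.

    hit_coords comes from the CZF entry that matched the client;
    cache_groups maps name -> (lat, lon) as configured in Traffic Ops;
    has_available_caches is a hypothetical predicate standing in for the
    DS-assignment and health checks.
    """
    ordered = sorted(cache_groups,
                     key=lambda name: haversine_km(hit_coords, cache_groups[name]))
    for name in ordered:
        if has_available_caches(name):
            return name
    return None

# Source coordinates: us-ga-macon as defined in the CZF.
CZF_HIT = (32.7261, -83.6547)

# Destination coordinates: cache groups as configured in Traffic Ops.
OPS_GROUPS = {
    "us-ga-atlanta": (34.0362, -84.3207),
    "us-ok-oklahomacity": (35.4777, -97.5545),
    "us-va-nova": (38.7922, -77.2136),
    "us-ca-sandiego": (32.7205, -117.0838),
}

if __name__ == "__main__":
    print(next_best_cache_group(CZF_HIT, OPS_GROUPS, lambda name: True))
```

With every group available, this picks us-ga-atlanta for a us-ga-macon hit; if that group has no available caches, the next candidate in distance order is tried.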

We do have a slight gap around logging, and maybe that's part of the
question. What we see in the log is the selected lat/long, not the
source lat/long of the hit, so we can't easily tell when we're in this
case by simply looking at the logs. This could be an area for
improvement; however, we'll need to be careful not to clutter the logs
with unnecessary information. In most cases the hit is the selected
cachegroup, so we shouldn't just add "source" and "actual" coordinates
to every log entry, because they'll be identical in most CZF hit cases.
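
One possible way to log both coordinates only when they differ, sketched in Python. The hloc field name and the format_location_fields helper are hypothetical, and the two-decimal formatting is an assumption; only rloc appears in the real log lines in this thread.

```python
def format_location_fields(selected, hit=None):
    """Build the location portion of a log entry.

    selected is the (lat, lon) of the cachegroup that served the request.
    hit is the (lat, lon) of the cachegroup matched in the CZF, if any.
    The hypothetical hloc field is emitted only when the hit differs from
    the selected cachegroup, so ordinary CZF hits log nothing extra.
    """
    fields = ['rloc="{:.2f},{:.2f}"'.format(*selected)]
    if hit is not None and hit != selected:
        fields.append('hloc="{:.2f},{:.2f}"'.format(*hit))
    return " ".join(fields)

if __name__ == "__main__":
    # Hit on us-ga-macon in the CZF, served from us-ga-atlanta.
    print(format_location_fields((34.0362, -84.3207), (32.7261, -83.6547)))
    # Ordinary CZF hit: the matched cachegroup also serves the request.
    print(format_location_fields((34.0362, -84.3207), (34.0362, -84.3207)))
```

A separate field keeps rloc parseable by existing tooling; a new result type, as suggested earlier in the thread, would be an alternative way to flag the same case.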

Thanks,
Jeff


On Wed, Mar 29, 2017 at 7:02 AM, John Shen (weifensh)
<we...@cisco.com> wrote:
> Hi Jeff,
>
> I have just tried the getClosestCacheLocation() logic. It appears the CZF matched lat/long does come from CZF, but the lat/long of the “closest” Cache Groups is from the configuration by Ops. This means to calculate the distance from the matched CG and “closest” CG, the source lat/long is from CZF, but the dest lat/long is not from CZF but from CG settings on Ops. Is this expected behavior?
>
> Thanks,
> John
>
>
> On 27/01/2017, 10:51 PM, "Jeff Elsloo" <je...@gmail.com> wrote:
>
>     Steve: I don't think the patch is required, however, as Eric found,
>     without the patch there could be some gaps depending on the scenario.
>     That specific scenario revolved around the "next best cache group" not
>     having a DS assigned, or a healthy cache with the DS assigned. In that
>     case, despite the hits, you would still end up falling through to the
>     geolocation provider. The patch addresses that.
>
>     Eric: The rloc field is set via the Geolocation associated with the
>     CacheLocation, which ultimately comes from the edgeLocations section
>     of the CRConfig. When a CZF lookup is performed inside TR, a hit
>     returns a CacheLocation. When caches aren't available within that
>     CacheLocation, getClosestCacheLocation() is called, and that's why you
>     see the lat/long of the "next best cache group" instead of the actual
>     hit's lat/long.
>
>     If we want to have granularity in this situation, we might need to 1)
>     create a new ResultType, such as ResultType.CZ_NEXT (or something),
>     and/or 2) massage the log format such that we either have the
>     original lat/long, and new lat/long in the rloc field, or create a new
>     field to save one or the other, such that we log both lat/longs.
>
>     Thoughts? Whatever we decide should go into TC-90 so we can apply the
>     proposed patch and improve the logging.
>     --
>     Thanks,
>     Jeff
>
>
>     On Fri, Jan 27, 2017 at 7:14 AM, Eric Friedrich (efriedri)
>     <ef...@cisco.com> wrote:
>     > The rloc field usually indicates the Geolocation IP of the client (short for request location)
>     >
>     > But here it looks like rloc is reflecting the location of the CG it ultimately redirected to (response location?).
>     >
>     > I would have expected the rloc field to either
>     >    1) be blank (because we never did a lookup from geoprovider)
>     >         or
>     >    2)  to contain the coordinates of the cache group the CZF hit on (in this case us-ga-macon at 32.7261, -83.6547)
>     >
>     > —Eric
>     >
>     >> On Jan 27, 2017, at 8:28 AM, Steve Malenfant <sm...@gmail.com> wrote:
>     >>
>     >> Jeff,
>     >>
>     >> CZF properly installed: yes
>     >> Network address or not: same behavior
>     >>
>     >> But you nailed the API one. There is no cache assigned to us-ga-macon,
>     >> which is exactly what I'm testing.
>     >>
>     >> I added cache groups for my testing in the lab which I assigned a few
>     >> caches to them :
>     >>
>     >> - us-ga-atlanta 34.0362 -84.3207
>     >> - us-ok-oklahomacity 35.4777 -97.5545
>     >> - us-va-nova 38.7922 -77.2136
>     >> - us-ca-sandiego 32.7205 -117.0838
>     >>
>     >> API :
>     >> {"locationByGeo":{"city":"Macon","countryCode":"US","latitude":"32.7288","postalCode":"31216","countryName":"United
>     >> States","longitude":"-83.6865"},"locationByFederation":"not
>     >> found","requestIp":"24.252.192.1","locationByCoverageZone":"not found"}
>     >>
>     >> Using the X-MM-Client-IP it returned the proper cache based on CZ, it
>     >> correctly sent the request to the cache in us-ga-atlanta :
>     >> 1485522786.423 qtype=HTTP chi=24.252.192.1 url="
>     >> http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ
>     >> rloc="34.03,-84.32" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.260 rurl="
>     >> http://cdn1cdedge0007.cox-col-jitp2.cdn1.coxlab.net/" rh="-"
>     >>
>     >> I then changed the coordinate to match the us-ca-sandiego group in the CZF
>     >> and now the request is sent to the us-ca-sandiego caches :
>     >> 1485523546.345 qtype=HTTP chi=24.252.192.1 url="
>     >> http://crs.cox-col-jitp2.cdn1.coxlab.net/" cqhm=GET cqhv=HTTP/1.1 rtype=CZ
>     >> rloc="32.72,-117.08" rdtl=- rerr="-" rgb="-" pssc=302 ttms=0.206 rurl="
>     >> http://cdn1cdedge0001.cox-col-jitp2.cdn1.coxlab.net/" rh="-
>     >>
>     >> I'm using 1.6.1 + patch discussed in this email. Not sure if those are
>     >> necessary but I'll need to try on unpatched version.
>     >>
>     >> Do we want to fix API to reflect CZF?
>     >>
>     >> Thanks for your help.
>     >>
>     >> Steve
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >> On Thu, Jan 26, 2017 at 4:47 PM, Jeff Elsloo <je...@gmail.com> wrote:
>     >>
>     >>> Dave just let me know that in this case you don't have any caches
>     >>> assigned in us-ga-macon. I'm not sure how the API behaves at that
>     >>> point – it likely won't follow the same "next best cache group" logic,
>     >>> as it was designed as a simple lookup tool.
>     >>>
>     >>> Can you try simulating a request through Traffic Router directly using
>     >>> the X-MM-Client-IP header, or fakeClientIpAddress query parameter
>     >>> using the example IP of 24.252.192.0? After you do so, check the
>     >>> coordinates in the log entry and see if the result is a CZ hit.
>     >>> --
>     >>> Thanks,
>     >>> Jeff
>     >>>
>     >>>
>     >>> On Thu, Jan 26, 2017 at 2:03 PM, Jeff Elsloo <je...@gmail.com>
>     >>> wrote:
>     >>>> Are you 100% sure that the Traffic Router has loaded the updated CZF?
>     >>>> If so, what happens when you use an IP within the /20 instead of the
>     >>>> network address (.0)? I tried using a network address of a /22 on a
>     >>>> 1.8 TR and it hit the CZF as expected. Ultimately what you're seeing
>     >>>> is a CZF miss, unrelated to the geo coordinates.
>     >>>>
>     >>>> The underlying feature with the coordinates is to select the next best
>     >>>> cache group by proximity where healthy caches have a given delivery
>     >>>> service assigned. In order to test that, you would need to have a CZF
>     >>>> hit in a cache group which doesn't have that particular delivery
>     >>>> service assigned to any caches, or have all caches within that cache
>     >>>> group with that delivery service in an unhealthy state.
>     >>>>
>     >>>> Thanks,
>     >>>> --
>     >>>> Thanks,
>     >>>> Jeff
>     >>>>
>     >>>>
>     >>>> On Wed, Jan 25, 2017 at 1:33 PM, Steve Malenfant <sm...@gmail.com>
>     >>> wrote:
>     >>>>> Jeff,
>     >>>>>
>     >>>>> I've tried this coverage zone file coordinate overwrite... I might be
>     >>>>> missing something.
>     >>>>>
>     >>>>> I defined the following :
>     >>>>>
>     >>>>>        "us-ga-macon": {
>     >>>>>>            "coordinates": {
>     >>>>>>                "latitude": "32.7261",
>     >>>>>>                "longitude": "-83.6547"
>     >>>>>>            },
>     >>>>>>            "network": [
>     >>>>>>                "24.252.192.0/20",
>     >>>>>>                "68.1.20.0/22",
>     >>>>>
>     >>>>>
>     >>>>> Then issued the following query :
>     >>>>>
>     >>>>>> curl http://traffic_router:3333/crs/stats/ip/24.252.192.0
>     >>>>>>
>     >>>>>> {"locationByGeo":{"city":"Macon","countryCode":"US","
>     >>> latitude":"32.7288","postalCode":"31216","countryName":"United
>     >>>>>> States","longitude":"-83.6865"},"locationByFederation":"not
>     >>>>>> found","requestIp":"24.252.192.0","locationByCoverageZone":"not
>     >>> found"}
>     >>>>>>
>     >>>>> I believe I'm expecting "locationByCoverageZone" to find something...
>     >>>>>
>     >>>>> I tried on 1.6.0 and 1.6.1 (patched with the pastebin above which I
>     >>> wasn't
>     >>>>> sure I was supposed to do).
>     >>>>>
>     >>>>> Would you mind giving me some light on this?
>     >>>>>
>     >>>>> Thanks,
>     >>>>>
>     >>>>> Steve
>     >>>>>
>     >>>>>
>     >>>>> On Mon, Jan 23, 2017 at 3:05 PM, Jeff Elsloo <je...@gmail.com>
>     >>> wrote:
>     >>>>>
>     >>>>>> Yes; the feature went into 1.5.x.
>     >>>>>> --
>     >>>>>> Thanks,
>     >>>>>> Jeff
>     >>>>>>
>     >>>>>>
>     >>>>>> On Thu, Jan 19, 2017 at 10:37 AM, Steve Malenfant <
>     >>> smalenfant@gmail.com>
>     >>>>>> wrote:
>     >>>>>>> I didn't know about this which is good information. Does that work on
>     >>>>>>> Traffic Router 1.6?
>     >>>>>>>
>     >>>>>>> On Mon, Jan 9, 2017 at 12:44 PM, Eric Friedrich (efriedri) <
>     >>>>>>> efriedri@cisco.com> wrote:
>     >>>>>>>
>     >>>>>>>> Jeff and I had a quick Slack convo, so I’ll add a followup summary
>     >>> here
>     >>>>>> in
>     >>>>>>>> case anyone else is interested.
>     >>>>>>>>
>     >>>>>>>> Cache Group location (lat/long) is configured in Traffic Ops today
>     >>> (and
>     >>>>>> is
>     >>>>>>>> used for computing distance from Maxmind Geolocation).
>     >>>>>>>>
>     >>>>>>>> You can also configure the location (lat/long) for a Cache Group in
>     >>> the
>     >>>>>>>> CoverageZone file (example below).
>     >>>>>>>>
>     >>>>>>>> When this location is configured (and Jeff’s suggested logic fix
>     >>> from
>     >>>>>>>> below is applied) and all caches in the mapped cache group are
>     >>>>>> unavailable,
>     >>>>>>>> TR will send a client request to the cache group that is closest to
>     >>> the
>     >>>>>>>> original mapped group.
>     >>>>>>>>
>     >>>>>>>> Example CZF w/ cache location
>     >>>>>>>> -----
>     >>>>>>>> "coverageZones": {
>     >>>>>>>>    "edge-cg-1": {
>     >>>>>>>>      "network6": [
>     >>>>>>>>        ...
>     >>>>>>>>      ],
>     >>>>>>>>      "network": [
>     >>>>>>>>        ...
>     >>>>>>>>      ],
>     >>>>>>>>      "coordinates": {
>     >>>>>>>>        "longitude": "-75.3342",
>     >>>>>>>>        "latitude": "42.555"
>     >>>>>>>>      }
>     >>>>>>>>    },
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> —Eric
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>> On Jan 5, 2017, at 12:06 PM, Jeff Elsloo <je...@gmail.com>
>     >>>>>> wrote:
>     >>>>>>>>>
>     >>>>>>>>> If we applied the proposed change, given your scenario we should
>     >>> fall
>     >>>>>>>>> through to the return statement that calls
>     >>> getClosestCacheLocation().
>     >>>>>>>>> That method will order all cache groups based on their lat/long
>     >>> and
>     >>>>>>>>> the lat/long of the cache group we hit on in the CZF. Once the
>     >>> list is
>     >>>>>>>>> ordered, we iterate through the list until we find a cache group
>     >>> that
>     >>>>>>>>> has available caches for that DS.
>     >>>>>>>>>
>     >>>>>>>>> BTW, the stuff on line 536 is likely to produce the exact same
>     >>> result
>     >>>>>>>>> as the check that precedes it. networkNode.getLoc() will return
>     >>> the
>     >>>>>>>>> string name of the cache group, so when we find the
>     >>> CacheLocation, it
>     >>>>>>>>> will be the same as what we had just checked. We could probably
>     >>> get
>     >>>>>>>>> away with removing that part of the method as it's redundant.
>     >>>>>>>>> --
>     >>>>>>>>> Thanks,
>     >>>>>>>>> Jeff
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> On Wed, Jan 4, 2017 at 11:54 AM, Eric Friedrich (efriedri)
>     >>>>>>>>> <ef...@cisco.com> wrote:
>     >>>>>>>>>> Where would TR look outside the assigned cache group to find the
>     >>> next
>     >>>>>>>> closest cache group?
>     >>>>>>>>>>
>     >>>>>>>>>>> On Jan 4, 2017, at 11:25 AM, Eric Friedrich (efriedri) <efriedri@cisco.com> wrote:
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> On Jan 3, 2017, at 5:20 PM, Jeff Elsloo <jeff.elsloo@gmail.com> wrote:
>     >>>>>>>>>>>
>     >>>>>>>>>>> Hey Eric,
>     >>>>>>>>>>>
>     >>>>>>>>>>> It sounds like the use case you're after is an RFC 1918 client
>     >>>>>>>>>>> associated with a cache group whose caches are all unavailable for one
>     >>>>>>>>>>> reason or another. Is that correct?
>     >>>>>>>>>>> Yes, exactly.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> I looked at the code a bit, and I think that we can make a minor
>     >>>>>>>>>>> change to achieve the behavior you're looking for as long as you're
>     >>>>>>>>>>> able to put your RFC 1918 ranges in the CZF.
>     >>>>>>>>>>> Yes, we would want those ranges in the CZF. I can’t think of any other
>     >>>>>>>>>>> place they would go.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> There's a small logic gap in the existing algorithm around cache
>     >>>>>>>>>>> location selection and I think if we fix that (two line change), we
>     >>>>>>>>>>> should be better off all around. I think the only time we'd ever want
>     >>>>>>>>>>> to go to the geolocation provider is in the event of a miss on the
>     >>>>>>>>>>> CZF, so as long as we have a hit there, we should find the cache group
>     >>>>>>>>>>> closest to that hit location that has available caches. This would
>     >>>>>>>>>>> automatically provide the "backup" cache group concept, and has the
>     >>>>>>>>>>> added benefit of doing this selection dynamically based on the state
>     >>>>>>>>>>> of the CDN.
>     >>>>>>>>>>> Wow, thanks for picking up on this solution. Sounds like a strong
>     >>>>>>>>>>> possibility. I like that it can extend dynamically.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> See this to get an idea of what I mean: http://apaste.info/u3PQo
>     >>>>>>>>>>> https://github.com/apache/incubator-trafficcontrol/blob/249bd7504eeb7cc43402126f3719017e2475ad33/traffic_router/core/src/main/java/com/comcast/cdn/traffic_control/traffic_router/core/router/TrafficRouter.java#L536
>     >>>>>>>>>>> Does this line set cacheLocation to the closest cache group with
>     >>>>>>>>>>> active caches on that DS?
>     >>>>>>>>>>>
>     >>>>>>>>>>> What does networkNode.getLoc() actually return?
>     >>>>>>>>>>>
>     >>>>>>>>>>> —Eric
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> Obviously we'd need to test this to ensure we don't break other
>     >>>>>>>>>>> functionality.
>     >>>>>>>>>>> --
>     >>>>>>>>>>> Thanks,
>     >>>>>>>>>>> Jeff
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> On Tue, Jan 3, 2017 at 10:07 AM, Eric Friedrich (efriedri)
>     >>>>>>>>>>> <ef...@cisco.com>> wrote:
>     >>>>>>>>>>> If all caches in the primary cache group are unavailable, our goal is
>     >>>>>>>>>>> to provide a backup routing policy for RFC1918 clients.
>     >>>>>>>>>>>
>     >>>>>>>>>>> When client IP is a public Internet IP, the current backup policy is
>     >>>>>>>>>>> to assign the client to the geographically closest cache (Distance =
>     >>>>>>>>>>> MaxMind Geo Lat/Long - configured CG lat/long).
>     >>>>>>>>>>>
>     >>>>>>>>>>> When client IP is an RFC1918 IP, the client would not have a maxmind
>     >>>>>>>>>>> geo-loc, so would fall back to the DS geo-miss lat long. We’d prefer some
>     >>>>>>>>>>> more granular control over where these clients are routed to, rather than a
>     >>>>>>>>>>> per-DS setting.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> So with an RFC1918 client, the lookup process would be (step 3 is the
>     >>>>>>>>>>> only addition):
>     >>>>>>>>>>> 1) Check CZF for a subnet match (and find a match for existing cache
>     >>>>>>>>>>>    group). Assign client to CG
>     >>>>>>>>>>> 2) Check CG for available (online and associated w/ DS) servers. In
>     >>>>>>>>>>>    this particular case, assume CG has no servers available to route
>     >>>>>>>>>>>    the client to
>     >>>>>>>>>>> 3) Walk the CZF's list of backup CGs and perform the check from #2 for
>     >>>>>>>>>>>    each CG. Use first server that is found
>     >>>>>>>>>>> 4) Assuming no server is found in #3, perform geo-location and find
>     >>>>>>>>>>>    closest cache group. Use a server from the closest CG if one is found
>     >>>>>>>>>>> 4a) If geo-location returns null, use the DS’ default geo-miss
>     >>>>>>>>>>>    location as the client location.
>     >>>>>>>>>>>
>     >>>>>>>>>>> —Eric
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> On Dec 26, 2016, at 10:01 AM, Jan van Doorn <jvd@knutsel.com> wrote:
>     >>>>>>>>>>>
>     >>>>>>>>>>> Hi Eric,
>     >>>>>>>>>>>
>     >>>>>>>>>>> How does the backup list relate to the RFC1918-is-not-in-geo problem?
>     >>>>>>>>>>>
>     >>>>>>>>>>> To get to a cachegroup you need to get a match in the coverage zone, I
>     >>>>>>>>>>> would think?
>     >>>>>>>>>>>
>     >>>>>>>>>>> Rgds,
>     >>>>>>>>>>> JvD
>     >>>>>>>>>>>
>     >>>>>>>>>>> On Dec 22, 2016, at 12:28, Eric Friedrich (efriedri) <efriedri@cisco.com> wrote:
>     >>>>>>>>>>>
>     >>>>>>>>>>> The current behavior of cache group selection works as follows:
>     >>>>>>>>>>> 1) Look for a subnet match in CZF
>     >>>>>>>>>>> 2) Use MaxMind/Neustar for GeoLocation based on client IP. Choose
>     >>>>>>>>>>>    closest cache group.
>     >>>>>>>>>>> 3) Use Delivery Service Geo-Miss Lat/Long. Choose closest cache group.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> For deployments where IP addressing is primarily private (say RFC-1918
>     >>>>>>>>>>> addresses), client IP Geo Location (#2) is not useful.
>     >>>>>>>>>>>
>     >>>>>>>>>>>
>     >>>>>>>>>>> We are considering adding another field to the Coverage Zone File that
>     >>>>>>>>>>> configures an ordered list of backup cache groups to try if the primary
>     >>>>>>>>>>> cache group does not have any available caches.
>     >>>>>>>>>>>
>     >>>>>>>>>>> Example:
>     >>>>>>>>>>>
>     >>>>>>>>>>> "coverageZones": {
>     >>>>>>>>>>>   "cache-group-01": {
>     >>>>>>>>>>>     "backupList": ["cache-group-02", "cache-group-03"],
>     >>>>>>>>>>>     "network6": [
>     >>>>>>>>>>>       "1234:5678::\/64",
>     >>>>>>>>>>>       "1234:5679::\/64"],
>     >>>>>>>>>>>     "network": [
>     >>>>>>>>>>>       "192.168.8.0\/24",
>     >>>>>>>>>>>       "192.168.9.0\/24"]
>     >>>>>>>>>>>   }
>     >>>>>>>>>>> }
>     >>>>>>>>>>>
>     >>>>>>>>>>> This configuration could also be part of the per-cache group
>     >>>>>>>>>>> configuration, but that would give less control over which clients
>     >>>>>>>>>>> preferred which cache groups. For example, you may have cache groups in LA,
>     >>>>>>>>>>> Chicago and NY. If the Chicago Cache group fails, you may want some of the
>     >>>>>>>>>>> Chicago clients to go to LA and some to go to NY. If the backup CG
>     >>>>>>>>>>> configuration is per-cg, we would not be able to control where clients are
>     >>>>>>>>>>> allocated.
>     >>>>>>>>>>>
>     >>>>>>>>>>> Looking for opinions and comments on the above proposal; this is still
>     >>>>>>>>>>> at the idea stage.
>     >>>>>>>>>>>
>     >>>>>>>>>>> Thanks All!
>     >>>>>>>>>>> Eric
>
>
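
For anyone trying to follow the selection order discussed in this thread, here is a rough Python sketch of the idea: use the CZF-matched cache group if it has available caches, then walk the proposed backup list, then fall back to the closest cache group by distance. It is not the Traffic Router code (which is Java). The function names, the data layout, and the "available" flag are assumptions made up for illustration, and "backupList" is only the field proposed above, not an existing CZF field. The origin lat/long is passed in separately because, as noted at the top of the thread, the hit's coordinates and the cache group coordinates may come from different sources.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def select_cache_group(hit, origin, groups, backups=()):
    """Pick a cache group for a client whose IP matched `hit` in the CZF.

    hit     -- name of the cache group matched in the CZF (hypothetical input)
    origin  -- (lat, lon) associated with the hit
    groups  -- name -> {"coords": (lat, lon), "available": bool}, where
               "available" stands for "has a healthy cache assigned to the DS"
    backups -- ordered backup cache group names (the proposed backupList)
    Returns a cache group name, or None so the caller can fall back to the
    geolocation provider / DS geo-miss location.
    """
    # 1) The matched cache group itself.
    if groups[hit]["available"]:
        return hit
    # 2) The ordered backup list, if one was configured.
    for name in backups:
        if name in groups and groups[name]["available"]:
            return name
    # 3) Remaining cache groups, nearest to the hit location first.
    others = sorted(
        (name for name in groups if name != hit),
        key=lambda name: haversine_km(origin, groups[name]["coords"]),
    )
    for name in others:
        if groups[name]["available"]:
            return name
    return None


# Example data loosely based on the LA / Chicago / NY scenario in the thread.
CHICAGO = (41.88, -87.63)
groups = {
    "chicago": {"coords": CHICAGO, "available": False},
    "la": {"coords": (34.05, -118.24), "available": True},
    "ny": {"coords": (40.71, -74.01), "available": True},
}
nearest = select_cache_group("chicago", CHICAGO, groups)
preferred = select_cache_group("chicago", CHICAGO, groups, backups=["la"])
```

With this data, the distance-only fallback picks NY for a Chicago hit, while a backup list of ["la"] overrides that and picks LA. That is the per-client-range control the backupList proposal is after.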