Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2022/12/22 21:43:44 UTC

[GitHub] [trafficserver] bdgranger opened a new issue, #9275: cache write lock not released when following redirect and no parent

bdgranger opened a new issue, #9275:
URL: https://github.com/apache/trafficserver/issues/9275

   **Configuration**
   - ATS 8.1.x, single-tier CDN (no parent caches, just edge caches and origins)
   - proxy.config.http.parent_proxy_routing_enable set to 0, or set to 1 with an empty parent.config
   - origin is a redirecting origin that always responds with a 307 to the "actual" origin
   - hosting.config uses volume 2 for host of original request, but has no entry for IP address of redirected host
   
   **The basic gist of what is happening is:**
   - the first request for an asset on a DS comes in and is remapped to origin, which is mapped to volume 2 (ramdisk) by hosting.config
   - the object is not found and results in the expected cache miss
   - ATS opens a write lock on the ramdisk for the asset and issues a request to the origin to fetch it
   - the origin responds with a 307
   - since ATS is configured to follow redirections, a new read request is issued with the new URL
   - the host of the new URL is not mapped to ramdisk by hosting.config, so ATS looks for the object on the "generic" volume 1
   - the object is also not found on volume 1, so ATS takes a write lock on that volume and issues a request to the "real origin" to fetch
   - object is fetched successfully, written to volume 1, and streamed to client
   - volume 1 cache write lock is released
   - volume 2 (ramdisk) cache write lock is never released
   - When subsequent requests for the same asset are received (read_while_writer enabled):
   - - ATS looks for the object in ramdisk, since that is where the remapped origin is supposed to store content
   - - Because of the write lock, read waits for any content to appear in the ramdisk cache, which will never happen
   - - read retries the configured number of times, then fails and issues its own write request to origin
   - - ATS fails the configured number of times to open the write lock (because the first request still has it), then just goes straight to origin
   - - origin returns a 307 response to the same location as before and ATS follows it
   - - ATS looks in volume 1 based on the host of the "real origin", finds the content in cache and returns a HIT
   - - but by this time the client has already given up and aborted, depending on how many retries were configured and how long each waited
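
   A minimal sketch of the sequence above, written as a toy model rather than ATS code. `ToyCache`, `fetch_via_redirect`, and the `release_first` flag are hypothetical names, and the model assumes one write lock per volume and cache key:

   ```cpp
   #include <cassert>
   #include <map>
   #include <set>
   #include <string>
   #include <utility>

   // Hypothetical model, not ATS code: one write lock per (volume, cache key).
   struct ToyCache {
     using Key = std::pair<int, std::string>;

     std::map<std::string, int> hosting; // host -> volume, in the spirit of hosting.config
     int default_volume = 1;             // the "generic" volume for unlisted hosts
     std::set<Key> write_locks;          // locks currently held
     std::set<Key> stored;               // objects written to cache

     int volume_for(const std::string &host) const {
       auto it = hosting.find(host);
       return it == hosting.end() ? default_volume : it->second;
     }

     Key key_for(const std::string &host, const std::string &path) const {
       return Key(volume_for(host), host + path);
     }

     // Returns false when another transaction already holds the lock.
     bool open_write(const std::string &host, const std::string &path) {
       return write_locks.insert(key_for(host, path)).second;
     }

     // Releases the lock; optionally records the object as stored.
     void close_write(const std::string &host, const std::string &path, bool store) {
       Key key = key_for(host, path);
       write_locks.erase(key);
       if (store) {
         stored.insert(key);
       }
     }

     bool is_locked(const std::string &host, const std::string &path) const {
       return write_locks.count(key_for(host, path)) > 0;
     }

     bool is_stored(const std::string &host, const std::string &path) const {
       return stored.count(key_for(host, path)) > 0;
     }
   };

   // Models one request whose origin answers with a redirect to real_origin.
   // release_first == false reproduces the reported behavior: the lock taken
   // for the original host is never released.
   inline void fetch_via_redirect(ToyCache &cache, const std::string &origin,
                                  const std::string &real_origin,
                                  const std::string &path, bool release_first) {
     cache.open_write(origin, path); // cache miss, lock taken on the origin's volume
     if (release_first) {
       cache.close_write(origin, path, false); // nothing was written for this key
     }
     cache.open_write(real_origin, path);        // second miss, second lock
     cache.close_write(real_origin, path, true); // object fetched and stored
   }
   ```

   Calling `fetch_via_redirect` with `release_first` set to false for a host mapped to volume 2 leaves that volume's lock held, so a later `open_write` for the same asset fails, which matches the retry behavior described above. Passing true models the release that appears to be missing.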
   
   We tried setting proxy.config.http.redirect_use_orig_cache_key to 1, but it appears to make no difference.
   
   We saw this when a customer upgraded from a system that had previously been using ATC 3.x, which never generated empty parent.config files.  Restoring the following default line to parent.config, which does not actually list any parents, appears to solve the issue:
   
   `dest_domain=. parent="" round_robin=consistent_hash go_direct=false qstring=ignore`
   
   The difference appears to be that with an empty parent.config, or with parent_proxy_routing_enable set to 0, the 307 is handled via HandleCacheMiss and a new ISSUE_WRITE is performed, which opens a new CacheVC on the default volume.  When the default line is in parent.config, this code is executed instead:
   ```
   else if (s->dns_info.lookup_name[0] <= '9' && s->dns_info.lookup_name[0] >= '0' && s->parent_params->parent_table->hostMatch &&
                !s->http_config_param->no_dns_forward_to_parent) {
       // note, broken logic: ACC fudges the OR stmt to always be true,
       // 'AuthHttpAdapter' should do the rev-dns if needed, not here .
       TRANSACT_RETURN(SM_ACTION_DNS_REVERSE_LOOKUP, HttpTransact::StartAccessControl);
     }
   ```
   In this case decideCacheLookup logs "will NOT do lookup", and ATS reuses the CacheVC that was opened for the original lookup. It writes to the ramdisk, so future requests for the object get an immediate cache hit in the ramdisk as expected. But this looks like it only worked because the redirect was to an IP address and not to another FQDN.
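
   To illustrate why that branch depends on the redirect target, here is a small sketch of only the first-character test from the quoted condition. The function name `starts_with_digit` is hypothetical, and the parent-table and no_dns_forward_to_parent checks are left out:

   ```cpp
   #include <cassert>
   #include <string>

   // Mirrors only the digit check on lookup_name[0] from the quoted condition.
   // It is a heuristic for "looks like an IP address", not a real address parser.
   inline bool starts_with_digit(const std::string &lookup_name) {
     return !lookup_name.empty() && lookup_name[0] >= '0' && lookup_name[0] <= '9';
   }
   ```

   Under this check an IPv4 literal such as `192.0.2.10` passes, while a name such as `origin.example.com` does not, so a redirect to a hostname would not reach this branch. A hostname that happens to begin with a digit would also pass.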
   
   It seems either something isn't quite right with the redirect following or something that is supposed to free the first CacheVC got missed, leaving the write lock in place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] bdgranger commented on issue #9275: cache write lock not released when following redirect and no parent

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #9275:
URL: https://github.com/apache/trafficserver/issues/9275#issuecomment-1385716644

   > @bdgranger does the same problem exist in 9.1?
   
   @ywkaras we will have to test this and will get back to you. It looks like the OSDNSLookup() method has changed substantially since the 8.1.x branch.




[GitHub] [trafficserver] ywkaras commented on issue #9275: cache write lock not released when following redirect and no parent

Posted by "ywkaras (via GitHub)" <gi...@apache.org>.
ywkaras commented on issue #9275:
URL: https://github.com/apache/trafficserver/issues/9275#issuecomment-1579507873

   @bdgranger any updates?




[GitHub] [trafficserver] bdgranger commented on issue #9275: cache write lock not released when following redirect and no parent

Posted by "bdgranger (via GitHub)" <gi...@apache.org>.
bdgranger commented on issue #9275:
URL: https://github.com/apache/trafficserver/issues/9275#issuecomment-1597470427

   @ywkaras 
   
   Sorry it took me so long to notice this question.  I have been tied up with other issues and have not yet been able to test with 9.x. We have worked around the issue for now by making sure that parent.config always has a default "dest_domain=. ..." line in it, and with that in place there is no problem.




[GitHub] [trafficserver] ywkaras commented on issue #9275: cache write lock not released when following redirect and no parent

Posted by GitBox <gi...@apache.org>.
ywkaras commented on issue #9275:
URL: https://github.com/apache/trafficserver/issues/9275#issuecomment-1384707101

   @bdgranger  does the same problem exist in 9.1?

