You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/05/26 09:06:33 UTC

[GitHub] [trafficserver] lukenowak opened a new issue #7880: Trafficserver holds chunked response for backend down case

lukenowak opened a new issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880


   We have a situation where the backend returns a response with body size > 2048 bytes without Content-Lenght. This leads to TrafficServer returning it further with Transfer-Encoding: chunked. If the body size is <= 2048 TrafficServer adds Content-Length by itself and serves the content happily.
   
   We are using TrafficServer 8.1.1, but the same behaviour is with 9.0.1.
   
   Our configuration:
   
   - proxy.config.http.negative_revalidating_enabled 1
   - proxy.config.http.cache.max_stale_age 604800
   - proxy.config.http.negative_revalidating_lifetime 0
   - proxy.config.http.chunking_enabled 1 (default)
   - proxy.config.http.chunking.size 4096 (default)
   - proxy.config.http.server_ports 10.0.117.32:23432
   - proxy.config.http.keep_alive_no_activity_timeout_in 10 (this is for the test environment, but higher values in live environment just extend the case)
   
   If the backend is up (returns 200) and negative_revalidating_enabled does not kick in, TrafficServer returns the full response (headers and body) immediately.
   
   The backend up case communication between client and TrafficServer:
   
   ![backend-up-case](https://user-images.githubusercontent.com/334019/119633233-d1e37b80-be11-11eb-8923-5c89dcf4f267.png)
   
   So we happily support the case when the backend is down, but after returning headers it is giving part of the body as chunked, but does not finish it with \r\n0\r\n, which leads to the client being “stuck” for the time of proxy.config.http.keep_alive_no_activity_timeout_in, when then the TrafficServer sends FIN/ACK to the client, somehow releasing it. Seems that [RFC](https://datatracker.ietf.org/doc/html/rfc7230#section-4.1) is not followed.
   
   The backend down case communication between client and TrafficServer:
   
   ![backend-down-case](https://user-images.githubusercontent.com/334019/119633352-f4759480-be11-11eb-84e4-4a2b81ca3ead.png)
   
   Setting proxy.config.http.chunking_enabled to 0 fixes the issue, but we can't accept it as a workaround, as we rely on chunked encoding for many cases, especially for backend-up ones.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vpelletier edited a comment on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
vpelletier edited a comment on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849321454


   Some more data points and details.
   
   In the original setup where we noticed this issue (which @lukenowak then reproduced in a smaller test scenario with less identifying data) there was another weird behaviour to the chunking: for a 8211 bytes cached resource, both `tcpdump` and `curl --raw` on an access local to the maching running ATS show:
   - a first 0x1000 chunk header, followed by 4096 bytes of data as expected
   - a second 0x1000 chunk header, followed by only 3597 bytes of data before the connection becomes silent, until ATS eventually closes it of its own initiative once the inactivity timeout is reached
   
   So not only is the response's body lacking a final empty chunk, it is even lacking part of a chunk, and a further 3rd 0x13 bytes-long chunk, so 499 + 13 bytes are missing from the final decoded resource.
   
   This resource has `content-encoding: gzip`. If I request the same resource with curl but without `--compressed` ATS does a cache-miss, reaches for origin, gets a ~~502~~503 response which it caches as an alternate. So I believe this content encoding should be harmless as ATS does not seem to do content encoding negociation.
   
   Also, as I just described, in our setup the origin is not necessarily down from a network perspective: an alternative origin may be reached instead of the normal one, and it would only serve error pages with a ~~502~~503 status and a short html body. We expect ATS to handle both cases in the same way for resources which are available from cache (the cached version is served, possibly after a refresh attempt towards origin).
   
   EDIT: s/502/503/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850496179


   @bneradt do you see this problem on 9.1?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849076266


   @lukenowak : can you please attach the two packet captures, filtered for these two transactions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] lukenowak commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
lukenowak commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849420345


   @vpelletier is right regarding the origin state - if it's down the issue is not there, it happens only when it return 5xx.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-861793121


   > Thanks @bneradt . Is there a chance that this fixes will end up in 9.0.2? And do you have any due date for this release? I saw there are no dates on milestones (I assume you'll release, when it's ready).
   
   The fix just got merged into 9.0.x (see #7577), so it will be in 9.0.2. We are currently working on this release. It should be out within a couple months.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] lukenowak commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
lukenowak commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849417187


   The packet captures.
   
   [backend_up.pcap.gz](https://github.com/apache/trafficserver/files/6552052/backend_up.pcap.gz)
   [backend_down.pcap.gz](https://github.com/apache/trafficserver/files/6552053/backend_down.pcap.gz)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849969939


   Thank you, both. This is helpful feedback. It's good to know that a 5xx from the server is sufficient to repro this because writing a test where the server actually disappears is possible in AuTest but more complicated than me just configuring the test server to reply with a 503.
   
   I've now reproduced the issue on this debug branch:
   https://github.com/bneradt/trafficserver/tree/chunked_backend_down_debug_90x
   
   Running this new AuTest like so:
   ```
   ./autest.sh --sandbox /tmp/sb_chunked --clean=none --ats-bin ~/build/ts_asf_master_chunked_backend_down/bin/ -f negative-revalidating
   ```
   
   The Proxy Verifier client shows this output for the first, correct request:
   
   ```   [0]: Received an HTTP/1 200 response for key 1 with headers:
   - "server": "ATS/9.0.2"
   - "date": "Wed, 26 May 2021 08,50,45 GMT"
   - "vary": "Accept-Encoding"
   - "content-type": "text/html; charset=utf-8"
   - "cache-control": "max-age=2, stale-while-revalidate=60, stale-if-error=604800, public"
   - "set-cookie": "secured=value;secure"
   - "set-cookie": "nonsecured=value"
   - "content-encoding": "gzip"
   - "backend-content-length": "2072"
   - "age": "0" 
   - "Transfer-Encoding": "chunked"
   - "Connection": "keep-alive"
   
      [0]: Drained 2072 chunked body bytes with chunk stream: 818^M
   0000000 0000001 0000002 0000003 0000004 0000005 0000006 0000007 0000008 0000009 000000a 000000b 000000c 000000d 000000e 000000f 0000010 0000011 0000012 0000013 0000014 0000015 0000016 0000017 0000018 0000019 000001a 000001b 000001c 000001d 000001e 000001f 0000020 0000021 0000022 0000023 0000024 0000025 0000026 0000027 0000028 0000029 000002a 000002b 000002c 000002d 000002e 000002f 0000030 0000031 0000032 0000033 0000034 0000035 0000036 0000037 0000038 0000039 000003a 000003b 000003c 000003d 000003e 000003f 0000040 0000041 0000042 0000043 0000044 0000045 0000046 0000047 0000048 0000049 000004a 000004b 000004c 000004d 000004e 000004f 0000050 0000051 0000052 0000053 0000054 0000055 0000056 0000057 0000058 0000059 000005a 000005b 000005c 000005d 000005e 000005f 0000060 0000061 0000062 0000063 0000064 0000065 0000066 0000067 0000068 0000069 000006a 000006b 000006c 000006d 000006e 000006f 0000070 0000071 0000072 0000073 0000074 0000075 0000076 0000077 0000078 0000079 000007a 000007b 000
 007c 000007d 000007e 000007f 0000080 0000081 0000082 0000083 0000084 0000085 0000086 0000087 0000088 0000089 000008a 000008b 000008c 000008d 000008e 000008f 0000090 0000091 0000092 0000093 0000094 0000095 0000096 0000097 0000098 0000099 000009a 000009b 000009c 000009d 000009e 000009f 00000a0 00000a1 00000a2 00000a3 00000a4 00000a5 00000a6 00000a7 00000a8 00000a9 00000aa 00000ab 00000ac 00000ad 00000ae 00000af 00000b0 00000b1 00000b2 00000b3 00000b4 00000b5 00000b6 00000b7 00000b8 00000b9 00000ba 00000bb 00000bc 00000bd 00000be 00000bf 00000c0 00000c1 00000c2 00000c3 00000c4 00000c5 00000c6 00000c7 00000c8 00000c9 00000ca 00000cb 00000cc 00000cd 00000ce 00000cf 00000d0 00000d1 00000d2 00000d3 00000d4 00000d5 00000d6 00000d7 00000d8 00000d9 00000da 00000db 00000dc 00000dd 00000de 00000df 00000e0 00000e1 00000e2 00000e3 00000e4 00000e5 00000e6 00000e7 00000e8 00000e9 00000ea 00000eb 00000ec 00000ed 00000ee 00000ef 00000f0 00000f1 00000f2 00000f3 00000f4 00000f5 00000f6 00000f7 00000f8 
 00000f9 00000fa 00000fb 00000fc 00000fd 00000fe 00000ff 0000100 0000101 0000102 ^M
   0^M
   ^M
   ```
   
   For the second, during the negative revalidating period, it shows this:
   
   ```
      [0]: Received an HTTP/1 200 response for key 3 with headers:
   - "server": "ATS/9.0.2"
   - "date": "Wed, 26 May 2021 08,50,45 GMT"
   - "vary": "Accept-Encoding"
   - "content-type": "text/html; charset=utf-8"
   - "cache-control": "max-age=2, stale-while-revalidate=60, stale-if-error=604800, public"
   - "content-encoding": "gzip"
   - "backend-content-length": "2072"
   - "age": "0"
   - "Expires": "Thu, 27 May 2021 21:52:23 GMT"
   - "Transfer-Encoding": "chunked"
   - "Connection": "keep-alive"
   
      [3]: Unexpected chunked content: expected 0 bytes, drained 1703 bytes.
      [0]: Drained 1703 chunked body bytes with chunk stream: 818^M
   0000000 0000001 0000002 0000003 0000004 0000005 0000006 0000007 0000008 0000009 000000a 000000b 000000c 000000d 000000e 000000f 0000010 0000011 0000012 0000013 0000014 0000015 0000016 0000017 0000018 0000019 000001a 000001b 000001c 000001d 000001e 000001f 0000020 0000021 0000022 0000023 0000024 0000025 0000026 0000027 0000028 0000029 000002a 000002b 000002c 000002d 000002e 000002f 0000030 0000031 0000032 0000033 0000034 0000035 0000036 0000037 0000038 0000039 000003a 000003b 000003c 000003d 000003e 000003f 0000040 0000041 0000042 0000043 0000044 0000045 0000046 0000047 0000048 0000049 000004a 000004b 000004c 000004d 000004e 000004f 0000050 0000051 0000052 0000053 0000054 0000055 0000056 0000057 0000058 0000059 000005a 000005b 000005c 000005d 000005e 000005f 0000060 0000061 0000062 0000063 0000064 0000065 0000066 0000067 0000068 0000069 000006a 000006b 000006c 000006d 000006e 000006f 0000070 0000071 0000072 0000073 0000074 0000075 0000076 0000077 0000078 0000079 000007a 000007b 000
 007c 000007d 000007e 000007f 0000080 0000081 0000082 0000083 0000084 0000085 0000086 0000087 0000088 0000089 000008a 000008b 000008c 000008d 000008e 000008f 0000090 0000091 0000092 0000093 0000094 0000095 0000096 0000097 0000098 0000099 000009a 000009b 000009c 000009d 000009e 000009f 00000a0 00000a1 00000a2 00000a3 00000a4 00000a5 00000a6 00000a7 00000a8 00000a9 00000aa 00000ab 00000ac 00000ad 00000ae 00000af 00000b0 00000b1 00000b2 00000b3 00000b4 00000b5 00000b6 00000b7 00000b8 00000b9 00000ba 00000bb 00000bc 00000bd 00000be 00000bf 00000c0 00000c1 00000c2 00000c3 00000c4 00000c5 00000c6 00000c7 00000c8 00000c9 00000ca 00000cb 00000cc 00000cd 00000ce 00000cf 00000d0 00000d1 00000d2 00000d3 00000d4
      [3]: Failed to replay transaction with key: 3
   ```
   
   Note that the chunk is cut off prematurely, just as described by @lukenowak and @vpelletier . 
   
   Interestingly, I cannot reproduce this problem on master. Only when I cherry-picked these changes to 9.0.x, rebuilt, and ran the test there did I see this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt edited a comment on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt edited a comment on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849062280


   I'm working on reproducing this but haven't been able to yet. We have a negative-revalidating test which I've modified like so:
   
   ```$ git diff
   diff --git a/tests/gold_tests/cache/negative-revalidating.test.py b/tests/gold_tests/cache/negative-revalidating.test.py
   index 1f4975206..901e65f95 100644
   --- a/tests/gold_tests/cache/negative-revalidating.test.py
   +++ b/tests/gold_tests/cache/negative-revalidating.test.py
   @@ -56,7 +56,14 @@ ts.Disk.records_config.update({
        # Negative revalidating is on by default. Verify this by leaving out the
        # following line and expect negative_revalidating to be enabled.
        # 'proxy.config.http.negative_revalidating_enabled': 1,
   -    'proxy.config.http.cache.max_stale_age': 6
   +    'proxy.config.http.cache.max_stale_age': 6,
   +
   +    # From issue.
   +    'proxy.config.http.negative_revalidating_lifetime': 0,
   +    'proxy.config.http.chunking_enabled': 1,
   +    'proxy.config.http.chunking.size': 4096,
   +    'proxy.config.http.keep_alive_no_activity_timeout_in': 10,
   +
    })
    tr = Test.AddTestRun("Verify negative revalidating behavior.")
    replay_file = "replay/negative-revalidating-enabled.replay.yaml"
   diff --git a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   index 3fdb97149..aa660764b 100644
   --- a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   +++ b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   @@ -42,8 +42,9 @@ sessions:
          reason: "OK"
          headers:
            fields:
   -        - [ Content-Length, 32 ]
            - [ Cache-Control, max-age=2 ]
   +      content:
   +        size: 3200
    
        proxy-response:
          status: 200
   ```
   
   That's a modification of this test:
   https://github.com/apache/trafficserver/blob/050b2dfe9a89348b4105183a2b31e7cc59fc389d/tests/gold_tests/cache/negative-revalidating.test.py
   
   Note I modified the server to not contain a content-length header in its response and I increased the body size to be greater than 2048 (3200 bytes). I then ran the test. The first three transactions do this:
   
   1. Populate the cache with a response to a GET request.
   2. Repeat the GET immediately request and make sure it is served out of cache when the resource is not stale.
   3. Wait 4 seconds so that the cached resource is stale (greater than the 2 second max-age), but younger than max_stale_age.
   4. Repeat the GET.
   5. Server responds with a 503.
   6. ATS replies out of cache due to negative_revalidating.
   
   Looking at the proxy verifier output, I see the following:
   1. The very first response is chunked encoded from ATS with a c80 chunk header (hex for 3200).
   2. The second one, responded out of cache while not stale, has a content-length field value of 3200.
   3. The third one is served out of the cache while stale due to negative revalidating, and it is chunk encoded and has the appropriate trailer:
   
      [0]: Received an HTTP/1 200 response for key 3 with headers:
   - "cache-control": "max-age=2"
   - "Date": "Wed, 26 May 2021 19:02:26 GMT"
   - "Expires": "Wed, 26 May 2021 19:02:31 GMT"
   - "Transfer-Encoding": "chunked"
   - "Connection": "keep-alive"
   - "Server": "ATS/10.0.0"
   
   ```
      [0]: Drained 3200 chunked body bytes with chunk stream: c80^M
   0000000 0000001 0000002 0000003 0000004 0000005 0000006 0000007 0000008 0000009 000000a 000000b 000000c 000000d 000000e 000000f 0000010 0000011 0000012 0000013 0000014 0000015 0000016 0000017 0000018 0000019 000001a 000001b 000001c 000001d 000001e 000001f
   0^M
   ^M
   
   ```
   
   The line with the chunk body is truncated to the width of my terminal, thus it doesn't contain all 3200 characters. But note that it does have the zero-lengthed chunk trailer in this case.
   
   I repeated the test by hand, now taking down the server so it is not reachable rather than having it reply with a 503. This time, when the server was down and the resource was greater than Cache-Control's max-age but less than max_stale_age, I notice that it replies with a content-length of 3200 rather than chunked encoding.
   
   Anyway, that's what I've tried so far. I'll keep thinking about what else to try to fit @lukenowak 's setup.
   
   @lukenowak : if reading the above you notice any configuration or behavior I might be missing to reproduce this, please let me know.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt closed issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt closed issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850531664


   > @bneradt do you see this problem on 9.1?
   
   @shinrich : Good question. I do not see this issue when I run on 9.1.x. Thus, to be explicit:
   
   1. 8.1.1 and 9.0.1 have an issue where the stale, chunked responses are truncated.
   2. 9.1.x and master do not have this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850529776


   Recording this now so I don't forget: in an offline conversation with @bryancall, he points out that we should look into why ATS is replying with chunk encoded content in the first place for this scenario. When we serve from the cache as a fresh resource we do a content-length reply. As we are seeing with this issue, when we reply from the cache with a stale resource via the negative revalidating feature, we are serving chunk encoded. At least theoretically we should be able to reply with a content-length for the latter case as well as with the former. And we should reply with content-length bodies if we can because body communicated via content-length has parsing and preparatory advantages for the client.
   
   Thus there are two things that should be considered when implementing a fix for this issue:
   
   1. Under some circumstances, we incorrectly truncate our chunk encoded response when we serve from the cache.
   2. It would be preferable for us to serve content-length responses when responding via the negative revalidating feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vpelletier commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
vpelletier commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849321454


   Some more data points and details.
   
   In the original setup where we noticed this issue (which @lukenowak then reproduced in a smaller test scenario with less identifying data) there was another weird behaviour to the chunking: for a 8211 bytes cached resource, both `tcpdump` and `curl --raw` on an access local to the maching running ATS show:
   - a first 0x1000 chunk header, followed by 4096 bytes of data as expected
   - a second 0x1000 chunk header, followed by only 3597 bytes of data before the connection becomes silent, until ATS eventually closes it of its own initiative once the inactivity timeout is reached
   
   So not only is the response's body lacking a final empty chunk, it is even lacking part of a chunk, and a further 3rd 0x13 bytes-long chunk, so 499 + 13 bytes are missing from the final decoded resource.
   
   This resource has `content-encoding: gzip`. If I request the same resource with curl but without `--compressed` ATS does a cache-miss, reaches for origin, gets a 502 response which it caches as an alternate. So I believe this content encoding should be harmless as ATS does not seem to do content encoding negociation.
   
   Also, as I just described, in our setup the origin is not necessarily down from a network perspective: an alternative origin may be reached instead of the normal one, and it would only serve error pages with a 502 status and a short html body. We expect ATS to handle both cases in the same way for resources which are available from cache (the cached version is served, possibly after a refresh attempt towards origin).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-861853883


   Closing since Susan's fix (#7577) is now merged to 9.0.x and 8.1.x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849062280


   I'm working on reproducing this but haven't been able to yet. We have a negative-revalidating test which I've modified like so:
   
   ```$ git diff
   diff --git a/tests/gold_tests/cache/negative-revalidating.test.py b/tests/gold_tests/cache/negative-revalidating.test.py
   index 1f4975206..901e65f95 100644
   --- a/tests/gold_tests/cache/negative-revalidating.test.py
   +++ b/tests/gold_tests/cache/negative-revalidating.test.py
   @@ -56,7 +56,14 @@ ts.Disk.records_config.update({
        # Negative revalidating is on by default. Verify this by leaving out the
        # following line and expect negative_revalidating to be enabled.
        # 'proxy.config.http.negative_revalidating_enabled': 1,
   -    'proxy.config.http.cache.max_stale_age': 6
   +    'proxy.config.http.cache.max_stale_age': 6,
   +
   +    # From issue.
   +    'proxy.config.http.negative_revalidating_lifetime': 0,
   +    'proxy.config.http.chunking_enabled': 1,
   +    'proxy.config.http.chunking.size': 4096,
   +    'proxy.config.http.keep_alive_no_activity_timeout_in': 10,
   +
    })
    tr = Test.AddTestRun("Verify negative revalidating behavior.")
    replay_file = "replay/negative-revalidating-enabled.replay.yaml"
   diff --git a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   index 3fdb97149..aa660764b 100644
   --- a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   +++ b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   @@ -42,8 +42,9 @@ sessions:
          reason: "OK"
          headers:
            fields:
   -        - [ Content-Length, 32 ]
            - [ Cache-Control, max-age=2 ]
   +      content:
   +        size: 3200
    
        proxy-response:
          status: 200
   ```
   
   Note I modified the server to not contain a content-length header in its response and I increased the body size to be greater than 2048 (3200 bytes). I then ran the test. The first three transactions do this:
   
   1. Populate the cache with a response to a GET request.
   2. Repeat the GET immediately request and make sure it is served out of cache when the resource is not stale.
   3. Wait 4 seconds so that the cached resource is stale (greater than the 2 second max-age), but younger than max_stale_age.
   4. Repeat the GET.
   5. Server responds with a 503.
   6. ATS replies out of cache due to negative_revalidating.
   
   Looking at the proxy verifier output, I see the following:
   1. The very first response is chunked encoded from ATS with a c80 chunk header (hex for 3200).
   2. The second one, responded out of cache while not stale, has a content-length field value of 3200.
   3. The third one is served out of the cache while stale due to negative revalidating, and it is chunk encoded and has the appropriate trailer:
   
      [0]: Received an HTTP/1 200 response for key 3 with headers:
   - "cache-control": "max-age=2"
   - "Date": "Wed, 26 May 2021 19:02:26 GMT"
   - "Expires": "Wed, 26 May 2021 19:02:31 GMT"
   - "Transfer-Encoding": "chunked"
   - "Connection": "keep-alive"
   - "Server": "ATS/10.0.0"
   
   ```
      [0]: Drained 3200 chunked body bytes with chunk stream: c80^M
   0000000 0000001 0000002 0000003 0000004 0000005 0000006 0000007 0000008 0000009 000000a 000000b 000000c 000000d 000000e 000000f 0000010 0000011 0000012 0000013 0000014 0000015 0000016 0000017 0000018 0000019 000001a 000001b 000001c 000001d 000001e 000001f
   0^M
   ^M
   
   ```
   
   The line with the chunk body is truncated to the width of my terminal, thus it doesn't contain all 3200 characters. But note that it does have the zero-lengthed chunk trailer in this case.
   
   I repeated the test by hand, now taking down the server so it is not reachable rather than having it reply with a 503. This time, when the server was down and the resource was greater than Cache-Control's max-age but less than max_stale_age, I notice that it replies with a content-length of 3200 rather than chunked encoding.
   
   Anyway, that's what I've tried so far. I'll keep thinking about what else to try to fit @lukenowak 's setup.
   
   @lukenowak : if reading the above you notice any configuration or behavior I might be missing to reproduce this, please let me know.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt edited a comment on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt edited a comment on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849062280


   I'm working on reproducing this but haven't been able to yet. We have a negative-revalidating test which I've modified like so:
   
   ```$ git diff
   diff --git a/tests/gold_tests/cache/negative-revalidating.test.py b/tests/gold_tests/cache/negative-revalidating.test.py
   index 1f4975206..901e65f95 100644
   --- a/tests/gold_tests/cache/negative-revalidating.test.py
   +++ b/tests/gold_tests/cache/negative-revalidating.test.py
   @@ -56,7 +56,14 @@ ts.Disk.records_config.update({
        # Negative revalidating is on by default. Verify this by leaving out the
        # following line and expect negative_revalidating to be enabled.
        # 'proxy.config.http.negative_revalidating_enabled': 1,
   -    'proxy.config.http.cache.max_stale_age': 6
   +    'proxy.config.http.cache.max_stale_age': 6,
   +
   +    # From issue.
   +    'proxy.config.http.negative_revalidating_lifetime': 0,
   +    'proxy.config.http.chunking_enabled': 1,
   +    'proxy.config.http.chunking.size': 4096,
   +    'proxy.config.http.keep_alive_no_activity_timeout_in': 10,
   +
    })
    tr = Test.AddTestRun("Verify negative revalidating behavior.")
    replay_file = "replay/negative-revalidating-enabled.replay.yaml"
   diff --git a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   index 3fdb97149..aa660764b 100644
   --- a/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   +++ b/tests/gold_tests/cache/replay/negative-revalidating-enabled.replay.yaml
   @@ -42,8 +42,9 @@ sessions:
          reason: "OK"
          headers:
            fields:
   -        - [ Content-Length, 32 ]
            - [ Cache-Control, max-age=2 ]
   +      content:
   +        size: 3200
    
        proxy-response:
          status: 200
   ```
   
   That's a modification of this test:
   https://github.com/apache/trafficserver/blob/050b2dfe9a89348b4105183a2b31e7cc59fc389d/tests/gold_tests/cache/negative-revalidating.test.py
   
   Note I modified the server to not contain a content-length header in its response and I increased the body size to be greater than 2048 (3200 bytes). I then ran the test. The first three transactions do this:
   
   1. Populate the cache with a response to a GET request.
   2. Repeat the GET immediately request and make sure it is served out of cache when the resource is not stale.
   3. Wait 4 seconds so that the cached resource is stale (greater than the 2 second max-age), but younger than max_stale_age.
   4. Repeat the GET.
   5. Server responds with a 503.
   6. ATS replies out of cache due to negative_revalidating.
   
   Looking at the proxy verifier output, I see the following:
   1. The very first response is chunked encoded from ATS with a c80 chunk header (hex for 3200).
   2. The second one, responded out of cache while not stale, has a content-length field value of 3200.
   3. The third one is served out of the cache while stale due to negative revalidating, and it is chunk encoded and has the appropriate trailer:
   
   ```
      [0]: Received an HTTP/1 200 response for key 3 with headers:
   - "cache-control": "max-age=2"
   - "Date": "Wed, 26 May 2021 19:02:26 GMT"
   - "Expires": "Wed, 26 May 2021 19:02:31 GMT"
   - "Transfer-Encoding": "chunked"
   - "Connection": "keep-alive"
   - "Server": "ATS/10.0.0"
   
      [0]: Drained 3200 chunked body bytes with chunk stream: c80^M
   0000000 0000001 0000002 0000003 0000004 0000005 0000006 0000007 0000008 0000009 000000a 000000b 000000c 000000d 000000e 000000f 0000010 0000011 0000012 0000013 0000014 0000015 0000016 0000017 0000018 0000019 000001a 000001b 000001c 000001d 000001e 000001f
   0^M
   ^M
   
   ```
   
   The line with the chunk body is truncated to the width of my terminal, thus it doesn't contain all 3200 characters. But note that it does have the zero-lengthed chunk trailer in this case.
   
   I repeated the test by hand, now taking down the server so it is not reachable rather than having it reply with a 503. This time, when the server was down and the resource was greater than Cache-Control's max-age but less than max_stale_age, I notice that it replies with a content-length of 3200 rather than chunked encoding.
   
   Anyway, that's what I've tried so far. I'll keep thinking about what else to try to fit @lukenowak 's setup.
   
   @lukenowak : if reading the above you notice any configuration or behavior I might be missing to reproduce this, please let me know.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] lukenowak commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
lukenowak commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849558676


   @bneradt here goes the full records.config, configure options and the way how we start. We use Debian 10 AMD64.
   
   I am ready to provide all what's needed for you to being able to reproduce it.
   
   [build.txt](https://github.com/apache/trafficserver/files/6553606/build.txt)
   [records.config.txt](https://github.com/apache/trafficserver/files/6553608/records.config.txt)
   [startup.txt](https://github.com/apache/trafficserver/files/6553612/startup.txt)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vpelletier commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
vpelletier commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849325845


   > the origin is not necessarily down from a network perspective
   
   Aha ! And if I actually stop origin (hence making ATS notice an immediate "connection refused"), then the issue does not manifest itself, and it is actually served in a single body with a Content-Length visibly generated by ATS itself (there is no such header visible in an `http://{cache}/` lookup).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vpelletier edited a comment on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
vpelletier edited a comment on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-849325845


   > the origin is not necessarily down from a network perspective
   
   Aha ! And if I actually stop origin (hence making ATS notice an immediate "connection refused"), then the issue does not manifest itself, and it is actually served in a single body (EDIT: by which I mean "not chunk-encoded") with a Content-Length visibly generated by ATS itself (there is no such header visible in an `http://{cache}/` lookup).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850634583


   @lukenowak and @vpelletier: thank you for the detailed information. Using your data I added a regression test for this issue. It turns out that @shinrich has submitted a fix for this issue in #7577. I've requested backports for the fix for this to go into the next 8.1.x and 9.0.x releases.
   
   I'll leave this issue open until the following cherry-picks are merged in:
   * https://github.com/apache/trafficserver/pull/7908
   * The 9.0.x cherry-pick for https://github.com/apache/trafficserver/pull/7577


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850594062


   I've verified that the following PR, when I locally cherry-pick it to 9.0.x in my dev environment, fixes this issue:
   https://github.com/apache/trafficserver/pull/7577


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] lukenowak commented on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
lukenowak commented on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-854633953


   Thanks @bneradt . Is there a chance that this fixes will end up in 9.0.2? And do you have any due date for this release? I saw there are no dates on milestones (I assume you'll release, when it's ready).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bneradt edited a comment on issue #7880: Trafficserver holds chunked response for backend down case

Posted by GitBox <gi...@apache.org>.
bneradt edited a comment on issue #7880:
URL: https://github.com/apache/trafficserver/issues/7880#issuecomment-850634583


   @lukenowak and @vpelletier: thank you for the detailed information. Using your data I added a regression test for this issue. It turns out that @shinrich has submitted a fix for this issue in #7577. I've requested backports for the fix to go into the next 8.1.x and 9.0.x releases.
   
   I'll leave this issue open until the following cherry-picks are merged in:
   * https://github.com/apache/trafficserver/pull/7908
   * The 9.0.x cherry-pick for https://github.com/apache/trafficserver/pull/7577


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org