You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2020/10/01 23:58:24 UTC

[GitHub] [trafficserver] shinrich opened a new pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

shinrich opened a new pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237


   We first noticed this problem because we were using the per origin conntrack mechanism.  And one of the origins we had set for a limit of 3000 was getting stuck with the limit permanently set, so all requests for that origin were returning 502. Looking more closely, there were 3000 connections to that origin in the CLOSE-WAIT state (i.e. the peer had sent the FIN but ATS had not responded.
   
   We took one of the machines in this state out of rotation, created a core file, and looked at the cop_list on the net handler object on one of the threads.  Eventually found a netvc that was HTTP and pointing at our problem origin.  Looking at the history for the associated state machine, we see that it had been iterating on 
   
   ```
   {location = {file = 0x78acc0 "../../../../../../_vcs/trafficserver9/proxy/http/HttpSM.cc", 
           func = 0x78ee20 <HttpSM::state_send_server_request_header(int, void*)::__FUNCTION__> "state_send_server_request_header", line = 2026}, event = 105, reentrancy = 1}, {location = {
           file = 0x78acc0 "../../../../../../_vcs/trafficserver9/proxy/http/HttpSM.cc", func = 0x78e830 <HttpSM::handle_server_setup_error(int, void*)::__FUNCTION__> "handle_server_setup_error", line = 5573}, 
         event = 105, reentrancy = 1},
   ```
   For the whole history.   So presumably the inactivity_cop was sending the inactivity timeout event every 5 minutes (our setting for the default inactivity timeout), but the HttpSM was never getting killed.  So I created a one-off build that added a bool to the HttpSM and an assert to detected with the same state machine passed through state_send_server_request_header with the inactivity timeout twice.  Once we got a core from that, we could see that the client had been using HTTP2 and was sending a POST.  The tunnel is still alive in this case. It is set up as the post tunnel. The consumer side is finished (alive = false). It has written 2822 of 2822. The producer side is not finished (alive = true). It has read 0 of 2822.  In the ua_txn recv_end_stream is false.  So the HTTP/2 client has sent all the expected data (content-length had been set to 2822) but it did not send a DATA frame with the EOS bit set.  A bad client, but it should not have caused this zombie state.
   
   This PR fixes this issue in our production.  We see no CLOSE-WAIT connections for the origin in question.
   
   The key change is adjusting the logic for HttpTunnel::is_tunnel_alive. The original logic will return true (alive) if any producer or any consumer is alive. This was causing the problem in handle_server_setup_error. The tunnel had 1 alive producer the (the HTTP/2 post user agent) but the consumer had completed. All the post data had been passed to the server, but the HTTP/2 user agent had not received the EOS bit, so it would be stuck in this state forever.
   
   Since the tunnel was alive handle_server_setup_error assumed there was at least one active consumer left, so it would defer sending the error message until the consumer (which does not exist) completed.
   
   I changed the is_tunnel_alive logic logic to only check for active consumers (or an active producer with not consumers which occurs if post buffering is enabled). This causes the handle_server_setup_error to complete and set up an error response even if the tunnel producer is still active.
   
   I had to adjust a few other asserts/checks to deal with slow EOS READ_COMPLETES. My branch passes autest and regression tests and has been running merrily on one of our boxes in the problem colo today..


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.  I had my config remap /post to https://httpbin.org/post.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.  However after a while (10-30 seconds), ATS crashes with an inactivity timeout triggers on a HttpTunnel continuation which belongs to a deleted HttpSM.  But then I realized I had my config set to debug another issue. With that fixed, no crash.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505204250



##########
File path: proxy/http/HttpSM.cc
##########
@@ -2717,7 +2717,7 @@ HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer *p)
     tunnel.reset();
     // When the ua completed sending it's data we must have
     //  removed it from the tunnel
-    ink_release_assert(ua_entry->in_tunnel == false);
+    ua_entry->in_tunnel     = false;

Review comment:
       Is this necessary?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-705708632


   We have this patch running on all of our ADN/CDN machines.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505188928



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       Note: This seems the key change, making the producer done on the `VC_EVENT_WRITE_COMPLETE` event on the consumer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-702871154


   I would have liked to have an autest to exercise this case to catch regressions, but short of writing a C program using the nghttp2 library, I don't know how to create a bad H2 client that sends POST DATA frames but doesn't set the EOS bit.  @maskit @masaori335 do you know of HTTP/2 testing tools that would allow this?  Otherwise, I'll eventually put up a separate PR with the bad posting H2 client and an autest.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708226222


   It's pretty complicated, I need time to understand this change. One thing I noticed is the default inactivity timeout is fired and it comes to`HttpSM::handle_server_setup_error` on the 9.0.x branch (5ec9efb3). OTOH, I might be missing something, it's not fired on the master branch (bb847ee0). It looks like the recent change fb0bf03bcc57fb4871c9b8f5eff2ecea6d99bafe made the difference.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505986062



##########
File path: proxy/http/HttpSM.cc
##########
@@ -2717,7 +2717,7 @@ HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer *p)
     tunnel.reset();
     // When the ua completed sending it's data we must have
     //  removed it from the tunnel
-    ink_release_assert(ua_entry->in_tunnel == false);
+    ua_entry->in_tunnel     = false;

Review comment:
       OK, got it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-705708632


   We have this patch running on all of our ADN/CDN machines.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.  I had my config remap /post to https://httpbin.org/post.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.  However after a while (10-30 seconds), ATS crashes with an inactivity timeout triggers on a HttpTunnel continuation which belongs to a deleted HttpSM.  But then I realized I had my config set to debug another issue. With that fixed, no crash.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 20 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   ```
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:1835 (state_http_server_open)> (http_ss) [2] TCP Handshake complete
   +++++++++ Proxy's Request after hooks +++++++++
   -- State Machine Id: 2
   POST /post HTTP/1.1
   Content-Length: 4000
   Host: httpbin.org
   Client-ip: 127.0.0.1
   X-Forwarded-For: 127.0.0.1
   Via: https/2 traffic_server[eaf1cc17-5d00-4e02-909d-5a20c6d923bb] (ApacheTrafficServer/10.0.0)
   
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:595 (add_producer)> (http_tunnel) [2] adding producer 'user agent post'
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:650 (add_consumer)> (http_tunnel) [2] adding consumer 'http server post'
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:697 (tunnel_run)> (http_tunnel) tunnel_run started, p_arg is provided
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:1307 (consumer_handler)> (http_tunnel) [2] consumer_handler [http server post VC_EVENT_WRITE_READY/TS_EVENT_VCONN_WRITE_READY]
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:457 (state_start_frame_read)> (http2_cs) [2] [&Http2ClientSession::state_start_frame_read, VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=64535/65535 stream=64535/65535
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=63535/65535 stream=63535/65535
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=62535/65535 stream=62535/65535
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=61535/65535 stream=61535/65535
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1307 (consumer_handler)> (http_tunnel) [2] consumer_handler [http server post VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpSM.cc:3611 (tunnel_handler_post_server)> (http) [2] [&HttpSM::tunnel_handler_post_server, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:55.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:349 (main_event_handler)> (http2_cs) [2] Closing event 104
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:276 (do_io_close)> (http2_cs) [2] session closed
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2Stream.cc:421 (initiating_close)> (http2_stream) [2] [1] initiating_close
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2Stream.cc:448 (initiating_close)> (http2_stream) [2] [1] handle write from destroy (event=103)
   [Oct  6 14:33:03.074] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:37.073] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.073] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.074] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.074] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:48.075] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   ```
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-703370076


   For the testing, another approach is adding a test case on h2spec. But I don't think the h2spec project will accept this.
   <details><summary>Bad client: no EOS</summary>
   
   ```
   	tg.AddTestCase(&spec.TestCase{
   		Desc:        "Bad client: POST without EOS",
   		Requirement: "Timeout",
   		Run: func(c *config.Config, conn *spec.Conn) error {
   			var streamID uint32 = 1
   
   			err := conn.Handshake()
   			if err != nil {
   				return err
   			}
   
   			headers := spec.CommonHeaders(c)
   			headers[0].Value = "POST"
   			headers = append(headers, spec.HeaderField("content-length", "4"))
   
   			hp := http2.HeadersFrameParam{
   				StreamID:      streamID,
   				EndStream:     false,
   				EndHeaders:    true,
   				BlockFragment: conn.EncodeHeaders(headers),
   			}
   
   			conn.WriteHeaders(hp)
   			conn.WriteData(streamID, false, []byte("test"))
   
   			return nil
   		},
   	})
   ```
   </details>
   
   Realistically, we need to write a special client in C or Python to integrate with AuTest.
   Unfortunately, `hyper-0.7.0` doesn't give us control of the EOS flag. But according to the example of `hyper-h2` ( the new version of the hyper) we can control the EOS flag on POST. A big concern is `hyper-h2` introduces dependency hell on our AuTest env.
   https://python-hyper.org/projects/hyper-h2/en/stable/twisted-post-example.html


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-715413158


   Cherry-picked to v9.0.x branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704315712


   Hmm.  I was just being impatient.  Only need to add the delay between the write header and the write body to make the failure case occur.   The sleep between the write data and process response is not needed.  The client will return and appear to timeout, but ATS will spin the HttpSM as described above.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708018646


   I pushed another commit.  It addresses the assert you saw (case 2).  With that change, I didn't have to change HttpTunnel::is_tunnel_alive().   I also adjusted the h2spec test to not send a content-length, and with this change, the inactivity_timeout seemed to kick in as expected.
   
   The second commit includes changes in HttpSM::tunnel_handler_post_server to address crashes we started seeing once we deployed the first patch (a few crashes a day over the fleet).  The following stack trace.  Seemed to occur on POST with very large post bodies (2MB) with a content-length specified, but the bad client sent another data frame after the content-length number of bytes was sent.  Our internal patch includes an additional warning message to try an get additional information about our faulty client.
   
   ```
   #0  operator= (p=0x2b819957b600, this=0x5529e2a5c0e67277) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:805
   #1  append_block_internal (b=0x2b819957b600, this=0x2b81994ea800) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:805
   #2  append_block (b=0x2b819957b600, this=0x2b81994ea800) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:823
   #3  MIOBuffer::write (this=this@entry=0x2b81994ea800, b=0x2b8380a47840, alen=alen@entry=8192, offset=<optimized out>) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/IOBuffer.cc:127
   #4  0x00000000007668fd in MIOBuffer::write (this=this@entry=0x2b81994ea800, r=r@entry=0x2b81702d0240, len=len@entry=8192, offset=offset@entry=0)
       at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/IOBuffer.cc:96
   #5  0x00000000005df8a0 in rcv_data_frame (cstate=..., frame=...) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ConnectionState.cc:198
   #6  0x00000000005e49a4 in Http2ConnectionState::main_event_handler(int, void*) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ConnectionState.cc:1063
   #7  0x00000000005d9139 in handleEvent (data=0x2b7fa9908ee0, event=2253, this=0x2b8e76fc3100)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:167
   #8  handleEvent (data=0x2b7fa9908ee0, event=2253, this=0x2b8e76fc3100)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:163
   #9  send_connection_event(Continuation*, int, void*) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:65
   #10 0x00000000005d99f7 in Http2ClientSession::do_complete_frame_read (this=0x2b8e76fc2e00) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:545
   #11 0x00000000005d9ee5 in Http2ClientSession::state_process_frame_read(int, VIO*, bool) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:564
   #12 0x00000000005da455 in Http2ClientSession::state_complete_frame_read (this=0x2b8e76fc2e00, event=100, edata=0x2b95baf0c2d8)
       at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:535
   #13 0x00000000005d9443 in Http2ClientSession::main_event_handler (this=0x2b8e76fc2e00, event=100, edata=0x2b95baf0c2d8) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:331
   #14 0x000000000072bb83 in handleEvent (data=0x2b95baf0c2d8, event=100, this=0x2b8e76fc2e00)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:167
   #15 handleEvent (data=0x2b95baf0c2d8, event=100, this=0x2b8e76fc2e00)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:163
   #16 read_signal_and_update (vc=vc@entry=0x2b95baf0c100, event=event@entry=100) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNetVConnection.cc:83
   #17 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2b95baf0c100, event=event@entry=100) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNetVConnection.cc:1014
   #18 0x00000000007040d8 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) () at ../../../../../../_vcs/trafficserver9/iocore/net/SSLNetVConnection.cc:672
   #19 0x000000000071fcbe in NetHandler::process_ready_list (this=this@entry=0x2b7fa3775360) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNet.cc:413
   #20 0x0000000000720130 in NetHandler::waitForActivity(long) () at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNet.cc:548
   #21 0x000000000076c1e1 in EThread::execute_regular (this=this@entry=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/I_PriorityEventQueue.h:115
   #22 0x000000000076c436 in execute (this=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/UnixEThread.cc:332
   #23 EThread::execute (this=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/UnixEThread.cc:310
   #24 0x000000000076a6b9 in spawn_thread_internal (a=0x2b7f9de7f600) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/Thread.cc:92
   #25 0x00002b7f9c27bdd5 in start_thread () from /lib64/libpthread.so.0
   #26 0x00002b7f9cfb0ead in clone () from /lib64/libc.so.6
   ```
   
   In any case, the change in HttpSM::tunnel_handler_post_server seems good. The original do_io_read's are being set up with tunnel buffers that are very soon to be deleted.  I changed the do_io_read's to target the ua_buffer which should exist for the lifetime of the HttpSM.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-703946033


   I ran some additional test builds. The data with only EOS is going through just fine.  We are not exercising the EOS on trailer frame case.  I added an assert on the trailer case and that never triggers.
   
   I re-added my original assert on seeing the timeout twice on HttpSM::state_send_server_request_header.  We have definitely received no EOS from Data frame or Trailer in that case.
   
   I will set up another PR in the future with an autest to exercise this case.  I'll probably create a one-off C client using the nghttp2 library.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708956679


   The key change looks good.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.  However after a while (10-30 seconds), ATS crashes with an inactivity timeout triggers on a HttpTunnel continuation which belongs to a deleted HttpSM.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-703704097


   Fairly certain it is not a Data frame with only an EOS.  I have seen that in debugging and that case seems to get handled ok. I'm less certain of the trailing header.  As I recall, we don't handle that particularly well.  I'll add some debugging messages or metrics and give it a whirl.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505794473



##########
File path: proxy/http/HttpSM.cc
##########
@@ -2717,7 +2717,7 @@ HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer *p)
     tunnel.reset();
     // When the ua completed sending it's data we must have
     //  removed it from the tunnel
-    ink_release_assert(ua_entry->in_tunnel == false);
+    ua_entry->in_tunnel     = false;

Review comment:
       At least the release_assert needs to go away.  The redirect_post autest triggers it.  Once hit that, I think the safest thing is to mark ua_entry not in tunnel, so its contents get freed up during the HttpSM cleanup.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] maskit commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
maskit commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-703019499


   I've been using my own tool [h2cli](https://github.com/maskit/h2cli) for local testing. It provides a command line tool that generates H2 binary data from [a sequence file](https://github.com/maskit/h2cli/blob/master/sequence.json). You can send the data and parse the response like this:
   ```
   h2seq sequence.json | openssl s_client -alpn h2 -connect localhost:443 -quiet | h2res
   ```
   
   However, I guess we don't want to add a dependency for Node.js.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-706027170


   The patch reproduced the issue. I saw `VC_EVENT_READ_READY` on the producer, `VC_EVENT_WRITE_COMPLETE` on the consumer, and HttpSM got stalled.
   
   1. Why inactive timeout is not fired? 
   It looks like the stream/transaction is "inactive", I expect inactive timeout is fired and kill HttpSM.
   
   2. I got a crash with this change and the h2spec patch (debug build on macOS)
   
   ```
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [0] [1] Received DATA frame: rwnd con=61535/65535 stream=61535/65535
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [0] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1307 (consumer_handler)> (http_tunnel) [0] consumer_handler [http server post VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:3611 (tunnel_handler_post_server)> (http) [0] [&HttpSM::tunnel_handler_post_server, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [0] [HttpSM::main_handler, HTTP_TUNNEL_EVENT_DONE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2737 (tunnel_handler_post)> (http) [0] [&HttpSM::tunnel_handler_post, HTTP_TUNNEL_EVENT_DONE]
   Fatal: HttpSM.cc:2714: failed assertion `p->read_success == true`
   Process 58006 stopped
   * thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
       frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
   libsystem_kernel.dylib`__pthread_kill:
   ->  0x7fff6a9a633a <+10>: jae    0x7fff6a9a6344            ; <+20>
       0x7fff6a9a633c <+12>: movq   %rax, %rdi
       0x7fff6a9a633f <+15>: jmp    0x7fff6a9a0629            ; cerror_nocancel
       0x7fff6a9a6344 <+20>: retq
   (lldb) bt
   * thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
     * frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
       frame #1: 0x00007fff6aa62e60 libsystem_pthread.dylib`pthread_kill + 430
       frame #2: 0x00007fff6a92d808 libsystem_c.dylib`abort + 120
       frame #3: 0x000000000006a04e libtscore.10.dylib`ink_abort(message_format="%s:%d: failed assertion `%s`") at ink_error.cc:99:3
       frame #4: 0x000000000006716c libtscore.10.dylib`::_ink_assert(expression="p->read_success == true", file="HttpSM.cc", line=2714) at ink_assert.cc:37:3
       frame #5: 0x00000001000e544e traffic_server`HttpSM::tunnel_handler_post_or_put(this=0x0000000007e9d3d0, p=0x0000000007e9e540) at HttpSM.cc:2714:5
       frame #6: 0x00000001000d2a3f traffic_server`HttpSM::tunnel_handler_post(this=0x0000000007e9d3d0, event=2301, data=0x0000000007e9e330) at HttpSM.cc:2791:5
       frame #7: 0x00000001000cd94a traffic_server`HttpSM::main_handler(this=0x0000000007e9d3d0, event=2301, data=0x0000000007e9e330) at HttpSM.cc:2656:5
       frame #8: 0x00000001000054bf traffic_server`Continuation::handleEvent(this=0x0000000007e9d3d0, event=2301, data=0x0000000007e9e330) at I_Continuation.h:219:12
       frame #9: 0x00000001001478af traffic_server`HttpTunnel::main_handler(this=0x0000000007e9e330, event=103, data=0x0000000007f5ce50) at HttpTunnel.cc:1629:11
       frame #10: 0x00000001000054bf traffic_server`Continuation::handleEvent(this=0x0000000007e9e330, event=103, data=0x0000000007f5ce50) at I_Continuation.h:219:12
       frame #11: 0x00000001003c5003 traffic_server`write_signal_and_update(event=103, vc=0x0000000007f5cbf0) at UnixNetVConnection.cc:115:25
       frame #12: 0x00000001003c527d traffic_server`write_signal_done(event=103, nh=0x0000000002ee0080, vc=0x0000000007f5cbf0) at UnixNetVConnection.cc:158:7
       frame #13: 0x00000001003c4a05 traffic_server`write_to_net_io(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0, thread=0x0000000002edc000) at UnixNetVConnection.cc:500:7
       frame #14: 0x00000001003c3d73 traffic_server`write_to_net(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0, thread=0x0000000002edc000) at UnixNetVConnection.cc:344:3
       frame #15: 0x00000001003c8385 traffic_server`UnixNetVConnection::net_write_io(this=0x0000000007f5cbf0, nh=0x0000000002ee0080, lthread=0x0000000002edc000) at UnixNetVConnection.cc:899:3
       frame #16: 0x00000001003b77ea traffic_server`NetHandler::process_ready_list(this=0x0000000002ee0080) at UnixNet.cc:429:11
       frame #17: 0x00000001003b8156 traffic_server`NetHandler::waitForActivity(this=0x0000000002ee0080, timeout=19997000) at UnixNet.cc:546:3
       frame #18: 0x00000001003f8b21 traffic_server`EThread::execute_regular(this=0x0000000002edc000) at UnixEThread.cc:266:14
       frame #19: 0x00000001003f8f9d traffic_server`EThread::execute(this=0x0000000002edc000) at UnixEThread.cc:327:11
       frame #20: 0x00000001003f7327 traffic_server`spawn_thread_internal(a=0x0000000000a20060) at Thread.cc:92:12
       frame #21: 0x00007fff6aa63109 libsystem_pthread.dylib`_pthread_start + 148
       frame #22: 0x00007fff6aa5eb8b libsystem_pthread.dylib`thread_start + 15
   ```
   
   3. This looks assuming that clients send the "Content-Length" header.
   When clients don't send the "Content-Length" header nor "END_STREAM" flag, the consumer can't get `WRITE_COMPLETE`.
   I saw HttpSM is stalled in this case too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.  I had my config remap /post to https://httpbin.org/post.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.  However after a while (10-30 seconds), ATS crashes with an inactivity timeout triggers on a HttpTunnel continuation which belongs to a deleted HttpSM.  But then I realized I had my config set to debug another issue. With that fixed, no crash.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   ```
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:1835 (state_http_server_open)> (http_ss) [2] TCP Handshake complete
   +++++++++ Proxy's Request after hooks +++++++++
   -- State Machine Id: 2
   POST /post HTTP/1.1
   Content-Length: 4000
   Host: httpbin.org
   Client-ip: 127.0.0.1
   X-Forwarded-For: 127.0.0.1
   Via: https/2 traffic_server[eaf1cc17-5d00-4e02-909d-5a20c6d923bb] (ApacheTrafficServer/10.0.0)
   
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:595 (add_producer)> (http_tunnel) [2] adding producer 'user agent post'
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:650 (add_consumer)> (http_tunnel) [2] adding consumer 'http server post'
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:697 (tunnel_run)> (http_tunnel) tunnel_run started, p_arg is provided
   [Oct  6 14:32:31.180] [ET_NET 14] DEBUG: <HttpTunnel.cc:1307 (consumer_handler)> (http_tunnel) [2] consumer_handler [http server post VC_EVENT_WRITE_READY/TS_EVENT_VCONN_WRITE_READY]
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:457 (state_start_frame_read)> (http2_cs) [2] [&Http2ClientSession::state_start_frame_read, VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=64535/65535 stream=64535/65535
   [Oct  6 14:32:33.035] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=63535/65535 stream=63535/65535
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=62535/65535 stream=62535/65535
   [Oct  6 14:32:33.036] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ClientSession.cc:471 (do_start_frame_read)> (http2_cs) [2] receiving frame header
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ClientSession.cc:488 (do_start_frame_read)> (http2_cs) [2] frame header length=1000, type=0, flags=0x0, streamid=1
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:89 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <Http2ConnectionState.cc:176 (rcv_data_frame)> (http2_con) [2] [1] Received DATA frame: rwnd con=61535/65535 stream=61535/65535
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1101 (producer_handler)> (http_tunnel) [2] producer_handler [user agent post VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1145 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpTunnel.cc:1307 (consumer_handler)> (http_tunnel) [2] consumer_handler [http server post VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:33.037] [ET_NET 14] DEBUG: <HttpSM.cc:3611 (tunnel_handler_post_server)> (http) [2] [&HttpSM::tunnel_handler_post_server, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  6 14:32:55.035] [ET_NET 14] DEBUG: <Http2ClientSession.cc:349 (main_event_handler)> (http2_cs) [2] Closing event 104
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2ClientSession.cc:276 (do_io_close)> (http2_cs) [2] session closed
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2Stream.cc:421 (initiating_close)> (http2_stream) [2] [1] initiating_close
   [Oct  6 14:32:55.036] [ET_NET 14] DEBUG: <Http2Stream.cc:448 (initiating_close)> (http2_stream) [2] [1] handle write from destroy (event=103)
   [Oct  6 14:33:03.074] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:03.075] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:14.077] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:25.076] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:37.073] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.073] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.074] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:37.074] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:48.075] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:48.076] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:2636 (main_handler)> (http) [2] [HttpSM::main_handler, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:2056 (state_send_server_request_header)> (http) [2] [&HttpSM::state_send_server_request_header, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:5606 (handle_server_setup_error)> (http) [2] [&HttpSM::handle_server_setup_error, VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT]
   [Oct  6 14:33:59.075] [ET_NET 14] DEBUG: <HttpSM.cc:5616 (handle_server_setup_error)> (http) [2] [handle_server_setup_error] forwarding event VC_EVENT_INACTIVITY_TIMEOUT/TS_EVENT_VCONN_INACTIVITY_TIMEOUT to post tunnel
   ```
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505188928



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       Note: This seems the key change, forcing to make the producer done on the `VC_EVENT_WRITE_COMPLETE` event on the consumer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708226222


   It's pretty complicated, I need time to understand this change. One thing I noticed is the default inactivity timeout is fired and it comes to`HttpSM::handle_server_setup_error` on the 9.0.x branch (5ec9efb3). OTOH, I might be missing something, it's not fired on the master branch (bb847ee0). It looks like the recent change fb0bf03bcc57fb4871c9b8f5eff2ecea6d99bafe made the difference.
   ( I set `CONFIG proxy.config.net.default_inactivity_timeout INT 5`, not sure other timeouts will be kicked in. )


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-706027170






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-716229414


   I tried the h2spec test above against the 8.1.x branch (5ce3c75e), but the HttpSM was killed correctly. It's unclear which commit made the difference, but we don't need to backport this on the 8.1.x branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich merged pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich merged pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708444588


   Interesting,  I'll take a look at fb0bf03bcc57fb4871c9b8f5eff2ecea6d99bafe.  I had only been doing the h2spec tests against my branch on master, so I did not see the default inactivity timeout trigger.
   
   There have been some recent changes to adapt to other changes in default inactivity timeout.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704680035


   @shinrich Thanks for setting up repro. I'll run it and review this one in this week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.  However after a while (10-30 seconds), ATS crashes with an inactivity timeout triggers on a HttpTunnel continuation which belongs to a deleted HttpSM.  But then I realized I had my config set to debug another issue. With that fixed, no crash.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.  My final version of the patch is attached in no-eos-delays.txt
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708890372


   I confirmed that case 3 (no Content-Length header) doesn't a problem with the current 9.0.x branch. The  `HttpSM::handle_server_setup_error` close the connection on inactivity timeout, because the tunnel consumer is "alive".
   This issue happens only if a client sends the Content-Length header and doesn't send END_STREAM flag.
   What I observed was default inactivity timeout is not fired on master as I reported yesterday. Sorry for the confusion.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-703361375


   > In the ua_txn recv_end_stream is false. So the HTTP/2 client has sent all the expected data (content-length had been set to 2822) but it did not send a DATA frame with the EOS bit set. 
   
   Could you make sure this is not a case of the trailing header nor pure end-stream (EOS on 0-length DATA frame)?
   
   > A bad client, but it should not have caused this zombie state.
   
   Agreed. We need to fix, anyway.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505755464



##########
File path: proxy/http/HttpSM.cc
##########
@@ -2717,7 +2717,7 @@ HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer *p)
     tunnel.reset();
     // When the ua completed sending it's data we must have
     //  removed it from the tunnel
-    ink_release_assert(ua_entry->in_tunnel == false);
+    ua_entry->in_tunnel     = false;

Review comment:
       May no longer be necessary.  I ended up changing the main solution from the first iteration of the PR. Let me try backing out this change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505188928



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       Note: This seems the key change, forcing to make the producer done on the `VC_EVENT_WRITE_COMPLETE` event.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r532698764



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       Sorry, not sure what you mean by flagging your work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 removed a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 removed a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708956679


   The key change looks good.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708018646


   I pushed another commit.  It addresses the assert you saw (case 2).  With that change, I didn't have to change HttpTunnel::is_tunnel_alive().   I also adjusted the h2spec test to not send a content-length, and with this change, the inactivity_timeout seemed to kick in as expected (case 3).
   
   The second commit includes changes in HttpSM::tunnel_handler_post_server to address crashes we started seeing once we deployed the first patch (a few crashes a day over the fleet).  The following stack trace.  Seemed to occur on POST with very large post bodies (2MB) with a content-length specified, but the bad client sent another data frame after the content-length number of bytes was sent.  Our internal patch includes an additional warning message to try an get additional information about our faulty client.
   
   ```
   #0  operator= (p=0x2b819957b600, this=0x5529e2a5c0e67277) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:805
   #1  append_block_internal (b=0x2b819957b600, this=0x2b81994ea800) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:805
   #2  append_block (b=0x2b819957b600, this=0x2b81994ea800) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/P_IOBuffer.h:823
   #3  MIOBuffer::write (this=this@entry=0x2b81994ea800, b=0x2b8380a47840, alen=alen@entry=8192, offset=<optimized out>) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/IOBuffer.cc:127
   #4  0x00000000007668fd in MIOBuffer::write (this=this@entry=0x2b81994ea800, r=r@entry=0x2b81702d0240, len=len@entry=8192, offset=offset@entry=0)
       at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/IOBuffer.cc:96
   #5  0x00000000005df8a0 in rcv_data_frame (cstate=..., frame=...) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ConnectionState.cc:198
   #6  0x00000000005e49a4 in Http2ConnectionState::main_event_handler(int, void*) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ConnectionState.cc:1063
   #7  0x00000000005d9139 in handleEvent (data=0x2b7fa9908ee0, event=2253, this=0x2b8e76fc3100)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:167
   #8  handleEvent (data=0x2b7fa9908ee0, event=2253, this=0x2b8e76fc3100)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:163
   #9  send_connection_event(Continuation*, int, void*) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:65
   #10 0x00000000005d99f7 in Http2ClientSession::do_complete_frame_read (this=0x2b8e76fc2e00) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:545
   #11 0x00000000005d9ee5 in Http2ClientSession::state_process_frame_read(int, VIO*, bool) () at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:564
   #12 0x00000000005da455 in Http2ClientSession::state_complete_frame_read (this=0x2b8e76fc2e00, event=100, edata=0x2b95baf0c2d8)
       at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:535
   #13 0x00000000005d9443 in Http2ClientSession::main_event_handler (this=0x2b8e76fc2e00, event=100, edata=0x2b95baf0c2d8) at ../../../../../../_vcs/trafficserver9/proxy/http2/Http2ClientSession.cc:331
   #14 0x000000000072bb83 in handleEvent (data=0x2b95baf0c2d8, event=100, this=0x2b8e76fc2e00)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:167
   #15 handleEvent (data=0x2b95baf0c2d8, event=100, this=0x2b8e76fc2e00)
       at /sd/workspace/src/git.vzbuilders.com/Edge/build/_build/build_release_posix-x86_64_gcc_8/trafficserver9/build/../../../../_vcs/trafficserver9/iocore/eventsystem/I_Continuation.h:163
   #16 read_signal_and_update (vc=vc@entry=0x2b95baf0c100, event=event@entry=100) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNetVConnection.cc:83
   #17 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2b95baf0c100, event=event@entry=100) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNetVConnection.cc:1014
   #18 0x00000000007040d8 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) () at ../../../../../../_vcs/trafficserver9/iocore/net/SSLNetVConnection.cc:672
   #19 0x000000000071fcbe in NetHandler::process_ready_list (this=this@entry=0x2b7fa3775360) at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNet.cc:413
   #20 0x0000000000720130 in NetHandler::waitForActivity(long) () at ../../../../../../_vcs/trafficserver9/iocore/net/UnixNet.cc:548
   #21 0x000000000076c1e1 in EThread::execute_regular (this=this@entry=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/I_PriorityEventQueue.h:115
   #22 0x000000000076c436 in execute (this=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/UnixEThread.cc:332
   #23 EThread::execute (this=0x2b7fa37712c0) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/UnixEThread.cc:310
   #24 0x000000000076a6b9 in spawn_thread_internal (a=0x2b7f9de7f600) at ../../../../../../_vcs/trafficserver9/iocore/eventsystem/Thread.cc:92
   #25 0x00002b7f9c27bdd5 in start_thread () from /lib64/libpthread.so.0
   #26 0x00002b7f9cfb0ead in clone () from /lib64/libc.so.6
   ```
   
   In any case, the change in HttpSM::tunnel_handler_post_server seems good. The original do_io_read's are being set up with tunnel buffers that are very soon to be deleted.  I changed the do_io_read's to target the ua_buffer which should exist for the lifetime of the HttpSM.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-707970683


   Ah, I rarely run with --enable-debug. Actually I did try on this branch and typo'ed it.  
   
   I will figure out the case 3.  Agreed that must also be solved.
   
   We ran into another latent crash uncovered by this change.  I'll update this PR with that fix as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r505188928



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       Note: This seems the key change, making the producer done on the `VC_EVENT_WRITE_COMPLETE` event on the consumer.
   This relaxes the restriction of the END_STREAM flag instead of making errors on inactivity timeout.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-704299762


   @masaori335 and I have been trying to augment h2spec to reproduce the problem I was seeing in our production.  We are applying patches to https://github.com/summerwind/h2spec.  And then running some variant of "./h2spec -h localhost -p 4443 -P "/post" -t -k http2/8.1.2.6/3" against our traffic_server with the appropriate remap rule for the URL.
   
   @masaori335's original patch is attached as no-eos.txt.  With this patch, ATS processed the no-eos's POST request and returned a response.  The Post tunnel followed the pre-complete path.  All of the data had been read from the client before the tunnel run started.  So the producer was marked as completed and the consumer read all the data and the server returned a response.
   
   So I added a 2 second delay between sending the request header and sending the data.  That avoided the pre-complete, so the tunnel got into the state of an active producer but a completed consumer.  However the h2spec client shutdown shortly there after. So the client EOS was processed before any timeout occurred.  
   
   Then I added a 10 second delay between sending the post data and processing the response.  This allowed the default inactivity timeout to trigger (which I had set to 10 seconds) to trigger. The client exited after one iteration of the time out, but by that time ATS continued iterating every 10 seconds on state_send_server_request_header and handle_server_setup_error.
   
   
   [no-eos.txt](https://github.com/apache/trafficserver/files/5334394/no-eos.txt)
   [no-eos-delays.txt](https://github.com/apache/trafficserver/files/5334396/no-eos-delays.txt)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 edited a comment on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 edited a comment on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708226222


   It's pretty complicated, I need time to understand this change. One thing I noticed is the default inactivity timeout is fired and it comes to`HttpSM::handle_server_setup_error` on the 9.0.x branch (5ec9efb3). OTOH, I might be missing something, it's not fired on the master branch (bb847ee0). It looks like the recent change fb0bf03bcc57fb4871c9b8f5eff2ecea6d99bafe made the difference.
   ( I set `CONFIG proxy.config.net.default_inactivity_timeout INT 5`, not sure other inactive timeouts will be kicked in. )


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bigcook12345 commented on a change in pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
bigcook12345 commented on a change in pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#discussion_r527680934



##########
File path: proxy/http/HttpTunnel.cc
##########
@@ -1332,6 +1332,9 @@ HttpTunnel::consumer_handler(int event, HttpTunnelConsumer *c)
     if (c->producer && c->producer->handler_state == 0) {
       if (event == VC_EVENT_WRITE_COMPLETE) {
         c->producer->handler_state = HTTP_SM_POST_SUCCESS;
+        // If the consumer completed, presumably the producer successfully read and is done
+        c->producer->read_success = true;

Review comment:
       So you found and fixed an issue i came across...
   
   You merged it, and flagged my work? May i ask why per say..? I've only ever contributed to branches. Never flagged and had deleted..?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-706051042


   I was blind you explained why inactive timeout is not working in the description. 
   Now my comment is, is there any other approach to make the timeout working to cover 3. (no Content-Length header case)?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on pull request #7237: Fix bad HTTP/2 post client causing stuck HttpSM

Posted by GitBox <gi...@apache.org>.
masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-708161574


   > Seemed to occur on POST with very large post bodies (2MB) with a content-length specified, but the bad client sent another data frame after the content-length number of bytes was sent.
   
   Seems you found another bug. Definitely, this is a malformed request. ([RFC 7540 Section 8.1.2.6](https://tools.ietf.org/html/rfc7540#section-8.1.2.6) ) 
   We have to check the size before writing data to the buffer in `rcv_data_frame()`. Currently, the check is done only if the DATA frame has the END_STREAM flag.
   
   https://github.com/apache/trafficserver/blob/bb847ee03c274b7aed2993423c4fe3957cb11c2a/proxy/http2/Http2ConnectionState.cc#L142
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org