You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@trafficserver.apache.org by Susan Hinrichs <sh...@verizonmedia.com> on 2021/01/12 14:58:45 UTC

Re: [E] Some disconnections from origin

At a quick glance, your settings seem reasonable.  Judging from the debug
output you shared, I'd suggest looking at your origin.  Based on the debug
output, ATS successfully makes a connection to the origin and thinks it
sends the request (doesn't necessarily means the origin has processed the
request just that the request has been passed to the networking stack), and
then the origin closes the connection (sends an EOS/FIN/RST).  Is your
connection to origin TLS or just plain TCP?   If it is just plain TCP, the
OS will deal with the handshake for the most part even if your origin app
is overloaded.

Looking at the timestamps, it looks like the origin responds with an EOS
very quickly, so it doesn't seem to be a timeout issue on the origin.  A
packet capture between ATS and the origin, may verify that the origin is
indeed initiating the connection close.

Susan

On Tue, Jan 12, 2021 at 3:20 AM Łukasz Nowak <lu...@nowak.io> wrote:

> Hello,
>
> I am using TrafficServer which has only one origin - haproxy - and which
> serves as a cache for many sites.
>
> Details are:
>
> Version of Traffic Server used: 7.1.12 (also applies to 8.1.0)
> Platform: Linux 64 bit, gcc 8.3.0
> Any relevant configuration changes you've made from the default
> configurations (particularly for records.config), part of `traffic_ctl
> diff`:
> proxy.config.http.cache.open_write_fail_action has changed
>         Current Value   : 2
>         Default Value   : 0
> proxy.config.http.negative_revalidating_enabled has changed
>         Current Value   : 1
>         Default Value   : 0
> proxy.config.http.negative_revalidating_lifetime has changed
>         Current Value   : 86400
>         Default Value   : 1800
> proxy.config.http.insert_client_ip has changed
>         Current Value   : 0
>         Default Value   : 1
> proxy.config.http.insert_squid_x_forwarded_for has changed
>         Current Value   : 0
>         Default Value   : 1
> proxy.config.http.transaction_no_activity_timeout_in has changed
>         Current Value   : 600
>         Default Value   : 30
> proxy.config.http.transaction_no_activity_timeout_out has changed
>         Current Value   : 600
>         Default Value   : 30
> proxy.config.http.connect_attempts_max_retries has changed
>         Current Value   : 0
>         Default Value   : 3
> proxy.config.http.connect_attempts_max_retries_dead_server has changed
>         Current Value   : 0
>         Default Value   : 1
> proxy.config.http.connect_attempts_timeout has changed
>         Current Value   : 600
>         Default Value   : 30
> proxy.config.http.post_connect_attempts_timeout has changed
>         Current Value   : 600
>         Default Value   : 1800
> proxy.config.http.normalize_ae_gzip has changed
>         Current Value   : 0
>         Default Value   : 1
> proxy.config.cache.ram_cache.size has changed
>         Current Value   : 1073741824
>         Default Value   : -1
> proxy.config.url_remap.pristine_host_hdr has changed
>         Current Value   : 1
>         Default Value   : 0
>
> I have trafficserver connecting to origin (haproxy) on the same host with:
>
> cat etc/trafficserver/remap.config
> map /HTTPS/ http://10.0.251.170:21443
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A21443&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=N294azxfZ6U1ohRzum7udfcLHzoFA1j4RrXo5EXahE8&e=>
> map / http://10.0.251.170:41080
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A41080&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=ZfNrMG28f4Yd1NbMDDtb4nOEkIE2i_6DC3RpCPT1peA&e=>
>
> There is constant stream of requests going to trafficserver. About 0.5% of
> them fails with returning 502 to the client, with entry in
> var/log/trafficserver/error.log:
>
> 20210107.13h02m15s CONNECT: could not connect to 10.0.251.170 for '
> http://10.0.251.170:41080/path/
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A41080_path_&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=up8Wan-gzFmzAX2wLp0TFAjefG9VJXVvYUDyp67CRK4&e=>'
> (setting last failure time)
> 20210107.13h02m15s RESPONSE: sent 10.0.251.170 status 502 (Server Hangup)
> for 'http://10.0.251.170:41080/path/
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A41080_path_&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=up8Wan-gzFmzAX2wLp0TFAjefG9VJXVvYUDyp67CRK4&e=>
> '
>
> also seen in var/log/trafficserver/squid.log:
>
> 1610020935.756 43 10.0.251.170 TCP_REFRESH_FAIL_HIT/502 498 GET
> http://10.0.251.170:41080/path/
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A41080_path_&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=up8Wan-gzFmzAX2wLp0TFAjefG9VJXVvYUDyp67CRK4&e=>
> - DIRECT/10.0.251.170
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=Y-2vY9jkbpDRXhd711-j4bjTQMlsuyrQsd3FlKoZy8o&e=>
> text/html
>
> Thanks to enabling diagnostics with:
>
>
> CONFIG proxy.config.diags.debug.enabled INT 1
> CONFIG proxy.config.diags.debug.tags STRING http.*
>
> I saw in var/log/trafficserver/traffic.out:
>
> [Jan  7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666
> (main_handler)> (http) [5687] [HttpSM::main_handler,
> VC_EVENT_WRITE_COMPLETE]
> [Jan  7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1994
> (state_send_server_request_header)> (http) [5687]
> [&HttpSM::state_send_server_request_header, VC_
> EVENT_WRITE_COMPLETE]
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666
> (main_handler)> (http) [5687] [HttpSM::main_handler, VC_EVENT_EOS]
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1836
> (state_read_server_response_header)> (http) [5687]
> [&HttpSM::state_read_server_response_header, V
> C_EVENT_EOS]
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1923
> (state_read_server_response_header)> (http_seq) Error parsing server
> response header
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:5513
> (handle_server_setup_error)> (http) [5687]
> [&HttpSM::handle_server_setup_error, VC_EVENT_EOS]
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3394
> (HandleResponse)> (http_trans) [5687] [HttpTransact::HandleResponse]
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3395
> (HandleResponse)> (http_seq) [5687] [HttpTransact::HandleResponse] Response
> received
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:8497
> (ink_cluster_time)> (http_trans) [ink_cluster_time] local: 1610020935,
> highest_delta: 0, cl
> uster: 1610020935
> [Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3402
> (HandleResponse)> (http_trans) [5687] [HandleResponse]
> response_received_time: 1610020935
> +++++++++ Incoming O.S. Response +++++++++
> -- State Machine Id: 5687
> HTTP/1.0 0
>
> I clearly see that setting
>
> proxy.config.http.send_http11_requests INT 0
>
> Drops the amount of problems to almost 0 (but they still appear).
>
> As a workaround I used (with keeping HTTP/1.1):
>
> CONFIG proxy.config.http.connect_attempts_max_retries INT 3
> CONFIG proxy.config.http.connect_attempts_max_retries_dead_server INT 1
>
> Then my client is never served with 502 in such case, and with
> TrafficServer 7 there is nothing in the var/log/trafficserver/error.log,
> while using TrafficServer 8 there is:
>
> 20210111.14h30m34s CONNECT:[0] could not connect [CONNECTION_CLOSED] to
> 10.0.251.170 for 'http://10.0.251.170:41080/
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__10.0.251.170-3A41080_&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=0QSad7rE-24IoT7C0nhtuw2f8IX3yC7k6f3OQpSrb98&e=>
> '
>
> And well, TrafficServer just reconnects to the origin.
>
> My questions are:
>
>  * Is it a possible bug in TrafficServer (somewhat similar to
> https://issues.apache.org/jira/browse/TS-3959
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TS-2D3959&d=DwMFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=GAqM_xZpxNbVqsR-aGvQBjOG3d33Y2-i4ynL-JkEouY&m=O9p5L7xVpRLZu-YiV7S6QVltam7nIV5WehZUe2_uY1E&s=gh1bBZR5Wub-0eAqVQdOQ9xe-HWS5M2PkY29U0UKXsM&e=>
> )?
>  * Is it misconfiguration of TrafficServer?
>  * Is there all ok with TrafficServer and there is really a problem in my
> origin software?
>
> Thanks for the tips,
> Regards,
> Łukasz
>

Re: [E] Some disconnections from origin

Posted by Łukasz Nowak <lu...@nowak.io>.
Hello,

śr., 13 sty 2021 o 11:03 Łukasz Nowak <lu...@nowak.io> napisał(a):

>
>
>> Looking at the timestamps, it looks like the origin responds with an EOS
>> very quickly, so it doesn't seem to be a timeout issue on the origin.  A
>> packet capture between ATS and the origin, may verify that the origin is
>> indeed initiating the connection close.
>>
>
> Ok, that's really nice tip, I'll review the situation with captures I have.
>

Indeed, it's haproxy which sends TCP FIN after correct HTTP GET from
TrafficServer (see attached wireshark session).

So on trafficserver side all seems OK, I am debugging the other side.

Regards,
Łukasz

>

Re: [E] Some disconnections from origin

Posted by Łukasz Nowak <lu...@nowak.io>.
Hello,

wt., 12 sty 2021 o 16:07 Susan Hinrichs <sh...@verizonmedia.com>
napisał(a):

> At a quick glance, your settings seem reasonable.  Judging from the debug
> output you shared, I'd suggest looking at your origin.  Based on the debug
> output, ATS successfully makes a connection to the origin and thinks it
> sends the request (doesn't necessarily means the origin has processed the
> request just that the request has been passed to the networking stack), and
> then the origin closes the connection (sends an EOS/FIN/RST).  Is your
> connection to origin TLS or just plain TCP?   If it is just plain TCP, the
> OS will deal with the handshake for the most part even if your origin app
> is overloaded.
>

Thanks for confirming sanity of my configuration.

My connections to origin are plain TCP. I'll check your hint regarding
overloaded origin, or something with similar symptoms.


> Looking at the timestamps, it looks like the origin responds with an EOS
> very quickly, so it doesn't seem to be a timeout issue on the origin.  A
> packet capture between ATS and the origin, may verify that the origin is
> indeed initiating the connection close.
>

Ok, that's really nice tip, I'll review the situation with captures I have.

Regards,
Łukasz

>