Posted to issues@trafficserver.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/11/07 17:16:59 UTC

[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

     [ https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31730 ]

ASF GitHub Bot logged work on TS-4838:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/16 17:16
            Start Date: 07/Nov/16 17:16
    Worklog Time Spent: 10m 
      Work Description: GitHub user PSUdaemon opened a pull request:

    https://github.com/apache/trafficserver/pull/1206

    TS-4838: CONNECT requests get forgotten across threads.

    What happens here is that ProxyClientTransaction::adjust_thread
    reschedules the transaction onto a new thread at the start of
    HttpSM::do_http_server_open.
    
    Unfortunately, at this point the default handler is
    HttpSM::state_raw_http_server_open. When the transaction is
    rescheduled, the default handler runs, and receives the EVENT_INTERVAL
    that it so fortuitously logs an error for. We have never actually
    completed do_http_server_open, so we never make any more progress
    on this transaction.
    
    (cherry picked from commit 8fddd77c085d1a64f11de61bb42a50562cd23229)
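
The sequence described above can be sketched with a small toy model. This is only an illustration under stated assumptions, not Traffic Server code: ToySM, its fields, the pending queue and the retry_on_interval switch are all hypothetical names, and the retry branch is just one conceivable remedy, not necessarily what the commit does. The function names are borrowed from the description purely for readability.

```cpp
#include <cassert>
#include <iostream>
#include <queue>

// Toy model only: the names echo the description, but none of this is the
// real Traffic Server API.
enum Event { EVENT_INTERVAL };

struct ToySM {
  using Handler = int (ToySM::*)(Event);

  Handler default_handler = nullptr;
  bool on_target_thread = false;       // false until the "reschedule" happens
  bool retry_on_interval;              // hypothetical remedy switch
  bool server_open_completed = false;  // did do_http_server_open finish?
  int errors_logged = 0;
  std::queue<Event> pending;           // stands in for the event system

  explicit ToySM(bool retry) : retry_on_interval(retry) {}

  // Returns false when the transaction had to be rescheduled. Rescheduling
  // later wakes whatever the default handler is with EVENT_INTERVAL.
  bool adjust_thread() {
    if (on_target_thread) return true;
    on_target_thread = true;
    pending.push(EVENT_INTERVAL);
    return false;
  }

  void do_http_server_open() {
    if (!adjust_thread()) return;  // early return: the real work is skipped
    server_open_completed = true;
  }

  int state_raw_http_server_open(Event ev) {
    if (ev == EVENT_INTERVAL) {
      if (retry_on_interval) {
        do_http_server_open();  // assumed remedy: redo the skipped work
        return 0;
      }
      ++errors_logged;  // the handler does not expect this event
      std::cerr << "ERROR: [state_raw_http_server_open] event: EVENT_INTERVAL\n";
    }
    return 0;
  }

  void run() {
    // The default handler is already set when do_http_server_open starts.
    default_handler = &ToySM::state_raw_http_server_open;
    assert(default_handler != nullptr);
    do_http_server_open();
    while (!pending.empty()) {
      Event ev = pending.front();
      pending.pop();
      (this->*default_handler)(ev);
    }
  }
};
```

Constructing ToySM(false) and calling run() leaves server_open_completed false with one error logged, which mirrors the stall reported in the issue. With ToySM(true), the handler re-enters do_http_server_open and the open completes.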

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PSUdaemon/trafficserver bp-1002

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/1206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1206
    
----
commit c5ab2e686ac0dad4ebe89573cdcc0b2d2a6359a4
Author: James Peach <jp...@apache.org>
Date:   2016-09-09T22:29:05Z

    TS-4838: CONNECT requests get forgotten across threads.
    

----


Issue Time Tracking
-------------------

    Worklog Id:     (was: 31730)
    Time Spent: 1h  (was: 50m)

> After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors
> -------------------------------------------------------------------------------------------------
>
>                 Key: TS-4838
>                 URL: https://issues.apache.org/jira/browse/TS-4838
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, SSL
>    Affects Versions: 6.2.0, 7.0.0
>         Environment: CentOS/RHEL 7.2, x86_64
>            Reporter: Dimitry Andric
>            Assignee: James Peach
>             Fix For: 7.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have been using TrafficServer 5.3.2 for quite some time now, for forward proxying of a number of different HTML5 applications, one of the most important ones being YouTube's TV interface, e.g. https://youtube.com/tv.  This is all hosted on CentOS 7.2 x86_64 machines.
> We recently upgraded to 6.2.0, and then started having problems with the CONNECT requests for port 443 which are generated by the YouTube app.  It seems like these connections are "stalled" somehow, sometimes for >10 seconds.  Meanwhile, {{diags.log}} is getting spammed with lots of the following:
> {noformat}
> [Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 server_entry: (nil)
> {noformat}
> Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
> {noformat}
> 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
> 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - DIRECT/i9.ytimg.com -
> 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
> 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ - DIRECT/csi.gstatic.com -
> 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT www.youtube.com:443/ - DIRECT/www.youtube.com -
> 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT r17---sn-5hnednl7.googlevideo.com:443/ - DIRECT/r17---sn-5hnednl7.googlevideo.com -
> {noformat}
> As part of figuring out how to diagnose this, I tried a downgrade to TrafficServer 6.1.1, and this made all the stalling and problems disappear.  Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to the branch point of 6.2, and I ended up at [commit af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
> {quote}
> Author: Susan Hinrichs <sh...@draggingnagging.corp.ne1.yahoo.com>
> Date:   Wed Apr 13 19:57:39 2016 +0000
>     TS-3612: Restructure client session and transaction processing. This closes #570.
> {quote}
> Unfortunately, this is quite a big refactoring commit, so it is not possible to revert it individually to see whether that improves things.
> I read TS-3612 and #570, and I saw there were also a number of follow-up commits to fix various problems with it, but this particular problem of stalled SSL connections is still occurring with master as of today, 2016-09-09.
> I realize that this report is still missing reproduction details, since it is tricky to analyze what the YouTube app is doing, and simple {{curl https://}} tests appear to go fast, and don't seem to trigger any stalling.  But YouTube itself is pretty easy to try out, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)