Posted to issues@trafficserver.apache.org by "William Bardwell (JIRA)" <ji...@apache.org> on 2011/04/25 19:07:03 UTC

[jira] [Created] (TS-750) TS does not fail-over if one origin server for a 2 address hostname goes down

TS does not fail-over if one origin server for a 2 address hostname goes down
-----------------------------------------------------------------------------

                 Key: TS-750
                 URL: https://issues.apache.org/jira/browse/TS-750
             Project: Traffic Server
          Issue Type: Bug
          Components: HTTP
    Affects Versions: 2.1.4, 2.1.5, 2.1.6, 2.1.7
         Environment: Any
            Reporter: William Bardwell


If you have a hostname that resolves to 2 addresses, make a request to TS for something at that hostname, and then kill the origin server at whichever address TS just talked to, your next request (if made promptly) will fail with a 502 status code.  A request made after that will fail over correctly.

Tracing the code, I see it making proxy.config.http.connect_attempts_max_retries retries to the same address. It does call code to mark the address down after proxy.config.http.connect_attempts_rr_retries attempts, but the address does not get marked down.
(The code calls HttpTransact::delete_server_rr_entry(), which does TRANSACT_RETURN(OS_RR_MARK_DOWN, ReDNSRoundRobin), which in turn tries to set up the marking with HTTP_SM_SET_DEFAULT_HANDLER(&HttpSM::state_mark_os_down). But state_mark_os_down never actually runs; instead it just goes into the retry, I think because ReDNSRoundRobin does s->next_action = how_to_open_connection(s).)
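
To make the difference concrete, here is a minimal toy model of the retry policy described above. It is not Traffic Server code: Address, RetryConfig, Result, pick_address() and send_request() are hypothetical names invented for this sketch, and the counting rules (one initial attempt plus max_retries retries, mark-down once rr_retries failures accumulate on one address) are assumptions about the intended behavior. The mark_down_works flag stands in for whether the mark-down step actually takes effect. The sketch covers a single request only and does not try to model why the following request fails over.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one origin address behind a round-robin hostname.
struct Address {
  std::string ip;
  bool alive;       // whether a connect to this address succeeds
  bool marked_down; // whether the proxy has marked this address down
};

// Hypothetical stand-ins for the two settings named in the report.
struct RetryConfig {
  int max_retries; // models proxy.config.http.connect_attempts_max_retries
  int rr_retries;  // models proxy.config.http.connect_attempts_rr_retries
};

// Outcome of one simulated request.
struct Result {
  int status;   // 200 on success, 502 when every attempt failed
  int attempts; // number of connect attempts made
};

// Returns the index of the first address not marked down, or -1 if none.
int pick_address(const std::vector<Address> &addrs)
{
  for (int i = 0; i < static_cast<int>(addrs.size()); ++i) {
    if (!addrs[i].marked_down) {
      return i;
    }
  }
  return -1;
}

// Simulates one request. When mark_down_works is false, the mark-down step
// is skipped, so every retry goes back to the same address.
Result send_request(std::vector<Address> &addrs, const RetryConfig &cfg, bool mark_down_works)
{
  int current             = pick_address(addrs);
  int failures_on_current = 0;
  int attempts            = 0;

  // Assumption: one initial attempt plus up to max_retries retries.
  while (current >= 0 && attempts <= cfg.max_retries) {
    ++attempts;
    if (addrs[current].alive) {
      return {200, attempts};
    }
    ++failures_on_current;
    if (mark_down_works && failures_on_current >= cfg.rr_retries) {
      // Intended behavior: mark the address down and move to the next one.
      addrs[current].marked_down = true;
      current                    = pick_address(addrs);
      failures_on_current        = 0;
    }
  }
  return {502, attempts};
}
```

With two addresses where the first is dead, and illustrative values max_retries = 6 and rr_retries = 2, this model returns 200 on the third attempt when the mark-down takes effect, and 502 after seven attempts against the same dead address when it does not.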

I have a fix, although it doesn't seem like quite the right way to go about things; I can't figure out how to get state_mark_os_down to be called at the right time.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (TS-750) TS does not fail-over if one origin server for a 2 address hostname goes down

Posted by "William Bardwell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Bardwell updated TS-750:
--------------------------------

    Attachment: svn.diff2

This fixes it, but there might be a better way that just gets state_mark_os_down() to be called properly.


[jira] [Resolved] (TS-750) TS does not fail-over if one origin server for a 2 address hostname goes down

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom resolved TS-750.
------------------------------

    Resolution: Fixed


[jira] [Assigned] (TS-750) TS does not fail-over if one origin server for a 2 address hostname goes down

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-750:
--------------------------------

    Assignee: Leif Hedstrom


[jira] [Updated] (TS-750) TS does not fail-over if one origin server for a 2 address hostname goes down

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-750:
-----------------------------

    Fix Version/s: 2.1.8
