Posted to issues@trafficserver.apache.org by kshri23 <gi...@git.apache.org> on 2016/08/18 04:17:41 UTC

[GitHub] trafficserver pull request #872: TS-4735: Fix race condition in traffic_serv...

GitHub user kshri23 opened a pull request:

    https://github.com/apache/trafficserver/pull/872

    TS-4735: Fix race condition in traffic_server startup

    Set max_msgs_in_row to 1 temporarily during traffic_server startup to avoid hitting a race condition when we receive configurations continuously from traffic_manager while coming up. We will reset this limit back to 10,000 in reconfigure() once the initial synchronization is done.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kshri23/trafficserver fix_4735

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafficserver/pull/872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #872
    
----
commit 0fd360bd3154a02a8c532de87f6b001adf14ed7f
Author: Shrihari Kalkar <ks...@hotmail.com>
Date:   2016-08-18T02:32:49Z

    Fix race condition in traffic_server startup
    
    Set max_msgs_in_row to 1 during traffic_server startup to avoid hitting
    race condition as seen in issue TS-4735

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    FreeBSD build *failed*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/681/ for details.
     




[GitHub] trafficserver pull request #872: TS-4735: Fix race condition in traffic_serv...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach closed the pull request at:

    https://github.com/apache/trafficserver/pull/872



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by kshri23 <gi...@git.apache.org>.
Github user kshri23 commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    James, 
    I believe that this patch doesn't just address a symptom; it addresses a fundamental flaw in the startup code: a race condition. As I mentioned in the bug description, this issue is not caused by the time required for the initial message exchange. It is caused by TS-4646, where repeated and unnecessary messages are sent at a frequency that is exactly the same as mgmt_read_timeout. Agreed, fixing TS-4646 would keep us from hitting this. But the design of waiting for 10k messages before yielding is flawed. We cannot do that because traffic_cop expects a few things from traffic_server.
    
    The only reason I decided to address the issue this way is because of the precedent set by 'timeout' in the same class. Initially, timeout is set to '0'. And once the startup is complete, it is set back to the configured value.
    
    I am confused by your reference to TS-4646. As I mentioned there, TS-4646 should happen all the time, and I think it does. It is easy to verify that by enabling debug logs. It's just that TS-4646 does not always result in TS-4735. It is a race condition.
    
    However, on our VMs it happened all the time. We used this patch to solve the issue, and it seems to have been working for the past few months. Please let me know if you have any concerns.
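
The approach described in this thread can be illustrated with a minimal sketch. This is not the actual traffic_server code: `MessageReader`, `enqueue`, `process_batch`, and `pending` are hypothetical names chosen for illustration. Only the ideas of a per-batch message limit that starts at 1 and is raised in `reconfigure()` after the initial synchronization come from the pull request description.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for the management message reader. The names are
// illustrative and are not the real traffic_server identifiers.
class MessageReader {
public:
  explicit MessageReader(std::size_t steady_state_limit)
    : steady_state_limit_(steady_state_limit) {
    assert(steady_state_limit >= 1);
  }

  // Queue a message as if it had arrived from traffic_manager.
  void enqueue(std::string msg) { pending_.push_back(std::move(msg)); }

  // Handle at most max_msgs_in_a_row_ queued messages, then return so the
  // caller can do other work. Returns the number of messages handled.
  std::size_t process_batch() {
    std::size_t handled = 0;
    while (!pending_.empty() && handled < max_msgs_in_a_row_) {
      pending_.pop_front();  // a real reader would dispatch the message here
      ++handled;
    }
    return handled;
  }

  // Called once the initial synchronization is done: raise the limit to
  // its steady-state value.
  void reconfigure() { max_msgs_in_a_row_ = steady_state_limit_; }

  std::size_t pending() const { return pending_.size(); }

private:
  std::deque<std::string> pending_;
  std::size_t steady_state_limit_;
  std::size_t max_msgs_in_a_row_ = 1;  // startup: yield after every message
};
```

With a limit of 1, a steady stream of incoming messages cannot keep the reader busy for a whole batch during startup. After `reconfigure()` the reader drains up to the steady-state limit per call.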



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    Linux build *successful*! See https://ci.trafficserver.apache.org/job/Github-Linux/577/ for details.
     




[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by zwoop <gi...@git.apache.org>.
Github user zwoop commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    Curious, where do these messages come from? Is that related to clustering? [approveci]



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    FreeBSD build *successful*! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/673/ for details.
     




[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    The 10k is an arbitrary number chosen in [TS-4161](https://issues.apache.org/jira/browse/TS-4161).



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by zwoop <gi...@git.apache.org>.
Github user zwoop commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    This is failing on Linux because of clang-format. Please run "make clang-format" and push again.



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    Linux build *failed*! See https://ci.trafficserver.apache.org/job/Github-Linux/569/ for details.
     




[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by kshri23 <gi...@git.apache.org>.
Github user kshri23 commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    These messages come from traffic_manager. It is due to https://issues.apache.org/jira/browse/TS-4646




[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by atsci <gi...@git.apache.org>.
Github user atsci commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    Linux build *failed*! See https://ci.trafficserver.apache.org/job/Github-Linux/568/ for details.
     




[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    [approve ci]



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    As mentioned on TS-4646, this is not very reproducible. Although the patch here makes the symptom go away, I'm not comfortable that we really understand why the initial message exchange takes so long.



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    @kshri23 I dug into the startup sequence a bit more and I'm now convinced that this is a reasonable approach. What do you think about just changing ``MAX_MSGS_IN_A_ROW`` to something sanely small (like 10)?
    
    @kshri23 You need to run ``make -j clang-format``.



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by jpeach <gi...@git.apache.org>.
Github user jpeach commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    [approve ci]



[GitHub] trafficserver issue #872: TS-4735: Fix race condition in traffic_server star...

Posted by kshri23 <gi...@git.apache.org>.
Github user kshri23 commented on the issue:

    https://github.com/apache/trafficserver/pull/872
  
    James, that is a good idea. I am not sure, but I think it should be okay to do so. I will need to dig in a bit, though, to understand why this value was fixed, unless anyone recollects why we chose 10k?

