You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <qp...@incubator.apache.org> on 2010/01/15 22:44:54 UTC

[jira] Created: (QPID-2341) Annotate replicated broker classes with assertions.

Annotate replicated broker classes with assertions.
---------------------------------------------------

                 Key: QPID-2341
                 URL: https://issues.apache.org/jira/browse/QPID-2341
             Project: Qpid
          Issue Type: Improvement
          Components: C++ Broker
    Affects Versions: 0.6
            Reporter: Alan Conway
            Assignee: Alan Conway


A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.

A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.

Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:

// New class:
namespace broker {
class Replicated {
 protected: 
   void assertReplicationSafe();
}

// Existing classes
class Queue : public Replicated { // Mark Queue as state that may be replicated.
  void someQueueModifier() {
    assertReplicationSafe(); // This function should only be called in replication-safe context.
 }

The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.

This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.

This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Resolved: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Alan Conway (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-2341.
-------------------------------

    Resolution: Fixed

Added in r901282

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>         Attachments: cluster_safe.patch
>
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Commented: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Andrew Stitcher (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801789#action_12801789 ] 

Andrew Stitcher commented on QPID-2341:
---------------------------------------

If the proposal is to actively mark only state that _is_ replication safe then I think this is a great overall plan (I think having to do the opposite - check/mark non safe state - would be a disaster causing as many problems as it would solve).

I would be a little concerned though if this was a really intrusive/widespread change though.

On a subsiduary note - how is someone to know if a piece of state they just added is replication safe?

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Commented: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Alan Conway (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801819#action_12801819 ] 

Alan Conway commented on QPID-2341:
-----------------------------------

The term "Replication" is also used for queue replication which is separate from cluster replication, so probably better to avoid the word and instead use:

// New class:
namespace broker {
class ClusteredState {
 protected:
   void assertClusterSafe();
} 

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Updated: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Alan Conway (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway updated QPID-2341:
------------------------------

    Attachment: cluster_safe.patch

I agree. It's also not obvious where or how widely they should be sprinkled.

Attached patch is slightly simplified (no base class, just assertions) and adds assertions just to Queue and SemanticState which are the main areas we care about. 

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>         Attachments: cluster_safe.patch
>
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Commented: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801892#action_12801892 ] 

Gordon Sim commented on QPID-2341:
----------------------------------

I don't have a better idea (at least not a concrete one!) and I concede that helping to locate these sorts of bugs would be valuable. It does grate somewhat though to have cluster specific assertions sprinkled throughout the code. It certainly wouldn't be feasible to require this sort of 'contract' for every 'plugin'.

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Commented: (QPID-2341) Annotate replicated broker classes with assertions.

Posted by "Alan Conway (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801815#action_12801815 ] 

Alan Conway commented on QPID-2341:
-----------------------------------

We would only be marking state that is actually replicated by the cluster. 

The changes would be to add a base class, and add a 1-line function call to some member functions. In principle we only need to mark the "innermost" modifiers, e.g. 
Foo { 
 f() { if (...) g() ... } // modifies state by calling g()
 g() { assertReplicationSafe(); someMemberVar =x } // modifies state directly
}

I haven't done the analysis to determine just how many assertions that would be.

> On a subsiduary note - how is someone to know if a piece of state they just added is replication safe? 

I'm hoping this proposal will make that easier: The presence of Replicated base class and assertReplicationSafe() in code act as notices that your working in an area that is replicated by the cluster.

> Annotate replicated broker classes with assertions.
> ---------------------------------------------------
>
>                 Key: QPID-2341
>                 URL: https://issues.apache.org/jira/browse/QPID-2341
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.6
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
> A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
> Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
> // New class:
> namespace broker {
> class Replicated {
>  protected: 
>    void assertReplicationSafe();
> }
> // Existing classes
> class Queue : public Replicated { // Mark Queue as state that may be replicated.
>   void someQueueModifier() {
>     assertReplicationSafe(); // This function should only be called in replication-safe context.
>  }
> The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
> This technique has already proven valuable in debugging a recent bug, putting the assertions permanently in the code should speed debugging of future bugs.
> This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org