Posted to dev@qpid.apache.org by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org> on 2008/10/23 14:34:44 UTC

[jira] Created: (QPID-1391) Reliability tests fail, broker is unable to process connections

Reliability tests fail, broker is unable to process connections
---------------------------------------------------------------

                 Key: QPID-1391
                 URL: https://issues.apache.org/jira/browse/QPID-1391
             Project: Qpid
          Issue Type: Bug
          Components: Java Broker
    Affects Versions: M4
            Reporter: Aidan Skinner
            Assignee: Aidan Skinner
             Fix For: M4


The reliability tests eventually cause the broker to lock up: it's still up, but all threads are waiting on either a BDB lock or a senderLock. This is bad.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aidan Skinner reopened QPID-1391:
---------------------------------


*sigh*



[jira] Resolved: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aidan Skinner resolved QPID-1391.
---------------------------------

    Resolution: Fixed

This is a problem with the BDB message store I was using; I've filed (and fixed) it at https://jira.jboss.org/jira/browse/RHM-7



[jira] Updated: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aidan Skinner updated QPID-1391:
--------------------------------

    Attachment: stack.txt

Stack dump from the broker, which is up but refusing to start new protocol sessions.



[jira] Resolved: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aidan Skinner resolved QPID-1391.
---------------------------------

    Resolution: Invalid

This appears to have been an environmental issue with the machine in question. 



[jira] Commented: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Aidan Skinner (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648136#action_12648136 ] 

Aidan Skinner commented on QPID-1391:
-------------------------------------

There are a ton of Connection objects that are still open, even though the associated socket has gone away.



[jira] Commented: (QPID-1391) Reliability tests fail, broker is unable to process connections

Posted by "Martin Ritchie (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648614#action_12648614 ] 

Martin Ritchie commented on QPID-1391:
--------------------------------------

Having taken a look at the stack trace, the problem seems to be coming from the BerkeleyDB MessageStore module.

Thread 'pool-1-thread-13' is blocked trying to add a new message to the channel's Unacknowledged Map.

The thread is blocked waiting for a lock held by thread 'pool-1-thread-27', which is acknowledging a message and so holds the lock on the map.

Thread 27 is currently waiting in the BDBStore code for the commit to complete. The question is why it is taking so long; in fact, it is not completing at all. Hours have passed and the code is still sitting at wait().
When (if) this waiting thread returns, it will release the lock that all the currently waiting threads are blocked on.
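The blocking chain above can be reproduced in a few lines of plain Java. This is a minimal sketch, not broker code: `unackedMap`, `commitDone`, `committed`, and the thread names `acker` and `adder` are hypothetical stand-ins for the channel's Unacknowledged Map, the store's commit wait, and threads 27 and 13. It also shows how `ThreadMXBean` can report which thread owns the lock a blocked thread is waiting for, which is the same relationship a stack dump shows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class AckBlockingDemo {
    // Hypothetical stand-ins for the channel's unacknowledged message map
    // and the store's commit-completion monitor.
    private static final Object unackedMap = new Object();
    private static final Object commitDone = new Object();
    // Never set to true: mimics a commit that never completes.
    private static volatile boolean committed = false;

    /** Returns the name of the thread that owns the lock "adder" is blocked on. */
    public static String demonstrate() throws InterruptedException {
        Thread acker = new Thread(() -> {
            synchronized (unackedMap) {            // acknowledge path holds the map lock...
                synchronized (commitDone) {
                    try {
                        while (!committed) {
                            commitDone.wait();     // ...while waiting for the commit (like thread 27)
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // give up; both locks are released on exit
                    }
                }
            }
        }, "acker");
        acker.start();
        waitForState(acker, Thread.State.WAITING);

        Thread adder = new Thread(() -> {
            synchronized (unackedMap) {            // adding an unacked message needs the same lock (like thread 13)
                // the new message would be added to the map here
            }
        }, "adder");
        adder.start();
        waitForState(adder, Thread.State.BLOCKED);

        // Ask the JVM which thread owns the monitor that "adder" is blocked on.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(adder.getId());
        String owner = info.getLockOwnerName();

        // Once the waiting thread returns, the map lock is released and "adder" proceeds.
        acker.interrupt();
        acker.join();
        adder.join();
        return owner;
    }

    private static void waitForState(Thread t, Thread.State state) throws InterruptedException {
        while (t.getState() != state) {
            Thread.sleep(5);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("adder is blocked on a lock owned by: " + demonstrate());
    }
}
```

The sketch should report `acker` as the lock owner. Interrupting `acker` plays the role of the commit finally returning: as soon as it leaves `wait()`, the map lock is released and the blocked thread continues.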

So points for further discussion:

1) [Slightly off Apache Qpid] BDBMessageStore L1804 synchronizes on 'this'. IMO this is a poor design, as you cannot tell whether the BDB code is also going to lock on that object.

2) UnacknowledgedMessageMapImpl L:141 acknowledgeMessage: this synchronizes around the whole acknowledge method per TransactionalContext. This seems unnecessary: we pass the UMMI (this) to the method, which then uses the visitors to access the map safely in the NonTransactionalContext, and the LocalTransactionalContext does not actually update the map, so it should not need to lock at all.
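Both suggestions can be illustrated together in a small sketch. It assumes hypothetical `Store` and `UnackedMap` classes rather than the real BDBMessageStore and UnacknowledgedMessageMapImpl: the store serializes commits on a private lock object instead of `this` (point 1), and the map holds its own lock only while updating entries, so a slow commit no longer blocks threads adding new unacknowledged messages (point 2). Removing the entry before committing, and putting it back if the commit throws, is one possible ordering chosen for this sketch, not a description of the broker's behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LockScopeSketch {

    /** Hypothetical store: commits serialize on a private lock rather than on 'this' (point 1). */
    static class Store {
        private final Object commitLock = new Object(); // private, so no outside code can lock it
        private int commits;

        void commit() {
            synchronized (commitLock) {
                commits++; // placeholder for the real durable commit
            }
        }

        int commitCount() {
            synchronized (commitLock) {
                return commits;
            }
        }
    }

    /** Hypothetical unacked map: the map lock is held only around map updates (point 2). */
    static class UnackedMap {
        private final Map<Long, String> unacked = new LinkedHashMap<>();
        private final Store store;

        UnackedMap(Store store) {
            this.store = store;
        }

        void add(long tag, String msg) {
            synchronized (unacked) {
                unacked.put(tag, msg);
            }
        }

        void acknowledge(long tag) {
            String msg;
            synchronized (unacked) { // short critical section: just take the entry out
                msg = unacked.remove(tag);
            }
            if (msg == null) {
                return; // unknown or already acknowledged tag
            }
            try {
                store.commit(); // potentially slow; runs without the map lock held
            } catch (RuntimeException e) {
                synchronized (unacked) {
                    unacked.put(tag, msg); // commit failed: restore the entry
                }
                throw e;
            }
        }

        int size() {
            synchronized (unacked) {
                return unacked.size();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Store store = new Store();
        UnackedMap map = new UnackedMap(store);
        map.add(1L, "m1");
        map.add(2L, "m2");

        Thread acker;
        synchronized (store) { // outside code locking the store object...
            acker = new Thread(() -> map.acknowledge(1L));
            acker.start();
            acker.join(); // ...does not stall the commit, because commit() never locks 'this'
        }
        System.out.println("unacked=" + map.size() + " commits=" + store.commitCount());
    }
}
```

Running `main` prints `unacked=1 commits=1`. If `commit()` synchronized on `this` instead, the `join()` inside `synchronized (store)` would never return, which is the kind of hidden lock interaction point 1 warns about.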

