You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/14 11:31:34 UTC

[jira] Created: (JCR-2746) Deadlock caused by ObservationDispatcher

Deadlock caused by ObservationDispatcher 
-----------------------------------------

                 Key: JCR-2746
                 URL: https://issues.apache.org/jira/browse/JCR-2746
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core, observation
    Affects Versions: 2.1.1, 2.1.0, 2.0.0
            Reporter: Jukka Zitting
             Fix For: 2.2.0


The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).

To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Deadlock triggered by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912917#action_12912917 ] 

Thomas Mueller commented on JCR-2746:
-------------------------------------

The deadlock in DefaultISMLocking is now tracked in JCR-2753.

Even if this is fixed, there is a problem with the ObservationDispatcher: the loop to wait for the observation queue to shrink may be endless, because locks are held by the current thread. It doesn't looks like there is an easy solution, except for only waiting once (slowing down the main thread) instead of waiting until the queue is shorter. In most cases, this will solve the original problem (events are generated faster than processed), but it can't result in an endless loop, even if the observation listener tries to read from or write to the repository. Another option is to never wait / sleep, only to log a warning.

> Deadlock triggered by ObservationDispatcher 
> --------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller resolved JCR-2746.
---------------------------------

    Resolution: Fixed

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Deadlock caused by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912864#action_12912864 ] 

Thomas Mueller commented on JCR-2746:
-------------------------------------

The bug in DefaultISMLocking was introduces with JCR-2089 (Use java.util.concurrent), revisions 995411 and 995412. I think the safe solution is to revert those to commits. Test case:

public void test() throws InterruptedException {
    final ISMLocking lock = new DefaultISMLocking();
    ReadLock r1 = lock.acquireReadLock(null);
    final InterruptedException[] ex = new InterruptedException[1];
    Thread thread = new Thread() {
        public void run() {
            try {
                lock.acquireWriteLock(null).release();
            } catch (InterruptedException e) {
                ex[0] = e;
            }
        }
    };
    thread.start();
    Thread.sleep(100);
    lock.acquireReadLock(null).release();
    r1.release();
    thread.join();
    if (ex[0] != null) {
        throw ex[0];
    }
}

This will result in a deadlock with the current DefaultISMLocking, but not with the old.


> Deadlock caused by ObservationDispatcher 
> -----------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2746) Deadlock triggered by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-2746:
--------------------------------

    Summary: Deadlock triggered by ObservationDispatcher   (was: Deadlock caused by ObservationDispatcher )

> Deadlock triggered by ObservationDispatcher 
> --------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2746) Deadlock triggered by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-2746:
--------------------------------

    Attachment: TestObservationEndlessLoop.java

Test case to get the endless loop.

> Deadlock triggered by ObservationDispatcher 
> --------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-2746:
--------------------------------

    Summary: Sleep in possibly endless loop in ObservationDispatcher  (was: Deadlock triggered by ObservationDispatcher )

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2746:
-------------------------------

    Attachment: JCR-2746.patch

Could we come up with a solution that doesn't require the extra bits of state and responsibility in SessionContext and the new public method in SessionImpl? Also, having the delay carried over to the next session operation seems a bit troublesome to me. It would be better to keep the extra delay within the scope of the currently executed operation.

How about the attached patch where I move the delay to the finally block of SessionState.perform() and use just an extra ObservationDispatcher.isEventQueueOverloaded() predicate to carry the required bit of state across the system?

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: JCR-2746.patch, TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (JCR-2746) Deadlock caused by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller reassigned JCR-2746:
-----------------------------------

    Assignee: Thomas Mueller

> Deadlock caused by ObservationDispatcher 
> -----------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2746:
-------------------------------

    Attachment: JCR-2746.patch

The updated version of the patch changes the key method to ObservationDispatcher.delayIfEventQueueOverloaded(), and makes sure that no delay is added when executing within the observation thread.

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: JCR-2746.patch, JCR-2746.patch, TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914527#action_12914527 ] 

Jukka Zitting commented on JCR-2746:
------------------------------------

I committed the proposed patch in revision 1000950.

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: JCR-2746.patch, JCR-2746.patch, TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Deadlock caused by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912473#action_12912473 ] 

Thomas Mueller commented on JCR-2746:
-------------------------------------

Thread.sleep can't be the root cause of a new deadlock. (Thread.sleep was added in JCR-2402, but no additional synchronization or locking was introduced). But Thread.sleep can *trigger* a deadlock that normally doesn't show up. So it it's really a deadlock, it could also occur even without JCR-2402 (but less likely).

JCR-2402 calls Thread.sleep in a loop. If I replace the loop with an "if" then I can reproduce some kind of deadlock. Currently it looks more like a bug in DefaultISMLocking however... I'm analyzing it.

> Deadlock caused by ObservationDispatcher 
> -----------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Deadlock caused by ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912477#action_12912477 ] 

Thomas Mueller commented on JCR-2746:
-------------------------------------

There seems to be a bug in DefaultISMLocking:

1) The main thread gets a read lock.

2) The ObservationManager thread tries to lock for writing, which is blocked because there is still a read lock.

3) Then the main thread tries to get a second read lock, which is blocked because there is a waiting write lock.

It looks like JCR-2402 is triggering this (so far hidden / undetected?) problem.

> Deadlock caused by ObservationDispatcher 
> -----------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2746) Sleep in possibly endless loop in ObservationDispatcher

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912970#action_12912970 ] 

Thomas Mueller commented on JCR-2746:
-------------------------------------

Sleeping no longer happens in a loop. Also, instead of sleeping in the observation dispatcher (while locks are held), the delay is now done before the next operation (before locks are acquired).

> Sleep in possibly endless loop in ObservationDispatcher
> -------------------------------------------------------
>
>                 Key: JCR-2746
>                 URL: https://issues.apache.org/jira/browse/JCR-2746
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core, observation
>    Affects Versions: 2.0.0, 2.1.0, 2.1.1
>            Reporter: Jukka Zitting
>            Assignee: Thomas Mueller
>             Fix For: 2.2.0
>
>         Attachments: TestObservationEndlessLoop.java
>
>
> The rate-limitation code we added in JCR-2402 to prevent the observation queue from growing too large was a good idea, but the current implementation is a bit troublesome since it blocks the thread while it still holds the journal lock, the SISM reader lock, and the SessionState lock. This can cause a deadlock under heavy workloads if any of the observation listeners attempts to reuse the session (not recommended/supported, but can still happen) or write to the repository (quite likely).
> To solve this problem we should move the rate-limiter code to outside the scope of any internal locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.