Posted to dev@jackrabbit.apache.org by "Ian Boston (JIRA)" <ji...@apache.org> on 2007/05/18 09:41:16 UTC

[jira] Created: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Under Heavy load in a Cluster HTTP Threads Block and stall requests
-------------------------------------------------------------------

                 Key: JCR-929
                 URL: https://issues.apache.org/jira/browse/JCR-929
             Project: Jackrabbit
          Issue Type: Bug
          Components: core
    Affects Versions: 1.3
         Environment: 2 Node Cluster, OSX, JDK 1.5 with DatabaseJournal, DatabasePersistanceManager, all content in DB, using WebDAV to load
            Reporter: Ian Boston


Under heavy load, created by mounting both nodes of the cluster in OSX Finder and then uploading large numbers of files (a few thousand) to each node at the same time, one of the nodes eventually stops responding, and the Finder mount times out and disconnects.

Once that happens, that node becomes unusable.
Further mount attempts prompt for a password, indicating that HTTP is still running, but time out once the connection is authenticated.
Access from a web browser prompts for a password, connects, and provides a once-only listing of any collection in the workspace. If you try to refresh that collection, the HTTP request hangs forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Dominique Pfister (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501159 ] 

Dominique Pfister commented on JCR-929:
---------------------------------------

Hi Ian,

thank you for looking at this problem and providing a patch! I'll take a look at it asap. I will possibly extend the o.a.j.c.cluster.LockEventChannel to provide the locking functionality you added in order not to expose the ClusterNode class directly to the LockManagerImpl.

Kind regards
Dominique


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499181 ] 

Ian Boston commented on JCR-929:
--------------------------------



I am seeing some collisions on the revision numbers, which cause the local cache to become out of date. But looking at the code paths, there is always a release operation.


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497390 ] 

Marcel Reutegger commented on JCR-929:
--------------------------------------

Can you please attach the full stacktrace of the involved threads?


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496808 ] 

Ian Boston commented on JCR-929:
--------------------------------

Restarting the affected cluster node appears to clear the problem, with no bad effects on the JCR.

No deadlocks are reported on any threads on the locked nodes, and other HTTP requests that do not use the JCR are operating correctly.

Connecting a profiler such as JProfiler to the JVM on startup indicates that there are no blocked threads, and no deadlocks are detected.

Adding an MBean monitor thread to analyse the threads within the JVM indicates that there are no blocked threads when the node stops responding.

It does not look like this is a deadlock condition.
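
For readers who want to repeat this kind of check, here is a minimal sketch of a thread monitor built on the standard java.lang.management API. The class name ThreadMonitorSketch and the report format are assumptions made for illustration; this is not the monitor used above. One caveat: findMonitorDeadlockedThreads only looks for cycles on Java object monitors, so threads parked in Object.wait() inside a library lock show up as WAITING rather than BLOCKED, and such a check may report nothing even when they cannot make progress.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Jackrabbit.
class ThreadMonitorSketch {

    // Builds one line per live thread: name, state, and the lock it waits on (if any).
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds(), Integer.MAX_VALUE);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            out.append(info.getThreadName()).append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
        }
        // Returns null when no cycle of threads blocked on object monitors exists.
        long[] deadlocked = mx.findMonitorDeadlockedThreads();
        out.append("monitor deadlocks: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling report() periodically from a background thread gives a rough picture of which threads are WAITING and on which lock object, which is the information needed to spot an interlock that the JVM itself does not flag.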


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499122 ] 

Ian Boston commented on JCR-929:
--------------------------------



I'm not certain about this, but I think the acquire in the sync from the ClusterNode should be an attempt, and if it fails, it should back out and wait for the next cycle.

I'll give it a go and see what happens
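
The "attempt, then back out" idea can be sketched roughly as follows. This is an illustration only: SyncAttemptSketch, trySync and demo are hypothetical names, and java.util.concurrent's ReentrantLock stands in for the lock classes used by Jackrabbit 1.3. It is not the patch attached to this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from Jackrabbit.
class SyncAttemptSketch {
    // Stand-in for the journal lock.
    private final ReentrantLock journalLock = new ReentrantLock();
    private int syncCount;

    // Attempts one sync cycle; returns false and skips the cycle if the lock is busy.
    boolean trySync(long timeoutMillis) {
        boolean locked;
        try {
            locked = journalLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!locked) {
            return false; // back out; the next cycle tries again
        }
        try {
            syncCount++; // placeholder for reading new journal records
            return true;
        } finally {
            journalLock.unlock();
        }
    }

    int getSyncCount() {
        return syncCount;
    }

    // Runs one attempt on a free lock and one while another thread holds the lock.
    static boolean[] demo() {
        SyncAttemptSketch sketch = new SyncAttemptSketch();
        boolean free = sketch.trySync(50);
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            sketch.journalLock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sketch.journalLock.unlock();
            }
        });
        writer.start();
        try {
            held.await();
            return new boolean[] {free, sketch.trySync(50)};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the writer thread finish
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("free lock: " + r[0] + ", busy lock: " + r[1]);
    }
}
```

The first attempt should succeed and the second should back out after the timeout. A timed attempt like this only suits work that can safely be skipped, such as a periodic background sync; an operation that must reach the journal cannot simply give up.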




[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499138 ] 

Ian Boston commented on JCR-929:
--------------------------------



The thread that is in the AbstractJournal acquired a lock in LockManagerImpl at about line 313 (1.3 tag in svn) and then went into AbstractJournal.lockAndSync via the cluster node.

All the other threads are waiting for this lock to be released before continuing, which is OK.

Since there are no other threads waiting, I can only think that something is not releasing the journal lock in AbstractJournal (thankfully these are singletons!).


[jira] Resolved: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Dominique Pfister (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominique Pfister resolved JCR-929.
-----------------------------------

    Resolution: Fixed

Thanks Ian, for pointing out that one! Tests now pass smoothly. 

Fixed in revision 546038.

> Under Heavy load in a Cluster HTTP Threads Block and stall requests
> -------------------------------------------------------------------
>
>                 Key: JCR-929
>                 URL: https://issues.apache.org/jira/browse/JCR-929
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.3
>         Environment: 2 Node Cluster, OSX, JDK 1.5 with DatabaseJournal, DatabasePersistanceManager, all content in DB, using WebDAV to load
>            Reporter: Ian Boston
>            Assignee: Dominique Pfister
>         Attachments: catalina.out.node1.txt, catalina.out.node2.txt, JCR-929.patch, patch.544769
>
>
> Under heavy load, created by mounting both nodes of the cluster in OSX Finder and then uploading large numbers of files (a few thousand) to each node at the same time, one of the nodes eventually stops responding, and the Finder mount times out and disconnects.
> Once that happens, that node becomes unusable.
> Further mount attempts prompt for a password, indicating that HTTP is still running, but time out once the connection is authenticated.
> Access from a web browser prompts for a password, connects, and provides a once-only listing of any collection in the workspace. If you try to refresh that collection, the HTTP request hangs forever.


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Dominique Pfister (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499711 ] 

Dominique Pfister commented on JCR-929:
---------------------------------------

Hi Ian,

thank you for your stack traces and all your work! In your second thread dump, I'd say that the first one (Thread http-8080-Processor25) still holds the AbstractJournal's RWLock (acquired in SharedItemStateManager$Update.begin), and therefore the VM's state is similar to the first thread dump you provided: one thread holds the AbstractJournal's RWLock and will start an item update (1), while other threads interact with the LockManager and therefore lock that one (2). When the item update (1) triggers a synchronization on the journal (because another instance made some changes), it might encounter a lock operation and will try to inform the LockManager about this event. Because of all the other threads in (2), this will cause the deadlock.

IMO, to solve this problem, LockManager operations will have to adopt the same pattern as SharedItemStateManager updates already do: lock-and-sync the journal when the operation starts, unlock at the end of it.
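
The ordering idea can be illustrated with a small sketch. It uses two plain java.util.concurrent locks as stand-ins, and every name in it (LockOrderSketch, itemUpdate, lockOperation, demo) is hypothetical. It shows only the principle that both kinds of operation take the journal lock first and the LockManager lock second; the change actually committed for this issue may look quite different.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; class and method names are illustrative, not Jackrabbit's.
class LockOrderSketch {
    private final ReentrantLock journalLock = new ReentrantLock();     // stand-in for the journal's RWLock
    private final ReentrantLock lockManagerLock = new ReentrantLock(); // stand-in for the LockManager's lock
    private int completed; // only touched while both locks are held

    // Item update: journal first, then the lock manager when a lock event is replayed.
    void itemUpdate() {
        journalLock.lock();
        try {
            notifyLockManager();
        } finally {
            journalLock.unlock();
        }
    }

    // Lock operation in the same order: journal at the start, released at the end.
    void lockOperation() {
        journalLock.lock();
        try {
            notifyLockManager();
        } finally {
            journalLock.unlock();
        }
    }

    // Always called with the journal lock already held, so the order never inverts.
    private void notifyLockManager() {
        lockManagerLock.lock();
        try {
            completed++;
        } finally {
            lockManagerLock.unlock();
        }
    }

    // Runs both operations concurrently and returns how many completed.
    static int demo(int rounds) {
        LockOrderSketch s = new LockOrderSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < rounds; i++) s.itemUpdate(); });
        Thread b = new Thread(() -> { for (int i = 0; i < rounds; i++) s.lockOperation(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.completed;
    }

    public static void main(String[] args) {
        System.out.println("completed operations: " + demo(1000));
    }
}
```

Because neither thread ever holds the second lock while waiting for the first, the cycle described above cannot form. The cost is that lock operations now serialize on the journal lock as well.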

Kind regards
Dominique


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500805 ] 

Ian Boston commented on JCR-929:
--------------------------------

I now have a patch for this, which I will upload on Saturday night; I am currently out of network range.

Ian

Sent from my Pearl, sorry about the briefness and spelling!  




[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502265 ] 

Ian Boston commented on JCR-929:
--------------------------------

Yes, your patch fixes the problem. It's also much cleaner, and my gut feeling is that it's quicker.

Thanks



[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503046 ] 

Ian Boston commented on JCR-929:
--------------------------------

There is one small failure in the tests when not in a cluster.
At about line 324:

    void internalUnlock(NodeImpl node)
            throws LockException, RepositoryException {

        // Guard: eventChannel appears to be null when not running in a cluster
        ClusterOperation operation = null;
        if (eventChannel != null) {
            operation = eventChannel.create(node.getNodeId());
        }
        boolean successful = false;

        acquire();





[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499128 ] 

Ian Boston commented on JCR-929:
--------------------------------

Doing that causes the ClusterNode thread to time out, but the 2 HTTP threads go into a waiting state, so they must be interlocking on waits somewhere.

I can't really do an attempt in the main threads, since if the operation doesn't get to the journal then it won't propagate, and a back-off would have to back off far enough to undo the interlock.


The pattern looks slightly different now, so I might have fixed the first problem. Sorry about all the long stack traces, but it's going to be confusing if not in context.

     Starting Thread Monitor ==================
Thread Transient File Reaper waiting by java.lang.ref.ReferenceQueue$Lock@8da92 ::WAITING at org.apache.jackrabbit.util.TransientFileFactory$ReaperThread.run(TransientFileFactory.java:148)
Thread Transient File Reaper waiting by java.lang.ref.ReferenceQueue$Lock@b410a ::WAITING at org.apache.jackrabbit.util.TransientFileFactory$ReaperThread.run(TransientFileFactory.java:148)
Thread TP-Processor3 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@64886c ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread TP-Processor2 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@388e0b ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread TP-Processor1 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@f59ac1 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor25 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@eb5563 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.nodeAdded(LockManagerImpl.java:840)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.onEvent(LockManagerImpl.java:745)
     at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:231)
     at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
     at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:424)
     at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:721)
     at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:855)
     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
     at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:313)
     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:302)
     at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:306)
     at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1214)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.addMember(DavResourceImpl.java:517)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doPut(AbstractWebdavServlet.java:504)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:241)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8080-Processor23 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@bd12a5 ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:233)
     at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)
     at org.apache.jackrabbit.core.cluster.ClusterNode$WorkspaceLockChannel.unlocked(ClusterNode.java:637)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.internalUnlock(LockManagerImpl.java:338)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.unlock(LockManagerImpl.java:428)
     at org.apache.jackrabbit.core.lock.XALockManager.unlock(XALockManager.java:103)
     at org.apache.jackrabbit.core.NodeImpl.unlock(NodeImpl.java:4133)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.unlock(DavResourceImpl.java:739)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doUnlock(AbstractWebdavServlet.java:710)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:262)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8080-Processor22 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@eb5563 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.getLockInfo(LockManagerImpl.java:356)
     at org.apache.jackrabbit.core.lock.XALockManager.isLocked(XALockManager.java:143)
     at org.apache.jackrabbit.core.NodeImpl.isLocked(NodeImpl.java:4161)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.getLock(DavResourceImpl.java:648)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.initProperties(DavResourceImpl.java:312)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.getProperties(DavResourceImpl.java:271)
     at org.apache.jackrabbit.webdav.MultiStatusResponse.<init>(MultiStatusResponse.java:180)
     at org.apache.jackrabbit.webdav.MultiStatus.addResourceProperties(MultiStatus.java:62)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doPropFind(AbstractWebdavServlet.java:435)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:232)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8080-Processor21 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@eb5563 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.getLockInfo(LockManagerImpl.java:356)
     at org.apache.jackrabbit.core.lock.XALockManager.isLocked(XALockManager.java:143)
     at org.apache.jackrabbit.core.NodeImpl.isLocked(NodeImpl.java:4161)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.getLock(DavResourceImpl.java:648)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.isLocked(DavResourceImpl.java:855)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.addMember(DavResourceImpl.java:501)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doPut(AbstractWebdavServlet.java:504)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:241)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8080-Processor20 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@42de17 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor19 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@2ea67 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor18 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@6b1a74 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor17 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@95dc5a ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor16 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@b5cc4d ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor15 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@77e923 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor14 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@6afba3 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor13 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@ae8c15 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor12 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@6eea56 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor11 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@daf576 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor10 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@293c5e ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor9 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3a774f ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor8 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@6efb3f ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor7 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@30e551 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor6 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@1f6576 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor5 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3e2817 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor4 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@ba1328 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor3 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3fbf92 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor2 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3a3c1f ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8080-Processor1 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@4127c0 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@44b625 ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@a7252d ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread ObservationManager waiting by org.apache.commons.collections.buffer.BlockingBuffer@79d52f ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@8088da ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@7000ea ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread ObservationManager waiting by org.apache.commons.collections.buffer.BlockingBuffer@4c15d9 ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread Finalizer waiting by java.lang.ref.ReferenceQueue$Lock@c31c7d ::WAITING
Thread Reference Handler waiting by java.lang.ref.Reference$Lock@9c7650 ::WAITING
     Done Thread Monitor ==================
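
A listing like the thread monitor output above can be produced with the JDK's ThreadMXBean. The following is a minimal sketch only: the class name WaitingThreadDump and the output format are assumptions made for illustration, it is not the monitor code that produced this dump, and it uses Java 8 syntax rather than the JDK 1.5 of the report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit: lists threads that are currently waiting.
final class WaitingThreadDump {

    /** Returns one formatted entry per thread in WAITING or TIMED_WAITING state. */
    static List<String> waitingThreads(int maxDepth) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // getThreadInfo returns null entries for threads that ended in the meantime.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), maxDepth)) {
            if (info == null) {
                continue;
            }
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append("Thread ").append(info.getThreadName())
              .append(" waiting by ").append(info.getLockName())
              .append(" ::").append(state);
            // Append up to maxDepth stack frames, indented like the dump above.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append(System.lineSeparator()).append("     at ").append(frame);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        waitingThreads(8).forEach(System.out::println);
    }
}
```

Threads that share the same lock name (for example the same ReentrantLock@... identity) are waiting on the same object, which is what makes the groups in the dump easy to spot.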






[jira] Updated: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Boston updated JCR-929:
---------------------------

    Attachment: JCR-929.patch

This is my patch, which appears to fix this problem.

It's not what I would call perfect, since it may be too wide, and the patch may not apply cleanly because of line shifts where I have other logging statements in the files.


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496810 ] 

Ian Boston commented on JCR-929:
--------------------------------


If the MBean monitor code looks at waiting threads, two groups of threads are found to go into a permanent wait state.

The ClusterNode thread (used by the cluster node to replay journal entries) goes into a permanent wait state on a ReentrantLock object within the LockManagerImpl:

Thread ClusterNode-localhost2 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@6c6ff1 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.nodeAdded(LockManagerImpl.java:838)

And the HTTP threads all go into a permanent wait state (when they access the JCR) in AbstractJournal.lockAndSync on a WriterLock:

Thread http-8580-Processor23 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@8065c9 ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474) 
     at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
     at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51) 


LockManagerImpl.acquire contains a for(;;) loop that will loop forever if the lock is not acquired. I am putting some debug output in the catch block to see whether the loop is spinning or the wait is forever.

I will also try to track down the objects being waited on to see if they give any clues as to what is effectively deadlocking.
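
The retry loop described above might look roughly like the following. This is a sketch and not the Jackrabbit source: it uses java.util.concurrent.locks.ReentrantLock as a stand-in for the EDU.oswego lock, and the class name RetryingAcquire and the interruptedRetries counter are hypothetical. The counter plays the role of the debug output mentioned above, to tell a spinning loop apart from a single long wait.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the loop described above; not the Jackrabbit source.
final class RetryingAcquire {
    private final ReentrantLock lock = new ReentrantLock();

    // Counts how often the wait was interrupted and then restarted.
    final AtomicInteger interruptedRetries = new AtomicInteger();

    /** Blocks until the lock is held; an interrupt only restarts the wait. */
    void acquire() {
        for (;;) {
            try {
                lock.lockInterruptibly();
                return;
            } catch (InterruptedException e) {
                // The interrupt is swallowed and the loop goes straight back to waiting,
                // so the only way out of this method is to actually get the lock.
                interruptedRetries.incrementAndGet();
            }
        }
    }

    void release() {
        lock.unlock();
    }
}
```

If the counter stays at zero while the thread remains blocked, the loop is not spinning and the thread is simply parked waiting for a lock that is never released.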



[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499768 ] 

Ian Boston commented on JCR-929:
--------------------------------

Dominique,

Thanks for the pointer. That makes sense to me and fits other problems I am seeing, where locks fail on nonexistent nodes during journal replay.


I will give it a go and get back to you. Unfortunately, I will not have full network access until this Saturday.

Thanks
Ian

Sent from my Pearl, sorry about the briefness and spelling!  




[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500059 ] 

Ian Boston commented on JCR-929:
--------------------------------

That appears to have fixed this problem. In the internal lock and unlock methods I have added an effective lockAndSync, if and only if the event channel is notified of the lock or unlock, since that notification would cause a lock and sync after the lock manager JVM lock had been acquired. So, as you predicted, the deadlocks can't happen.
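
One way to read the change described above is that both code paths now take the journal lock before the lock manager's lock. The sketch below illustrates that ordering only. It is an interpretation, not the attached patch: the class OrderedLocking, its method names, and the use of java.util.concurrent.locks.ReentrantLock for both locks are assumptions.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a single lock order; not the JCR-929 patch.
final class OrderedLocking {
    private final ReentrantLock journalLock = new ReentrantLock();
    private final ReentrantLock lockManagerLock = new ReentrantLock();
    private int operations; // guarded by lockManagerLock

    /** Request path, e.g. an unlock that must also be appended to the journal. */
    void localUnlock() {
        underBothLocks();
    }

    /** Replay path: journal records are consumed, then the lock manager is updated. */
    void replayRecord() {
        underBothLocks();
    }

    // Both paths use the same order: journal lock first, lock manager lock second.
    private void underBothLocks() {
        journalLock.lock();
        try {
            lockManagerLock.lock();
            try {
                operations++;
            } finally {
                lockManagerLock.unlock();
            }
        } finally {
            journalLock.unlock();
        }
    }

    int operations() {
        lockManagerLock.lock();
        try {
            return operations;
        } finally {
            lockManagerLock.unlock();
        }
    }
}
```

Because no thread ever holds the lock manager lock while waiting for the journal lock, the circular wait seen in the thread dumps cannot form between these two locks.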

I notice there may be some more places where this can happen: the NodeTypeRegistry and the NameSpaceRegistry.

I have some additional errors appearing when the path can't be built due to nonexistent child nodes, leading me to believe that something might be being dropped.

Unfortunately, I don't have net access and my BlackBerry won't hook up to OSX, so I can't send a patch till Saturday at the earliest. Sorry.

Ian

Sent from my Pearl, sorry about the briefness and spelling!  




[jira] Assigned: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Dominique Pfister (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominique Pfister reassigned JCR-929:
-------------------------------------

    Assignee: Dominique Pfister


[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496813 ] 

Ian Boston commented on JCR-929:
--------------------------------

The reverse pattern also appears:

The ClusterNode thread is waiting in AbstractJournal.sync

and the HTTP threads are waiting in LockManagerImpl.acquire.

-----

The previous case was the ClusterNode thread in LockManagerImpl.acquire

and the HTTP thread waiting in AbstractJournal.lockAndSync.

This indicates that both sets of threads interact with the locks in both places, raising the potential for an interlock.

Since the AbstractJournal is the newer code, perhaps it should perform a LockManagerImpl acquire earlier than it does?

-----

There is no indication that the spin lock inside LockManagerImpl.acquire ever comes out of the wait condition, except on an interrupt to the JVM, at which point it goes back into the acquire.

This prevents the JCR from shutting down, since the shutdown operation also needs to acquire a lock.

I will investigate the call tree to see if it's possible to change the locking order to prevent the interlock without hurting performance.
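
The interlock described in this comment can be reproduced in miniature with two locks taken in opposite order. The sketch below is illustrative only: the class InterlockDemo and its two ReentrantLock fields merely stand in for the journal lock and the lock manager lock, and tryLock with a timeout is used so that the demo terminates instead of hanging the way the real threads do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo; the locks only stand in for the journal and lock manager locks.
final class InterlockDemo {
    static final ReentrantLock journalLock = new ReentrantLock();
    static final ReentrantLock lockManagerLock = new ReentrantLock();

    /** Runs both paths once and returns how many failed to get their second lock. */
    static int run(long timeoutMillis) {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger failures = new AtomicInteger();
        // Replay path: journal lock, then lock manager lock.
        Thread replay = new Thread(() -> attempt(journalLock, lockManagerLock,
                bothHoldFirst, bothTried, failures, timeoutMillis), "replay");
        // Request path: lock manager lock, then journal lock.
        Thread request = new Thread(() -> attempt(lockManagerLock, journalLock,
                bothHoldFirst, bothTried, failures, timeoutMillis), "request");
        replay.start();
        request.start();
        try {
            replay.join();
            request.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger failures, long timeoutMillis) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other thread holds its first lock
            if (second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                failures.incrementAndGet(); // a plain lock() would wait here forever
            }
            bothTried.countDown();
            bothTried.await(); // keep the first lock held until both attempts are over
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that could not get their second lock: " + run(200));
    }
}
```

Each thread holds the lock the other one needs, so neither attempt can succeed. With untimed acquires, as in the traces above, both would wait indefinitely.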


[jira] Updated: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Boston updated JCR-929:
---------------------------

    Attachment: catalina.out.node2.txt
                catalina.out.node1.txt

Two log files are attached; node 1 has locked up.

It appears to have locked up after being unable to deliver a lock event, which looks like it was caused by a revision collision.




[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Xiaohua Lu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500761 ] 

Xiaohua Lu commented on JCR-929:
--------------------------------

I had a similar problem, but the stack trace is slightly different.
The setup is a 4-node cluster, and under heavy load (mainly updates) all the nodes hang. On the database side, three transaction updates are waiting for a select lock. The select lock seems to be blocked by one of the threads below:

thread 1 
Thread 25141: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.sync() @bci=9, line=160 (Compiled frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.sync() @bci=27, line=283 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.run() @bci=38, line=254 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


thread 2 
Thread 25137: (state = BLOCKED)
 - org.apache.commons.collections.map.AbstractHashedMap.get(java.lang.Object) @bci=62, line=182 (Compiled frame; information may be imprecise)
 - org.apache.jackrabbit.core.state.NodeState.getReorderedChildNodeEntries() @bci=57, line=671 (Compiled frame)
 - org.apache.jackrabbit.core.CachingHierarchyManager.nodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=1, line=385 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=29, line=132 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.nodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=29, line=874 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeState.notifyNodesReplaced() @bci=12, line=793 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeState.setChildNodeEntries(java.util.List) @bci=73, line=473 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeStateMerger.merge(org.apache.jackrabbit.core.state.NodeState, org.apache.jackrabbit.core.state.NodeStateMerger$MergeContext) @bci=291, line=139 (Compiled frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=58, line=802 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(org.apache.jackrabbit.core.state.ItemState) @bci=29, line=85 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=49, line=427 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(org.apache.jackrabbit.core.state.ItemState) @bci=29, line=85 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=5, line=390 (Interpreted frame)
 - org.apache.jackrabbit.core.state.ItemState.notifyStateUpdated() @bci=12, line=241 (Interpreted frame)
 - org.apache.jackrabbit.core.state.ChangeLog.persisted() @bci=30, line=271 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.doExternalUpdate(org.apache.jackrabbit.core.state.ChangeLog) @bci=264, line=945 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.externalUpdate(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollection) @bci=10, line=871 (Interpreted frame)
 - org.apache.jackrabbit.core.RepositoryImpl$WorkspaceInfo.externalUpdate(org.apache.jackrabbit.core.state.ChangeLog, java.util.List) @bci=25, line=1957 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.end() @bci=182, line=834 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.consume(org.apache.jackrabbit.core.journal.Record) @bci=469, line=929 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.doSync(long) @bci=108, line=191 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync() @bci=42, line=241 (Interpreted frame)
 - org.apache.jackrabbit.core.journal.DefaultRecordProducer.append() @bci=6, line=51 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode$WorkspaceUpdateChannel.updateCreated(org.apache.jackrabbit.core.cluster.Update) @bci=36, line=466 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager$Update.begin() @bci=44, line=530 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.beginUpdate(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollectionFactory, org.apache.jackrabbit.core.virtual.VirtualItemStateProvider) @bci=15, line=825 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollectionFactory) @bci=4, line=855 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog) @bci=9, line=326 (Interpreted frame)
 - org.apache.jackrabbit.core.state.XAItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog) @bci=20, line=313 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.update() @bci=22, line=302 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.update() @bci=4, line=306 (Interpreted frame)
 - org.apache.jackrabbit.core.ItemImpl.save() @bci=594, line=1214 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.markAssetComplete(javax.jcr.Node, boolean) @bci=137, line=185 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.handleAssetCompleteCheck(java.lang.String) @bci=241, line=169 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.onEvent(javax.jcr.observation.EventIterator) @bci=112, line=82 (Interpreted frame)
 - org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(org.apache.jackrabbit.core.observation.EventStateCollection) @bci=165, line=231 (Compiled frame)
 - org.apache.jackrabbit.core.observation.ObservationDispatcher.run() @bci=104, line=145 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


Since Thread 2 is blocked on a JVM lock, it is still holding the select lock taken in doSync.getRecords. That explains the deadlock at the database level.

I am not sure whether these two problems are exactly the same. If not, I can file a separate bug. Thanks.






[jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499116 ] 

Ian Boston commented on JCR-929:
--------------------------------


Here is a stack trace of a node that is stuck waiting. The HTTP threads are blocked, and every request to this node that hits an HTTP thread will block in the same wait pattern.



     Starting Thread Monitor ==================
Thread Transient File Reaper waiting by java.lang.ref.ReferenceQueue$Lock@4622f ::WAITING at org.apache.jackrabbit.util.TransientFileFactory$ReaperThread.run(TransientFileFactory.java:148)
Thread TP-Processor3 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@eb38d2 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread TP-Processor2 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@df787e ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread TP-Processor1 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@5fa15c ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor25 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@5036f6 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.externalLock(LockManagerImpl.java:973)
     at org.apache.jackrabbit.core.cluster.ClusterNode.process(ClusterNode.java:723)
     at org.apache.jackrabbit.core.cluster.ClusterNode.consume(ClusterNode.java:910)
     at org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:191)
     at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:241)
     at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)
     at org.apache.jackrabbit.core.cluster.ClusterNode$WorkspaceUpdateChannel.updateCreated(ClusterNode.java:466)
     at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.begin(SharedItemStateManager.java:530)
     at org.apache.jackrabbit.core.state.SharedItemStateManager.beginUpdate(SharedItemStateManager.java:825)
     at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:855)
     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
     at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:313)
     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:302)
     at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:306)
     at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1214)
     at org.apache.jackrabbit.core.NodeImpl.lock(NodeImpl.java:4070)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.lock(DavResourceImpl.java:685)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doLock(AbstractWebdavServlet.java:689)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:259)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8580-Processor24 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@488a98 ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
     at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
     at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)
     at org.apache.jackrabbit.core.cluster.ClusterNode$WorkspaceLockChannel.unlocked(ClusterNode.java:637)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.internalUnlock(LockManagerImpl.java:338)
     at org.apache.jackrabbit.core.lock.LockManagerImpl.unlock(LockManagerImpl.java:428)
     at org.apache.jackrabbit.core.lock.XALockManager.unlock(XALockManager.java:103)
     at org.apache.jackrabbit.core.NodeImpl.unlock(NodeImpl.java:4133)
     at org.apache.jackrabbit.webdav.simple.DavResourceImpl.unlock(DavResourceImpl.java:739)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.doUnlock(AbstractWebdavServlet.java:710)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:262)
     at org.apache.jackrabbit.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:193)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
     at java.lang.Thread.run(Thread.java:613)
Thread http-8580-Processor22 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@40eb2a ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor21 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@f45d1 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor20 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@b0cea2 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor19 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@ce1b2b ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor18 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@59c4c6 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor17 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@85ded7 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor16 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@e90ea7 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor15 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@c11ce0 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor14 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@4ecfa4 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor13 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@225841 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor12 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@d034cf ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor11 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3b1da2 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor10 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@16b458 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor9 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@9a3769 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor8 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@abd36b ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor7 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@c54d06 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor6 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@43f502 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor5 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@8a7951 ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor4 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@22da8f ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor3 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@3af3cb ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor2 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@5ba3ee ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread http-8580-Processor1 waiting by org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@67770f ::WAITING at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
Thread ClusterNode-node2 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$ReaderLock@aaab5d ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$ReaderLock.acquire(null:-1)
     at java.lang.Object.wait(Object.java:-2)
     at java.lang.Object.wait(Object.java:474)
     at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$ReaderLock.acquire(null:-1)
     at org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:160)
     at org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:283)
     at org.apache.jackrabbit.core.cluster.ClusterNode.run(ClusterNode.java:254)
     at java.lang.Thread.run(Thread.java:613)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@7dfc02 ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread IndexMerger waiting by org.apache.commons.collections.buffer.BlockingBuffer@9d85b ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread ObservationManager waiting by org.apache.commons.collections.buffer.BlockingBuffer@5b60bf ::WAITING at org.apache.commons.collections.buffer.BlockingBuffer.remove(BlockingBuffer.java:107)
Thread Finalizer waiting by java.lang.ref.ReferenceQueue$Lock@99d183 ::WAITING
Thread Reference Handler waiting by java.lang.ref.Reference$Lock@1de95a ::WAITING
     Done Thread Monitor ==================
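
Reading the dump above, http-8580-Processor25 appears to be inside AbstractJournal.doSync (so presumably holding the journal's write lock) while waiting for the LockManagerImpl lock, and http-8580-Processor24 appears to be inside LockManagerImpl.internalUnlock while waiting for the journal's writer lock. That looks like a lock-order inversion. Below is a minimal Java sketch of that pattern, not Jackrabbit code: the class, method, and lock names are hypothetical stand-ins, it uses java.util.concurrent instead of the EDU.oswego classes in the trace, and a tryLock with a timeout replaces the blocking acquire so the demo reports the stall instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo; none of these names come from Jackrabbit.
final class LockOrderDemo {

    // Takes `first`, waits at the rendezvous, then tries `second` with a timeout.
    // Returns true only if both locks were obtained.
    static boolean takeBoth(ReentrantLock first, ReentrantLock second,
                            CountDownLatch rendezvous) throws InterruptedException {
        first.lock();
        try {
            rendezvous.countDown();
            rendezvous.await();
            if (!second.tryLock(200, TimeUnit.MILLISECONDS)) {
                return false; // a blocking acquire would wait here forever
            }
            second.unlock();
            return true;
        } finally {
            first.unlock();
        }
    }

    // Runs two threads; returns true if both obtained both locks.
    static boolean bothSucceed(boolean consistentOrder) {
        ReentrantLock journal = new ReentrantLock();
        ReentrantLock lockManager = new ReentrantLock();
        // Inverted case: both threads must hold their first lock before either
        // asks for its second. Ordered case: a single party opens the latch.
        CountDownLatch rendezvous = new CountDownLatch(consistentOrder ? 1 : 2);
        // Like Processor25: journal first, then the lock manager's lock.
        Callable<Boolean> saver = () -> takeBoth(journal, lockManager, rendezvous);
        // Like Processor24: lock manager first, unless the order is made consistent.
        Callable<Boolean> unlocker = consistentOrder
                ? () -> takeBoth(journal, lockManager, rendezvous)
                : () -> takeBoth(lockManager, journal, rendezvous);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> a = pool.submit(saver);
            Future<Boolean> b = pool.submit(unlocker);
            boolean ra = a.get();
            boolean rb = b.get();
            return ra && rb;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted order completes:   " + bothSucceed(false));
        System.out.println("consistent order completes: " + bothSucceed(true));
    }
}
```

In the inverted case at least one thread times out, because each thread releases its first lock only after giving up on its second. When both threads take the journal lock first, both complete.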


[jira] Updated: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Dominique Pfister (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominique Pfister updated JCR-929:
----------------------------------

    Attachment: patch.544769

Hi Ian,

I implemented your changes, introducing a new interface ClusterOperation that hides the details of locking the journal before a cluster operation starts and unlocking it after the operation ends, which will be useful for nodetype/namespace operations as well. I've attached a patch file that can be applied directly to the Jackrabbit 1.3 sources (patch -p0 < patch.544769), and I'd be grateful if you could test whether it resolves this issue.

Kind regards
Dominique
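
The idea described above, taking the journal lock before a cluster operation starts and releasing it after the operation ends, can be sketched roughly as follows. This is an illustration only and not the contents of patch.544769: JournalGuard, run, and the trace list are hypothetical names, and the real ClusterOperation interface may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; the real ClusterOperation in patch.544769 may differ.
final class JournalGuard {
    private final ReentrantLock journalLock = new ReentrantLock();
    final List<String> trace = new ArrayList<>();

    // Locks the journal, runs the operation, and always unlocks afterwards,
    // even when the operation throws.
    void run(String name, Runnable operation) {
        journalLock.lock();
        trace.add("journal locked");
        try {
            operation.run();
            trace.add(name + " finished");
        } finally {
            journalLock.unlock();
            trace.add("journal unlocked");
        }
    }

    // True while the calling thread is inside run().
    boolean journalHeldByCaller() {
        return journalLock.isHeldByCurrentThread();
    }

    public static void main(String[] args) {
        JournalGuard guard = new JournalGuard();
        guard.run("lock", () -> guard.trace.add("lock manager updated"));
        System.out.println(guard.trace);
    }
}
```

Because every caller goes through run(), the journal lock is always the first lock taken, so any lock acquired inside the operation comes second on every code path.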


[jira] Updated: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-929:
------------------------------

    Fix Version/s: 1.3.1

Merged to the 1.3 branch in revision 558172.

> Under Heavy load in a Cluster HTTP Threads Block and stall requests
> -------------------------------------------------------------------
>
>                 Key: JCR-929
>                 URL: https://issues.apache.org/jira/browse/JCR-929
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.3
>         Environment: 2 Node Cluster, OSX, JDK 1.5 with DatabaseJournal, DatabasePersistanceManager, all content in DB, using WebDAV to load
>            Reporter: Ian Boston
>            Assignee: Dominique Pfister
>             Fix For: 1.3.1
>
>         Attachments: catalina.out.node1.txt, catalina.out.node2.txt, JCR-929.patch, patch.544769
>
>
> Under heavy load, created by mounting both nodes of the cluster in the OSX Finder and then uploading large numbers of files (a few thousand) to each node at the same time, one of the nodes eventually stops responding, and the Finder mount times out and disconnects.
> Once that happens, that node becomes unusable.
> Further mount attempts will prompt for a password, indicating that HTTP is still running, but will time out once the connection is authenticated.
> Access from a web browser will prompt for a password, connect, and provide a one-time listing of any collection in the workspace. If you try to refresh that collection, the HTTP request hangs forever.
