You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2012/05/05 12:15:26 UTC

Release troubles and failing tests

Hi guys,

I'm trying to get ADS 2.0.0-M7 release done today, but I'm having hard 
time... Appart a bug in maven release plugin (mvn release:rollback does 
not remove the tags, which leads to some very tricky issues with the rat 
plugin...), we stll have some random failures in server-integ and 
client-api-tests.

Those failures are well known : timeout in the LRUCache, most certainly 
due to some cursors not being closed.

Despite the 3 days I spent 2 weeks ago trying to get all the cursors 
closed, we may have a few that are still opened. We definitively need to 
work something that detects the cursor openng/closing, and inform us 
when we have unpaired opened cursors (see the discussion we have had 2 
weeks ago).
Or we may have a bug in the way we manage the cache, but my bet is on 
cursor not being carefully closed.

Anyway. The server is working, and the problem occurs because we run 
concurrent tests with a lot of borderline usecases.

I wish I will be able to get the release done today...

thanks !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
> Hi,
>
> On Sun, May 6, 2012 at 7:26 AM, Emmanuel Lécharny<el...@gmail.com>  wrote:
>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>
>>> Ok, I think I have found the cursor that was not closed.
>> Still having the same failure :/ All the cursors are correctly closed. If I
>> disable the logs (that I use to check the pairing between open and close), I
>> get a failure, and if I activate the log, then I get no failure.
>>
>> I will try to run all the server-integ tests with NOT_THEADSAFE to see if it
>> works better.
> I was trying to build the trunk and see if I can chase the issue we
> see with the caches. However  I am getting the following error. Note
> that I tried cleaning the repository.  Can you let me know if there is
> workaround?
Sure there is  : change line 47 in apacheds/pom.xml
<org.apache.directory.shared.version>1.0.0-M12</org.apache.directory.shared.version>

to
<org.apache.directory.shared.version>1.0.0-M12-SNAPSHOT</org.apache.directory.shared.version>

As I ws trying to cut a ApacheDS release, I made it to point on the 
shared that is currently being voted, 1.0.0-M12. Reverting it to 
1.0.0-M12-SNAPSHOT will let you refer to your local shared version 
(shared-1.0.0-M12 has not yet been propagated to the maven repository).

FYI, I have added some logs that dump all the cursor openings and 
closings, and I wrote a small matcher that check that all the cursors 
are correctly close. When the tests finish correctly, I have no remainig 
cursor in open state.

Here is the code that does the match :

import static org.junit.Assert.fail;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

/**
  * Test parsing of a request
  */
public class CursorMatch extends AbstractTest
{
     protected Map<String, String> creatings = new HashMap<String, 
String>();
     protected Map<String, String> closings = new HashMap<String, String>();


     /**
      * Test parsing of a request
      */
     @Test
     public void testCursorMatch()
     {
         try
         {
             BufferedReader reader = new  BufferedReader(
                 new FileReader( CursorMatch.class.getResource( 
"cursor.txt" ).getFile() ) );

             readLines( reader );
         }
         catch ( Exception e )
         {
             System.out.println( e.getMessage() );
             fail( e.getMessage() );
         }
     }


     protected void readLines( Reader reader ) throws Exception
     {
         String line;
         int nbLine = 1;

         while ( ( line = getLine( reader ) ) != null )
         {
             String[] values = line.split( "@" );

             if ( values.length != 2 )
             {
                 throw new Exception( "Bad line " + nbLine );
             }

             if ( values[0].startsWith( "Creating" ) )
             {
                 creatings.put( values[1], values[0] );
             }
             else
             {
                 closings.put( values[1], values[0] );
             }

             nbLine ++;
         }

         for ( String key : creatings.keySet() )
         {
             if ( closings.containsKey( key ) )
             {
                 closings.remove( key );
             }
             else
             {
                 System.out.println( "Key " + key + " not found (" + 
creatings.get( key ) );
             }
         }
    }


     protected String getLine( Reader reader ) throws IOException
     {
         return ( ( BufferedReader ) reader ).readLine();
     }
}

What is strange is that server-integ is failing frequenty now that the 
server is must faster. If I generate some logs on disk, or on sysout, 
then the tests are likely to succeed.

I just ran the server-integ tests without log, and it get blocked. Here 
is a thread dump :
(I ran mvn clean install -Dintegration -rf apacheds/server-integ, so 
that you don't have to wait until everything is built to get 
server-integ being executed. Of course, you need first to build 
everything, with mvn clean install -DskipTests)

-------------------------------------------------------
  T E S T S
-------------------------------------------------------
Running org.apache.directory.server.replication.ClientServerReplicationIT
Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 29.203 sec
Running org.apache.directory.server.suites.StockServerISuite
2012-05-08 08:44:34
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.6-b01-415 mixed 
mode):
"NioSocketAcceptor-123" prio=5 tid=7fd20cf2a000 nid=0x1132f0000 runnable 
[1132ef000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
     at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
     at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked <7bb08d898> (a sun.nio.ch.Util$2)
     - locked <7bb08d888> (a java.util.Collections$UnmodifiableSet)
     - locked <7bb08d680> (a sun.nio.ch.KQueueSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
     at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.select(NioSocketAcceptor.java:238)
     at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:413)
     at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:680)
"pool-62-thread-1" prio=5 tid=7fd20baf1000 nid=0x11503e000 waiting on 
condition [11503d000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c14a91e0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:315)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"ExpiringMapExpirer-1" daemon prio=5 tid=7fd20ce02800 nid=0x115290000 
waiting on condition [11528f000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at org.apache.mina.util.ExpiringMap$Expirer.run(ExpiringMap.java:301)
     at java.lang.Thread.run(Thread.java:680)
"NioSocketAcceptor-25" prio=5 tid=7fd20b2fa000 nid=0x114b2f000 runnable 
[114b2e000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
     at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
     at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked <7c1646420> (a sun.nio.ch.Util$2)
     - locked <7c1646438> (a java.util.Collections$UnmodifiableSet)
     - locked <7c16533d8> (a sun.nio.ch.KQueueSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
     at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.select(NioSocketAcceptor.java:238)
     at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:413)
     at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:680)
"NioSocketAcceptor-24" prio=5 tid=7fd20b699000 nid=0x114a2c000 runnable 
[114a2b000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
     at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
     at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked <7c140c630> (a sun.nio.ch.Util$2)
     - locked <7c140c648> (a java.util.Collections$UnmodifiableSet)
     - locked <7c1573178> (a sun.nio.ch.KQueueSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
     at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.select(NioSocketAcceptor.java:238)
     at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:413)
     at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:680)
"SynchWorkerThread" prio=5 tid=7fd20b691000 nid=0x114929000 waiting on 
condition [114928000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c14feec0> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
     at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
     at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
     at 
jdbm.recman.SnapshotRecordManager.commit(SnapshotRecordManager.java:576)
     at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmIndex.sync(JdbmIndex.java:676)
     - locked <7c1361758> (a 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex)
     at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmPartition.sync(JdbmPartition.java:242)
     - locked <7c0621f00> (a 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmPartition)
     at 
org.apache.directory.server.core.shared.partition.DefaultPartitionNexus.sync(DefaultPartitionNexus.java:418)
     at 
org.apache.directory.server.core.DefaultDirectoryService$SynchWorker.run(DefaultDirectoryService.java:333)
     at java.lang.Thread.run(Thread.java:680)
"kdcReplayCache.data" daemon prio=5 tid=7fd20bb8b000 nid=0x114826000 
waiting on condition [114825000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c057d9b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"changePwdReplayCache.data" daemon prio=5 tid=7fd20bb8a000 
nid=0x113941000 waiting on condition [113940000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c05db2b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"groupCache.data" daemon prio=5 tid=7fd20bb89800 nid=0x1131ed000 waiting 
on condition [1131ec000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c057da18> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"net.sf.ehcache.CacheManager@58c9e870" daemon prio=5 tid=7fd20b884800 
nid=0x112c7f000 in Object.wait() [112c7e000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7c057da78> (a java.util.TaskQueue)
     at java.lang.Object.wait(Object.java:485)
     at java.util.TimerThread.mainLoop(Timer.java:483)
     - locked <7c057da78> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)
"Thread-18" daemon prio=5 tid=7fd20bef2000 nid=0x114723000 waiting on 
condition [114722000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c0361258> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-17" daemon prio=5 tid=7fd20b0fc000 nid=0x114620000 waiting on 
condition [11461f000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c035f380> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-16" daemon prio=5 tid=7fd20b694800 nid=0x11451d000 waiting on 
condition [11451c000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c03612b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-15" daemon prio=5 tid=7fd20bef2800 nid=0x11441a000 waiting on 
condition [114419000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c035f350> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-14" daemon prio=5 tid=7fd20b0fa000 nid=0x114317000 waiting on 
condition [114316000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c0361360> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-13" daemon prio=5 tid=7fd20b0f5000 nid=0x114214000 waiting on 
condition [114213000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c035f2d8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-12" daemon prio=5 tid=7fd20b214800 nid=0x114111000 waiting on 
condition [114110000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c03627b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-11" daemon prio=5 tid=7fd20b216000 nid=0x11400e000 waiting on 
condition [11400d000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c0370c40> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-10" daemon prio=5 tid=7fd20b814800 nid=0x113a44000 waiting on 
condition [113a43000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c03075b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"Thread-9" daemon prio=5 tid=7fd20b217800 nid=0x11383e000 waiting on 
condition [11383d000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c0302f20> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"pool-1-thread-1" prio=5 tid=7fd20b128000 nid=0x113f0b000 waiting on 
condition [113f0a000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bebee5a0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:315)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"pool-11-thread-1" prio=5 tid=7fd20bc10800 nid=0x113b47000 waiting on 
condition [113b44000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at jdbm.helper.LRUCache.put(LRUCache.java:330)
     at 
jdbm.recman.SnapshotRecordManager.update(SnapshotRecordManager.java:401)
     at jdbm.btree.BPage.remove(BPage.java:605)
     at jdbm.btree.BPage.remove(BPage.java:611)
     at jdbm.btree.BTree.remove(BTree.java:464)
     at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable.remove(JdbmTable.java:741)
     - locked <7c14e5da8> (a 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable)
     at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:156)
     at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:49)
     at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.updateRdnIdx(AbstractBTreePartition.java:807)
     at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.add(AbstractBTreePartition.java:644)
     at 
org.apache.directory.server.core.shared.partition.DefaultPartitionNexus.add(DefaultPartitionNexus.java:450)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor$1.add(BaseInterceptor.java:150)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.journal.JournalInterceptor.add(JournalInterceptor.java:139)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.changelog.ChangeLogInterceptor.add(ChangeLogInterceptor.java:113)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.trigger.TriggerInterceptor.add(TriggerInterceptor.java:300)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.event.EventInterceptor.add(EventInterceptor.java:172)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.subtree.SubentryInterceptor.add(SubentryInterceptor.java:992)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.collective.CollectiveAttributeInterceptor.add(CollectiveAttributeInterceptor.java:125)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.operational.OperationalAttributeInterceptor.add(OperationalAttributeInterceptor.java:234)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.schema.SchemaInterceptor.add(SchemaInterceptor.java:1286)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.exception.ExceptionInterceptor.add(ExceptionInterceptor.java:194)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.admin.AdministrativePointInterceptor.add(AdministrativePointInterceptor.java:1203)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.authz.AciAuthorizationInterceptor.add(AciAuthorizationInterceptor.java:527)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.referral.ReferralInterceptor.add(ReferralInterceptor.java:249)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.authn.AuthenticationInterceptor.add(AuthenticationInterceptor.java:317)
     at 
org.apache.directory.server.core.api.interceptor.BaseInterceptor.next(BaseInterceptor.java:410)
     at 
org.apache.directory.server.core.normalization.NormalizationInterceptor.add(NormalizationInterceptor.java:125)
     at 
org.apache.directory.server.core.DefaultOperationManager.add(DefaultOperationManager.java:324)
     at 
org.apache.directory.server.core.shared.DefaultCoreSession.add(DefaultCoreSession.java:163)
     at 
org.apache.directory.server.core.shared.DefaultCoreSession.add(DefaultCoreSession.java:140)
     at 
org.apache.directory.server.operations.search.ReferralSearchIT.setupReferrals(ReferralSearchIT.java:155)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
     at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
     at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
     at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
     at 
org.apache.directory.junit.tools.MultiThreadedMultiInvoker$1$1.run(MultiThreadedMultiInvoker.java:160)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:680)
"Thread-7" daemon prio=5 tid=7fd20bb8d800 nid=0x11373b000 waiting on 
condition [11373a000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7c00237e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.directory.ldap.client.api.future.ResponseFuture.get(ResponseFuture.java:108)
     at 
org.apache.directory.ldap.client.api.future.SearchFuture.get(SearchFuture.java:64)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.doSyncSearch(ReplicationConsumerImpl.java:630)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.startSync(ReplicationConsumerImpl.java:505)
     at 
org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl.start(ReplicationConsumerImpl.java:548)
     at 
org.apache.directory.server.ldap.LdapServer$2.run(LdapServer.java:660)
     at java.lang.Thread.run(Thread.java:680)
"SynchWorkerThread" prio=5 tid=7fd20ba77000 nid=0x112870000 in 
Object.wait() [11286f000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bedf1958> (a java.lang.Object)
     at 
org.apache.directory.server.core.DefaultDirectoryService$SynchWorker.run(DefaultDirectoryService.java:323)
     - locked <7bedf1958> (a java.lang.Object)
     at java.lang.Thread.run(Thread.java:680)
"changePwdReplayCache.data" daemon prio=5 tid=7fd20b072000 
nid=0x11348f000 waiting on condition [11348e000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bf002370> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"groupCache.data" daemon prio=5 tid=7fd20b071800 nid=0x11304d000 waiting 
on condition [11304c000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bf0023d0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"kdcReplayCache.data" daemon prio=5 tid=7fd20b0e4000 nid=0x112f4a000 
waiting on condition [112f49000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bf002430> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"net.sf.ehcache.CacheManager@3979c7cf" daemon prio=5 tid=7fd20b0e1800 
nid=0x11145d000 in Object.wait() [11145c000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bf0035b8> (a java.util.TaskQueue)
     at java.lang.Object.wait(Object.java:485)
     at java.util.TimerThread.mainLoop(Timer.java:483)
     - locked <7bf0035b8> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)
"Thread-4" daemon prio=5 tid=7fd209cd1000 nid=0x112e47000 waiting on 
condition [112e46000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at 
org.apache.directory.server.ldap.replication.provider.SyncReplRequestHandler$1.run(SyncReplRequestHandler.java:972)
     at java.lang.Thread.run(Thread.java:680)
"SynchWorkerThread" prio=5 tid=7fd20ac1c000 nid=0x1129f3000 in 
Object.wait() [1129f2000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bd8cf210> (a java.lang.Object)
     at 
org.apache.directory.server.core.DefaultDirectoryService$SynchWorker.run(DefaultDirectoryService.java:323)
     - locked <7bd8cf210> (a java.lang.Object)
     at java.lang.Thread.run(Thread.java:680)
"groupCache.data" daemon prio=5 tid=7fd20aab6800 nid=0x112b7c000 waiting 
on condition [112b7b000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bda4dcd8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"changePwdReplayCache.data" daemon prio=5 tid=7fd209ac2000 
nid=0x112316000 waiting on condition [112315000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bda2ed08> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"kdcReplayCache.data" daemon prio=5 tid=7fd20aab5800 nid=0x1120e1000 
waiting on condition [1120e0000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <7bd823570> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:680)
"net.sf.ehcache.CacheManager@5f7779e3" daemon prio=5 tid=7fd20ae3b000 
nid=0x1125e5000 in Object.wait() [1125e4000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bda2ed68> (a java.util.TaskQueue)
     at java.lang.Object.wait(Object.java:485)
     at java.util.TimerThread.mainLoop(Timer.java:483)
     - locked <7bda2ed68> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)
"Poller SunPKCS11-Darwin" daemon prio=1 tid=7fd20ae3a000 nid=0x1124e2000 
waiting on condition [1124e1000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:692)
     at java.lang.Thread.run(Thread.java:680)
"Low Memory Detector" daemon prio=5 tid=7fd20a8be000 nid=0x11196f000 
runnable [00000000]
    java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=9 tid=7fd20a8bd800 nid=0x11186c000 
waiting on condition [00000000]
    java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=9 tid=7fd20a8bc800 nid=0x111769000 
waiting on condition [00000000]
    java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=9 tid=7fd20a8bc000 nid=0x111666000 
waiting on condition [00000000]
    java.lang.Thread.State: RUNNABLE
"Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=7fd209803000 
nid=0x111563000 waiting on condition [00000000]
    java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=8 tid=7fd20a8a6000 nid=0x11129f000 in 
Object.wait() [11129e000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bd7b0860> (a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
     - locked <7bd7b0860> (a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=7fd20a8a5800 nid=0x11119c000 in 
Object.wait() [11119b000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on <7bd7b0878> (a java.lang.ref.Reference$Lock)
     at java.lang.Object.wait(Object.java:485)
     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
     - locked <7bd7b0878> (a java.lang.ref.Reference$Lock)
"main" prio=5 tid=7fd20a800800 nid=0x1096ba000 waiting on condition 
[1096b8000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at 
org.apache.directory.junit.tools.MultiThreadedMultiInvoker$1.evaluate(MultiThreadedMultiInvoker.java:189)
     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
     at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
     at 
org.apache.directory.server.core.integ.FrameworkRunner.runChild(FrameworkRunner.java:488)
     at 
org.apache.directory.server.core.integ.FrameworkRunner.runChild(FrameworkRunner.java:58)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
     at 
org.apache.directory.server.core.integ.FrameworkRunner.run(FrameworkRunner.java:256)
     at org.junit.runners.Suite.runChild(Suite.java:128)
     at 
org.apache.directory.server.core.integ.FrameworkSuite.runChild(FrameworkSuite.java:212)
     at 
org.apache.directory.server.core.integ.FrameworkSuite.runChild(FrameworkSuite.java:49)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
     at 
org.apache.directory.server.core.integ.FrameworkSuite.run(FrameworkSuite.java:178)
     at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
     at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
     at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
     at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
     at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
     at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
     at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
"VM Thread" prio=9 tid=7fd20a8a0800 nid=0x111099000 runnable
"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7fd20a802000 
nid=0x10ca0d000 runnable
"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7fd20a802800 
nid=0x10cb10000 runnable
"Concurrent Mark-Sweep GC Thread" prio=9 tid=7fd20a84d000 
nid=0x110d39000 runnable
"VM Periodic Task Thread" prio=10 tid=7fd20a8c7800 nid=0x111a72000 
waiting on condition
"Exception Catcher Thread" prio=10 tid=7fd20a801800 nid=0x109834000 
runnable
JNI global references: 1133
Heap
  par new generation   total 19136K, used 16027K [7bae00000, 7bc2c0000, 
7bd790000)
   eden space 17024K,  85% used [7bae00000, 7bbc46c70, 7bbea0000)
   from space 2112K,  66% used [7bbea0000, 7bc000208, 7bc0b0000)
   to   space 2112K,   0% used [7bc0b0000, 7bc0b0000, 7bc2c0000)
  concurrent mark-sweep generation total 178176K, used 130249K 
[7bd790000, 7c8590000, 7fae00000)
  concurrent-mark-sweep perm gen total 52740K, used 32381K [7fae00000, 
7fe181000, 800000000)


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Btw, I have @Ignored the PagedSearch test that was blocking the server.

I'll try to get it fixed later.

Just FYI.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Hi guys,

as I'm up early, I was thinking about the search issue again.

AFAICT, the only possible option we have here is to grab everything from 
the server when doing a search, before sending back anything to the client.
FTR, a while back (2008 ?), we were doing that but it was an accident : 
as we were not waiting for the client to be ready, we were injecting 
into MINA all the read entries. It ended with frequent OOM when the 
search request was big.

We changed that so the results are only returned when the client is 
ready to read more data.

But back to my initial thoughts : if we were relying on a MVCC capable 
backend (MVCC BTree), then we won't have to gather all the entries, we 
can rely on the backend to return the entries one by one, assuming the 
backend keeps each valid version on disk.

Of course, teh problem is that we have many backends to deal with, so 
the current implementation still holds.

In the ong run, I wonder if using in-memory MVCC Btree for Ldif 
partitions and rely on a MVCC JDBM ould not be a viable solution...

thoughts ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Interesting thread...

I must say that on the first for points, you are just spot on.

We discussed that with Alex this morning, and we may have overlooked the 
impacts, thus the proposed "solution".

Le 5/10/12 7:18 PM, Selcuk AYA a écrit :

>
>
> As for the paged search, one way to deal with it would be to read all
> the data from the cursors at the beginning of the paged search and
> close the cursor.
That was an option we also discussed, but we found it extremely costly. 
Sadly, i'm afraid that this is the only viable solution...

> This would be similar to a normal search.
Well, normal searches don't do that. They fetch the entries one by one, 
using a cursor over an index that may not be valid when we fetch teh 
entries.

That means we also have to make a normal search fetch all the entries...

> If we get
> worried about memory consumption of this, the entries to be returned
> could be spilled over to temp files.
That's certainly something we can do. We can even store the data into a 
JDBM HTree, temporarily.

> You might say this might lead to
> temp file that are never claimed but if there are not many of them
> then no big deal.
No, that's something we can handle easily.
> Users are supposed to deal with cleaning up their
> contexts.
Hmm, well, you know, users are... users :)
Anyway, as of today, I have cut the 2.0.0-M7 release (yeah!), will start 
a vote on it, and we will have plenty of time to discuss the best 
possible solution for this critical search issue. We can even postpone 
this to 2.0.0-M8, because I think that it's way more critical atm to 
merge your branch into trunk.

All in all, we are aware that there is a problem, even if the proposed 
solutions are not sane, atm.

That's just plain good...

Many thanks !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Alex Karasulu <ak...@apache.org>.
I am in agreement with Selcuk's analysis. I did not presume just how nasty
the inconsistency handling would get.

On Thu, May 10, 2012 at 8:18 PM, Selcuk AYA <ay...@gmail.com> wrote:

> On Thu, May 10, 2012 at 5:51 AM, Emmanuel Lécharny <el...@gmail.com>
> wrote:
> > Le 5/10/12 9:58 AM, Emmanuel Lécharny a écrit :
> >
> >> Le 5/10/12 7:57 AM, Selcuk AYA a écrit :
> >>>
> >>> The problem seems to be caused by the test
> >>> testPagedSearchWrongCookie(). This tests failure in pages search by
> >>> sending a bad cookie. After failing, it relies on ctx.close() to
> >>> cleanup the session. Cleanup of the session will close all the cursors
> >>> related to paged searches through the session.
> >>>
> >>> It seems that somehow ctx.close does not result in an unbind message
> >>> at the server side time to time. I do not know what causes this but
> >>> this leaves a cursor open(specifically a NoDups cursor on rdn index).
> >>> Eventually as changes happen to the Rdn index, we run out of freeable
> >>> cache headers. After ignoring this test, pagedsearchit and searchit
> >>> pass fine together. It would be good to understand why arrival of
> >>> unbind message is a hit and miss case in this test.
> >>
> >>
> >> It's absolutly strange... Neither an UnbindRequest nor an AbandonRequest
> >> is sent by JNDI when closing the context, which is a huge bug.
> >>
> >> I have checked the other tests, and an Ubind request is always sent when
> >> we close teh context, except when we get an UnwillingToPerform
> exception.
> >> It seems like the context is in a state where it considers that no
> unbind
> >> should be send after an exception. Although I can do a lookup (and get
> back
> >> the correct response from the server after this excption), the
> connection is
> >> still borked :/
> >>
> >> I'll try to rewite the test using our API to see if it works better, and
> >> investigate with som Sun guys to see if there is an issue in JNDI.
> >>
> >>
> >>
> > Ok, we have had a long discussion with Alex about this problem...
> >
> > The thing is that even for standard PagedSearch, where everything goes
> fine
> > (ie, when the client is done, he has correctly closed the connextion,
> which
> > sends a UbindRequest, which close the cursor etc), we may have dozens of
> > opened cursors for some extend period of time.
> >
> > At some point, we may have a exhausted cache, with no way to evict any
> > elements from it, leading to a server freeze.
> >
> > Not something we can accept from a LDAP server...
> >
> > A suggestion would be to add some parameter in the OperationContext
> telling
> > the underlying layer that a search is done outside of any transaction.
> When
> > we fetch an ID from an index, and we try to get the associated Entry from
> > the master table, if we get an error  because the ID does not exist
> anymore,
> > then we should just ignore the error, and continue the search.
> >
> > But we still want to be sure that in some case, inside the server, we
> still
> > can have transactions over some searches.
> >
> > Thoughts ?
> >
>
> I dont think having non transactional search is a good idea. I agree
> there is a problem with non closed cursors but I dont think this is
> the right way to solve it. We currently do not have transactions for
> the search but a cursor over the jdbm  B tree gets a snapshot view.
> This snapshot view is not only for getting a snapshot view of the data
> but also the structure itself. If you do not have this(and on top of
> this if you dont have txns):
>
>  - you will have to deal with inconsistencies in the Btree data structure
>  - you might get data as NULL from the Btree and you might have to
> deal with it. Or you might have to deal with cases like you counted 10
> children but you actually end up with 9 children while doing a DFS
> search over your data structure.This might look easy but I think it is
> not.
>  - you might get not only stale data but complete garbage. This
> garbage might confuse the code completely(for example if the garbage
> you read was supposed to be a Btree redirect).
>
> Code from ldap protocol handlers down to search is written in a way
> assuming cursors get consistent data. I dont think it is impossible to
> write code expecting all kinds of inconsistencies but it is very
> difficult and the code will be brittle.
>
>
> As for the paged search, one way to deal with it would be to read all
> the data from the cursors at the beginning of the paged search and
> close the cursor. This would be similar to a normal search. If we get
> worried about memory consumption of this, the entries to be returned
> could be spilled over to temp files.You might say this might lead to
> temp file that are never claimed but if there are not many of them
> then no big deal. Users are supposed to deal with cleaning up their
> contexts. Not doing is similar to opening file handles or socket
> connections and never closing them. Such things are bound to create
> problems.
>
>
> >
> >
> > --
> > Regards,
> > Cordialement,
> > Emmanuel Lécharny
> > www.iktek.com
> >
>
> thanks
> Selcuk
>



-- 
Best Regards,
-- Alex

Re: Release troubles and failing tests

Posted by Selcuk AYA <ay...@gmail.com>.
On Thu, May 10, 2012 at 5:51 AM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Le 5/10/12 9:58 AM, Emmanuel Lécharny a écrit :
>
>> Le 5/10/12 7:57 AM, Selcuk AYA a écrit :
>>>
>>> The problem seems to be caused by the test
>>> testPagedSearchWrongCookie(). This tests failure in pages search by
>>> sending a bad cookie. After failing, it relies on ctx.close() to
>>> cleanup the session. Cleanup of the session will close all the cursors
>>> related to paged searches through the session.
>>>
>>> It seems that somehow ctx.close does not result in an unbind message
>>> at the server side time to time. I do not know what causes this but
>>> this leaves a cursor open(specifically a NoDups cursor on rdn index).
>>> Eventually as changes happen to the Rdn index, we run out of freeable
>>> cache headers. After ignoring this test, pagedsearchit and searchit
>>> pass fine together. It would be good to understand why arrival of
>>> unbind message is a hit and miss case in this test.
>>
>>
>> It's absolutly strange... Neither an UnbindRequest nor an AbandonRequest
>> is sent by JNDI when closing the context, which is a huge bug.
>>
>> I have checked the other tests, and an Ubind request is always sent when
>> we close teh context, except when we get an UnwillingToPerform exception.
>> It seems like the context is in a state where it considers that no unbind
>> should be send after an exception. Although I can do a lookup (and get back
>> the correct response from the server after this excption), the connection is
>> still borked :/
>>
>> I'll try to rewite the test using our API to see if it works better, and
>> investigate with som Sun guys to see if there is an issue in JNDI.
>>
>>
>>
> Ok, we have had a long discussion with Alex about this problem...
>
> The thing is that even for standard PagedSearch, where everything goes fine
> (ie, when the client is done, he has correctly closed the connextion, which
> sends a UbindRequest, which close the cursor etc), we may have dozens of
> opened cursors for some extend period of time.
>
> At some point, we may have a exhausted cache, with no way to evict any
> elements from it, leading to a server freeze.
>
> Not something we can accept from a LDAP server...
>
> A suggestion would be to add some parameter in the OperationContext telling
> the underlying layer that a search is done outside of any transaction. When
> we fetch an ID from an index, and we try to get the associated Entry from
> the master table, if we get an error  because the ID does not exist anymore,
> then we should just ignore the error, and continue the search.
>
> But we still want to be sure that in some case, inside the server, we still
> can have transactions over some searches.
>
> Thoughts ?
>

I dont think having non transactional search is a good idea. I agree
there is a problem with non closed cursors but I dont think this is
the right way to solve it. We currently do not have transactions for
the search but a cursor over the jdbm  B tree gets a snapshot view.
This snapshot view is not only for getting a snapshot view of the data
but also the structure itself. If you do not have this(and on top of
this if you dont have txns):

 - you will have to deal with inconsistencies in the Btree data structure
 - you might get data as NULL from the Btree and you might have to
deal with it. Or you might have to deal with cases like you counted 10
children but you actually end up with 9 children while doing a DFS
search over your data structure.This might look easy but I think it is
not.
 - you might get not only stale data but complete garbage. This
garbage might confuse the code completely(for example if the garbage
you read was supposed to be a Btree redirect).

Code from ldap protocol handlers down to search is written in a way
assuming cursors get consistent data. I dont think it is impossible to
write code expecting all kinds of inconsistencies but it is very
difficult and the code will be brittle.


As for the paged search, one way to deal with it would be to read all
the data from the cursors at the beginning of the paged search and
close the cursor. This would be similar to a normal search. If we get
worried about memory consumption of this, the entries to be returned
could be spilled over to temp files.You might say this might lead to
temp file that are never claimed but if there are not many of them
then no big deal. Users are supposed to deal with cleaning up their
contexts. Not doing is similar to opening file handles or socket
connections and never closing them. Such things are bound to create
problems.


>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

thanks
Selcuk

Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/10/12 9:58 AM, Emmanuel Lécharny a écrit :
> Le 5/10/12 7:57 AM, Selcuk AYA a écrit :
>> The problem seems to be caused by the test
>> testPagedSearchWrongCookie(). This tests failure in pages search by
>> sending a bad cookie. After failing, it relies on ctx.close() to
>> cleanup the session. Cleanup of the session will close all the cursors
>> related to paged searches through the session.
>>
>> It seems that somehow ctx.close does not result in an unbind message
>> at the server side time to time. I do not know what causes this but
>> this leaves a cursor open(specifically a NoDups cursor on rdn index).
>> Eventually as changes happen to the Rdn index, we run out of freeable
>> cache headers. After ignoring this test, pagedsearchit and searchit
>> pass fine together. It would be good to understand why arrival of
>> unbind message is a hit and miss case in this test.
>
> It's absolutly strange... Neither an UnbindRequest nor an 
> AbandonRequest is sent by JNDI when closing the context, which is a 
> huge bug.
>
> I have checked the other tests, and an Ubind request is always sent 
> when we close teh context, except when we get an UnwillingToPerform 
> exception.
> It seems like the context is in a state where it considers that no 
> unbind should be send after an exception. Although I can do a lookup 
> (and get back the correct response from the server after this 
> excption), the connection is still borked :/
>
> I'll try to rewite the test using our API to see if it works better, 
> and investigate with som Sun guys to see if there is an issue in JNDI.
>
>
>
Ok, we have had a long discussion with Alex about this problem...

The thing is that even for standard PagedSearch, where everything goes 
fine (ie, when the client is done, he has correctly closed the 
connextion, which sends a UbindRequest, which close the cursor etc), we 
may have dozens of opened cursors for some extend period of time.

At some point, we may have a exhausted cache, with no way to evict any 
elements from it, leading to a server freeze.

Not something we can accept from a LDAP server...

A suggestion would be to add some parameter in the OperationContext 
telling the underlying layer that a search is done outside of any 
transaction. When we fetch an ID from an index, and we try to get the 
associated Entry from the master table, if we get an error  because the 
ID does not exist anymore, then we should just ignore the error, and 
continue the search.

But we still want to be sure that in some case, inside the server, we 
still can have transactions over some searches.

Thoughts ?


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/10/12 7:57 AM, Selcuk AYA a écrit :
> The problem seems to be caused by the test
> testPagedSearchWrongCookie(). This tests failure in pages search by
> sending a bad cookie. After failing, it relies on ctx.close() to
> cleanup the session. Cleanup of the session will close all the cursors
> related to paged searches through the session.
>
> It seems that somehow ctx.close does not result in an unbind message
> at the server side time to time. I do not know what causes this but
> this leaves a cursor open(specifically a NoDups cursor on rdn index).
> Eventually as changes happen to the Rdn index, we run out of freeable
> cache headers. After ignoring this test, pagedsearchit and searchit
> pass fine together. It would be good to understand why arrival of
> unbind message is a hit and miss case in this test.

It's absolutly strange... Neither an UnbindRequest nor an AbandonRequest 
is sent by JNDI when closing the context, which is a huge bug.

I have checked the other tests, and an Ubind request is always sent when 
we close teh context, except when we get an UnwillingToPerform exception.
It seems like the context is in a state where it considers that no 
unbind should be send after an exception. Although I can do a lookup 
(and get back the correct response from the server after this excption), 
the connection is still borked :/

I'll try to rewite the test using our API to see if it works better, and 
investigate with som Sun guys to see if there is an issue in JNDI.



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Selcuk AYA <ay...@gmail.com>.
The problem seems to be caused by the test
testPagedSearchWrongCookie(). This tests failure in pages search by
sending a bad cookie. After failing, it relies on ctx.close() to
cleanup the session. Cleanup of the session will close all the cursors
related to paged searches through the session.

It seems that somehow ctx.close does not result in an unbind message
at the server side time to time. I do not know what causes this but
this leaves a cursor open(specifically a NoDups cursor on rdn index).
Eventually as changes happen to the Rdn index, we run out of freeable
cache headers. After ignoring this test, pagedsearchit and searchit
pass fine together. It would be good to understand why arrival of
unbind message is a hit and miss case in this test.


Note that for debugging the problem, I changed JDBM browser(cursor of
jdbm) to take a comment. Comment changes based on where the browser is
opened. I added the browsers to a global list and printed this list
when the out of cache headers situation happened. This pointed to the
code path that left the browser open. This seems to be an inexpensive
way of tracking these kinds of problems.

thanks
Selcuk

On Wed, May 9, 2012 at 4:20 PM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Le 5/9/12 10:39 PM, Selcuk AYA a écrit :
>
>> What I saw happening was there is a reader at version 188. Some reader
>> somehow does not decremented the ref count at this version  and
>> writers keep bumping up the version upto until ~600. Then we run out
>> of freeable cache elements.
>>
>> We always have problem with the same version(188) and the problem
>> always happens when updateding the nbchildren nbdescendants of the
>> parents in rdn index. This is a new code but I dont see anything wrong
>> with it. What I am thinking is that this new code exposed some bug in
>> LRUCache that was there before.
>
> May be because we now are updating the RdnIndex many times instead of only
> one when adding or deleting an entry...
>
>>
>> I am stuck at this point because of my ignorance of the build
>> environment. I use the debugger to step through the code and I have
>> the apacheds-jdbm as a module but the runtime uses a jdbm jar in
>> repository rather than the code I modify so I cannot add debug code to
>> jdbm. I really would appreciate some help here.
>
>
> Here is what I do :
>
> 1) svn co https://svn.apache.org/repos/asf/directory/trunks
> it grabs all the projects from the repo
>
> 2) cd trunks
>
> 3) vi pom.xml
>
> 4) comment the <module>studio</module> module (no need to spend CPU
> compiling studio)
>
> 5) mvn clean install -DskipTests
> It builds everything without tests, saving 20 minutes
>
> 6) mvn eclipse:eclipse
> Now, you shoud have all the .classpath pointing to projects in Eclipse, and
> not on jars
>
> 7) launch eclipse
>
> 8) create the jdbm, apacheds and shared workingsets
>
> 9) import jdbm into jdbm
>
> 10) import shared into shared
>
> 11) Import apacheds into apacheds
>
> 12 in the apacheds/apacheds-jdbm-partition, I open the properties, and in
> the Java Build Path, I add the jdbm project  in the projects tab, and I
> remove the reference to the jdbm lib in the Librairies tab.
>
> Now, you shoul dbe able to put a breakpoint on the jdbm code, and run it in
> debug mode. You will still be able to modify the jdbm code.
>
> of course, I drescribed in extenso what you should do, but if you already
> have almost all present, the only omportant step is (12).
>
> I'm going to crash now, try to catch me early tomorrow (around 7h30 my time,
> 10:30PM your time) if you have any problem...
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/9/12 10:39 PM, Selcuk AYA a écrit :
> What I saw happening was there is a reader at version 188. Some reader
> somehow does not decremented the ref count at this version  and
> writers keep bumping up the version upto until ~600. Then we run out
> of freeable cache elements.
>
> We always have problem with the same version(188) and the problem
> always happens when updateding the nbchildren nbdescendants of the
> parents in rdn index. This is a new code but I dont see anything wrong
> with it. What I am thinking is that this new code exposed some bug in
> LRUCache that was there before.
May be because we now are updating the RdnIndex many times instead of 
only one when adding or deleting an entry...
>
> I am stuck at this point because of my ignorance of the build
> environment. I use the debugger to step through the code and I have
> the apacheds-jdbm as a module but the runtime uses a jdbm jar in
> repository rather than the code I modify so I cannot add debug code to
> jdbm. I really would appreciate some help here.

Here is what I do :

1) svn co https://svn.apache.org/repos/asf/directory/trunks
it grabs all the projects from the repo

2) cd trunks

3) vi pom.xml

4) comment the <module>studio</module> module (no need to spend CPU 
compiling studio)

5) mvn clean install -DskipTests
It builds everything without tests, saving 20 minutes

6) mvn eclipse:eclipse
Now, you shoud have all the .classpath pointing to projects in Eclipse, 
and not on jars

7) launch eclipse

8) create the jdbm, apacheds and shared workingsets

9) import jdbm into jdbm

10) import shared into shared

11) Import apacheds into apacheds

12 in the apacheds/apacheds-jdbm-partition, I open the properties, and 
in the Java Build Path, I add the jdbm project  in the projects tab, and 
I remove the reference to the jdbm lib in the Librairies tab.

Now, you shoul dbe able to put a breakpoint on the jdbm code, and run it 
in debug mode. You will still be able to modify the jdbm code.

of course, I drescribed in extenso what you should do, but if you 
already have almost all present, the only omportant step is (12).

I'm going to crash now, try to catch me early tomorrow (around 7h30 my 
time, 10:30PM your time) if you have any problem...

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Selcuk AYA <ay...@gmail.com>.
What I saw happening was there is a reader at version 188. Some reader
somehow does not decremented the ref count at this version  and
writers keep bumping up the version upto until ~600. Then we run out
of freeable cache elements.

We always have problem with the same version(188) and the problem
always happens when updateding the nbchildren nbdescendants of the
parents in rdn index. This is a new code but I dont see anything wrong
with it. What I am thinking is that this new code exposed some bug in
LRUCache that was there before.

I am stuck at this point because of my ignorance of the build
environment. I use the debugger to step through the code and I have
the apacheds-jdbm as a module but the runtime uses a jdbm jar in
repository rather than the code I modify so I cannot add debug code to
jdbm. I really would appreciate some help here.

thanks
Selcuk
.
On Wed, May 9, 2012 at 10:40 AM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Le 5/9/12 5:34 PM, Emmanuel Lécharny a écrit :
>
>> Le 5/9/12 6:47 AM, Emmanuel Lécharny a écrit :
>>>
>>>
>>>> I'll run the tests and let you know if it's any better.
>>>
>>>
>>> Still having issues, even after having closed the cursors in the
>>> PagedSearch :/
>>
>>
>> Ok, one step farther...
>>
>> I was able to reprduce the priblem, with some extra logs.
>>
>> When we get blocked, we have a full LRUCache :
>>
>> LRUCache:
>> (numEntries:1024,maxEntries:1024,cacheGets:8071,cacheMisses:35,cachePuts:1462,cachePutSleeps:57)
>>
>> but as I'm dumping the cache content just after, I can see that it
>> contains only 10 elements...
>>
>> Here, there are two possibilities
>> 1- the dump is wrong, and does not dump what is in the cache, but what is
>> in the cache after the blockage, and after some elements have been released
>> 2- we have some serious issue with the counters
>>
>> I do think that -2- is the problem. Will investigate more.
>
> Well, counters are just fine. So there must be somethong wrong elsewhere...
>
> Time for me to seriously check the LRUCache code, in order to understand how
> it works ! We have a cool tool at Apache to share some review :
> https://reviews.apache.org/dashboard/
> We could use it to comment the code, Selcuk.
>
> Sorry if I haven't spent as much time as needed on this part of the code
> today, I had to deal with some other things. Will be back later tonite.
>
> Thanks !
>
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/9/12 5:34 PM, Emmanuel Lécharny a écrit :
> Le 5/9/12 6:47 AM, Emmanuel Lécharny a écrit :
>>
>>> I'll run the tests and let you know if it's any better.
>>
>> Still having issues, even after having closed the cursors in the 
>> PagedSearch :/
>
> Ok, one step farther...
>
> I was able to reprduce the priblem, with some extra logs.
>
> When we get blocked, we have a full LRUCache :
>
> LRUCache: 
> (numEntries:1024,maxEntries:1024,cacheGets:8071,cacheMisses:35,cachePuts:1462,cachePutSleeps:57)
>
> but as I'm dumping the cache content just after, I can see that it 
> contains only 10 elements...
>
> Here, there are two possibilities
> 1- the dump is wrong, and does not dump what is in the cache, but what 
> is in the cache after the blockage, and after some elements have been 
> released
> 2- we have some serious issue with the counters
>
> I do think that -2- is the problem. Will investigate more.
Well, counters are just fine. So there must be somethong wrong elsewhere...

Time for me to seriously check the LRUCache code, in order to understand 
how it works ! We have a cool tool at Apache to share some review : 
https://reviews.apache.org/dashboard/
We could use it to comment the code, Selcuk.

Sorry if I haven't spent as much time as needed on this part of the code 
today, I had to deal with some other things. Will be back later tonite.

Thanks !


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/9/12 6:47 AM, Emmanuel Lécharny a écrit :
>
>> I'll run the tests and let you know if it's any better.
>
> Still having issues, even after having closed the cursors in the 
> PagedSearch :/

Ok, one step farther...

I was able to reprduce the priblem, with some extra logs.

When we get blocked, we have a full LRUCache :

LRUCache: 
(numEntries:1024,maxEntries:1024,cacheGets:8071,cacheMisses:35,cachePuts:1462,cachePutSleeps:57)

but as I'm dumping the cache content just after, I can see that it 
contains only 10 elements...

Here, there are two possibilities
1- the dump is wrong, and does not dump what is in the cache, but what 
is in the cache after the blockage, and after some elements have been 
released
2- we have some serious issue with the counters

I do think that -2- is the problem. Will investigate more.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 10:13 PM, Emmanuel Lécharny a écrit :
> Le 5/8/12 9:44 PM, Selcuk AYA a écrit :
>> On Tue, May 8, 2012 at 12:34 PM, Emmanuel 
>> Lécharny<el...@gmail.com>  wrote:
>>> Le 5/8/12 4:40 PM, Emmanuel Lécharny a écrit :
>>>
>>>> Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
>>>>> Hi,
>>>>>
>>>>> On Sun, May 6, 2012 at 7:26 AM, Emmanuel 
>>>>> Lécharny<el...@gmail.com>
>>>>>   wrote:
>>>>>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>>>>>
>>>>>>> Ok, I think I have found the cursor that was not closed.
>>>>>> Still having the same failure :/ All the cursors are correctly 
>>>>>> closed.
>>>>>> If I
>>>>>> disable the logs (that I use to check the pairing between open and
>>>>>> close), I
>>>>>> get a failure, and if I activate the log, then I get no failure.
>>>>>>
>>>>>> I will try to run all the server-integ tests with NOT_THEADSAFE 
>>>>>> to see
>>>>>> if it
>>>>>> works better.
>>>>> I was trying to build the trunk and see if I can chase the issue we
>>>>> see with the caches.
>>>> FYI, I'm trying to narrow the pb to the tests that make the server to
>>>> freeze, by removing tests from the server-integ suite. It will take 
>>>> a bt of
>>>> time, but as soon as I will have reduced the set of test to the 
>>>> minimum,
>>>> I'll let you know.
>>>>
>>> Ok, I was able to reproduce the pb with just two tests in the suite :
>>>
>>> package org.apache.directory.server.suites;
>>>
>>>
>>> import org.apache.directory.server.annotations.CreateLdapServer;
>>> import org.apache.directory.server.annotations.CreateTransport;
>>> import org.apache.directory.server.core.annotations.ContextEntry;
>>> import org.apache.directory.server.core.annotations.CreateDS;
>>> import org.apache.directory.server.core.annotations.CreateIndex;
>>> import org.apache.directory.server.core.annotations.CreatePartition;
>>> import org.apache.directory.server.core.integ.FrameworkSuite;
>>> import org.apache.directory.server.operations.search.PagedSearchIT;
>>> import org.apache.directory.server.operations.search.SearchIT;
>>> import org.junit.runner.RunWith;
>>> import org.junit.runners.Suite;
>>>
>>>
>>> /**
>>>   * Stock (default configuration) server integration test suite.
>>>   *
>>>   * @author<a href="mailto:dev@directory.apache.org">Apache Directory
>>> Project</a>
>>>   */
>>> @RunWith(FrameworkSuite.class)
>>> @Suite.SuiteClasses(
>>>     {
>>>         // operations.search
>>>         PagedSearchIT.class,
>>>         SearchIT.class,
>>> })
>>> @CreateDS(
>>> name = "SuiteDS",
>>> partitions =
>>>     {
>>>         @CreatePartition(
>>>             name = "example",
>>>             suffix = "dc=example,dc=com",
>>>             contextEntry = @ContextEntry(
>>>                 entryLdif =
>>>                 "dn: dc=example,dc=com\n" +
>>>                     "dc: example\n" +
>>>                     "objectClass: top\n" +
>>>                     "objectClass: domain\n\n"),
>>>             indexes =
>>>                 {
>>>                     @CreateIndex(attribute = "objectClass"),
>>>                     @CreateIndex(attribute = "dc"),
>>>                     @CreateIndex(attribute = "ou")
>>>             })
>>> })
>>> @CreateLdapServer(
>>> transports =
>>>     {
>>>         @CreateTransport(protocol = "LDAP"),
>>>         @CreateTransport(protocol = "LDAPS")
>>> })
>>> public class StockServerISuite
>>> {
>>> }
>>>
>>> It seems like the PersistentSearch is keeping a cursor in the server 
>>> until
>>> the search is abandonned or the user unbinds. That might well be an 
>>> issue.
>>>
>>> Now, what if we have some user starting a pagedSearch, which can 
>>> last for a
>>> very long period of time ? How would it impact the txn layer and the 
>>> cache ?
>> That would have memory implications for the txn layer. Non flushed
>> part of the log would keep growing.
>>
>> Currently that could lead to deadlock at JDBM layer. However if we
>> allow snapshot cache pages to be allocated on demand rather than
>> having a fixed number of cache pages, that would translare into a
>> memory problem as well.
>
> Beside the global consequences on the server (we need to discuss 
> further the item), I just checked the PagedSearch handling code, and 
> found that we don't close the cursors when we run the PagedSearchIT 
> tests, except at the end of the test when the context is closed.
>
> I have added the needed code to close the cursors when abandoning the 
> search request, in the AbandonHandler.
>
> It's quite normal we don't see any unclosed cursor when we log all teh 
> open/close pairs, because when the unbind occurs, the cursors are 
> correctly closed.
>
> I'll run the tests and let you know if it's any better.
Still having issues, even after having closed the cursors in the 
PagedSearch :/
>
>


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 9:44 PM, Selcuk AYA a écrit :
> On Tue, May 8, 2012 at 12:34 PM, Emmanuel Lécharny<el...@gmail.com>  wrote:
>> Le 5/8/12 4:40 PM, Emmanuel Lécharny a écrit :
>>
>>> Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
>>>> Hi,
>>>>
>>>> On Sun, May 6, 2012 at 7:26 AM, Emmanuel Lécharny<el...@gmail.com>
>>>>   wrote:
>>>>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>>>>
>>>>>> Ok, I think I have found the cursor that was not closed.
>>>>> Still having the same failure :/ All the cursors are correctly closed.
>>>>> If I
>>>>> disable the logs (that I use to check the pairing between open and
>>>>> close), I
>>>>> get a failure, and if I activate the log, then I get no failure.
>>>>>
>>>>> I will try to run all the server-integ tests with NOT_THEADSAFE to see
>>>>> if it
>>>>> works better.
>>>> I was trying to build the trunk and see if I can chase the issue we
>>>> see with the caches.
>>> FYI, I'm trying to narrow the pb to the tests that make the server to
>>> freeze, by removing tests from the server-integ suite. It will take a bt of
>>> time, but as soon as I will have reduced the set of test to the minimum,
>>> I'll let you know.
>>>
>> Ok, I was able to reproduce the pb with just two tests in the suite :
>>
>> package org.apache.directory.server.suites;
>>
>>
>> import org.apache.directory.server.annotations.CreateLdapServer;
>> import org.apache.directory.server.annotations.CreateTransport;
>> import org.apache.directory.server.core.annotations.ContextEntry;
>> import org.apache.directory.server.core.annotations.CreateDS;
>> import org.apache.directory.server.core.annotations.CreateIndex;
>> import org.apache.directory.server.core.annotations.CreatePartition;
>> import org.apache.directory.server.core.integ.FrameworkSuite;
>> import org.apache.directory.server.operations.search.PagedSearchIT;
>> import org.apache.directory.server.operations.search.SearchIT;
>> import org.junit.runner.RunWith;
>> import org.junit.runners.Suite;
>>
>>
>> /**
>>   * Stock (default configuration) server integration test suite.
>>   *
>>   * @author<a href="mailto:dev@directory.apache.org">Apache Directory
>> Project</a>
>>   */
>> @RunWith(FrameworkSuite.class)
>> @Suite.SuiteClasses(
>>     {
>>         // operations.search
>>         PagedSearchIT.class,
>>         SearchIT.class,
>> })
>> @CreateDS(
>> name = "SuiteDS",
>> partitions =
>>     {
>>         @CreatePartition(
>>             name = "example",
>>             suffix = "dc=example,dc=com",
>>             contextEntry = @ContextEntry(
>>                 entryLdif =
>>                 "dn: dc=example,dc=com\n" +
>>                     "dc: example\n" +
>>                     "objectClass: top\n" +
>>                     "objectClass: domain\n\n"),
>>             indexes =
>>                 {
>>                     @CreateIndex(attribute = "objectClass"),
>>                     @CreateIndex(attribute = "dc"),
>>                     @CreateIndex(attribute = "ou")
>>             })
>> })
>> @CreateLdapServer(
>> transports =
>>     {
>>         @CreateTransport(protocol = "LDAP"),
>>         @CreateTransport(protocol = "LDAPS")
>> })
>> public class StockServerISuite
>> {
>> }
>>
>> It seems like the PersistentSearch is keeping a cursor in the server until
>> the search is abandonned or the user unbinds. That might well be an issue.
>>
>> Now, what if we have some user starting a pagedSearch, which can last for a
>> very long period of time ? How would it impact the txn layer and the cache ?
> That would have memory implications for the txn layer. Non flushed
> part of the log would keep growing.
>
> Currently that could lead to deadlock at JDBM layer. However if we
> allow snapshot cache pages to be allocated on demand rather than
> having a fixed number of cache pages, that would translare into a
> memory problem as well.

Beside the global consequences on the server (we need to discuss further 
the item), I just checked the PagedSearch handling code, and found that 
we don't close the cursors when we run the PagedSearchIT tests, except 
at the end of the test when the context is closed.

I have added the needed code to close the cursors when abandoning the 
search request, in the AbandonHandler.

It's quite normal we don't see any unclosed cursor when we log all teh 
open/close pairs, because when the unbind occurs, the cursors are 
correctly closed.

I'll run the tests and let you know if it's any better.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Selcuk AYA <ay...@gmail.com>.
On Tue, May 8, 2012 at 12:34 PM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Le 5/8/12 4:40 PM, Emmanuel Lécharny a écrit :
>
>> Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
>>>
>>> Hi,
>>>
>>> On Sun, May 6, 2012 at 7:26 AM, Emmanuel Lécharny<el...@gmail.com>
>>>  wrote:
>>>>
>>>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>>>
>>>>> Ok, I think I have found the cursor that was not closed.
>>>>
>>>> Still having the same failure :/ All the cursors are correctly closed.
>>>> If I
>>>> disable the logs (that I use to check the pairing between open and
>>>> close), I
>>>> get a failure, and if I activate the log, then I get no failure.
>>>>
>>>> I will try to run all the server-integ tests with NOT_THEADSAFE to see
>>>> if it
>>>> works better.
>>>
>>> I was trying to build the trunk and see if I can chase the issue we
>>> see with the caches.
>>
>> FYI, I'm trying to narrow the pb to the tests that make the server to
>> freeze, by removing tests from the server-integ suite. It will take a bt of
>> time, but as soon as I will have reduced the set of test to the minimum,
>> I'll let you know.
>>
> Ok, I was able to reproduce the pb with just two tests in the suite :
>
> package org.apache.directory.server.suites;
>
>
> import org.apache.directory.server.annotations.CreateLdapServer;
> import org.apache.directory.server.annotations.CreateTransport;
> import org.apache.directory.server.core.annotations.ContextEntry;
> import org.apache.directory.server.core.annotations.CreateDS;
> import org.apache.directory.server.core.annotations.CreateIndex;
> import org.apache.directory.server.core.annotations.CreatePartition;
> import org.apache.directory.server.core.integ.FrameworkSuite;
> import org.apache.directory.server.operations.search.PagedSearchIT;
> import org.apache.directory.server.operations.search.SearchIT;
> import org.junit.runner.RunWith;
> import org.junit.runners.Suite;
>
>
> /**
>  * Stock (default configuration) server integration test suite.
>  *
>  * @author <a href="mailto:dev@directory.apache.org">Apache Directory
> Project</a>
>  */
> @RunWith(FrameworkSuite.class)
> @Suite.SuiteClasses(
>    {
>        // operations.search
>        PagedSearchIT.class,
>        SearchIT.class,
> })
> @CreateDS(
> name = "SuiteDS",
> partitions =
>    {
>        @CreatePartition(
>            name = "example",
>            suffix = "dc=example,dc=com",
>            contextEntry = @ContextEntry(
>                entryLdif =
>                "dn: dc=example,dc=com\n" +
>                    "dc: example\n" +
>                    "objectClass: top\n" +
>                    "objectClass: domain\n\n"),
>            indexes =
>                {
>                    @CreateIndex(attribute = "objectClass"),
>                    @CreateIndex(attribute = "dc"),
>                    @CreateIndex(attribute = "ou")
>            })
> })
> @CreateLdapServer(
> transports =
>    {
>        @CreateTransport(protocol = "LDAP"),
>        @CreateTransport(protocol = "LDAPS")
> })
> public class StockServerISuite
> {
> }
>
> It seems like the PersistentSearch is keeping a cursor in the server until
> the search is abandonned or the user unbinds. That might well be an issue.
>
> Now, what if we have some user starting a pagedSearch, which can last for a
> very long period of time ? How would it impact the txn layer and the cache ?

That would have memory implications for the txn layer. Non flushed
part of the log would keep growing.

Currently that could lead to deadlock at JDBM layer. However if we
allow snapshot cache pages to be allocated on demand rather than
having a fixed number of cache pages, that would translare into a
memory problem as well.

>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

thanks
Selcuk

Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 4:40 PM, Emmanuel Lécharny a écrit :
> Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
>> Hi,
>>
>> On Sun, May 6, 2012 at 7:26 AM, Emmanuel 
>> Lécharny<el...@gmail.com>  wrote:
>>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>>
>>>> Ok, I think I have found the cursor that was not closed.
>>> Still having the same failure :/ All the cursors are correctly 
>>> closed. If I
>>> disable the logs (that I use to check the pairing between open and 
>>> close), I
>>> get a failure, and if I activate the log, then I get no failure.
>>>
>>> I will try to run all the server-integ tests with NOT_THEADSAFE to 
>>> see if it
>>> works better.
>> I was trying to build the trunk and see if I can chase the issue we
>> see with the caches.
> FYI, I'm trying to narrow the pb to the tests that make the server to 
> freeze, by removing tests from the server-integ suite. It will take a 
> bt of time, but as soon as I will have reduced the set of test to the 
> minimum, I'll let you know.
>
Ok, I was able to reproduce the pb with just two tests in the suite :

package org.apache.directory.server.suites;


import org.apache.directory.server.annotations.CreateLdapServer;
import org.apache.directory.server.annotations.CreateTransport;
import org.apache.directory.server.core.annotations.ContextEntry;
import org.apache.directory.server.core.annotations.CreateDS;
import org.apache.directory.server.core.annotations.CreateIndex;
import org.apache.directory.server.core.annotations.CreatePartition;
import org.apache.directory.server.core.integ.FrameworkSuite;
import org.apache.directory.server.operations.search.PagedSearchIT;
import org.apache.directory.server.operations.search.SearchIT;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;


/**
  * Stock (default configuration) server integration test suite.
  *
  * @author <a href="mailto:dev@directory.apache.org">Apache Directory 
Project</a>
  */
@RunWith(FrameworkSuite.class)
@Suite.SuiteClasses(
     {
         // operations.search
         PagedSearchIT.class,
         SearchIT.class,
})
@CreateDS(
name = "SuiteDS",
partitions =
     {
         @CreatePartition(
             name = "example",
             suffix = "dc=example,dc=com",
             contextEntry = @ContextEntry(
                 entryLdif =
                 "dn: dc=example,dc=com\n" +
                     "dc: example\n" +
                     "objectClass: top\n" +
                     "objectClass: domain\n\n"),
             indexes =
                 {
                     @CreateIndex(attribute = "objectClass"),
                     @CreateIndex(attribute = "dc"),
                     @CreateIndex(attribute = "ou")
             })
})
@CreateLdapServer(
transports =
     {
         @CreateTransport(protocol = "LDAP"),
         @CreateTransport(protocol = "LDAPS")
})
public class StockServerISuite
{
}

It seems like the PersistentSearch is keeping a cursor in the server 
until the search is abandonned or the user unbinds. That might well be 
an issue.

Now, what if we have some user starting a pagedSearch, which can last 
for a very long period of time ? How would it impact the txn layer and 
the cache ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 8:05 AM, Selcuk AYA a écrit :
> Hi,
>
> On Sun, May 6, 2012 at 7:26 AM, Emmanuel Lécharny<el...@gmail.com>  wrote:
>> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>>
>>> Ok, I think I have found the cursor that was not closed.
>> Still having the same failure :/ All the cursors are correctly closed. If I
>> disable the logs (that I use to check the pairing between open and close), I
>> get a failure, and if I activate the log, then I get no failure.
>>
>> I will try to run all the server-integ tests with NOT_THEADSAFE to see if it
>> works better.
> I was trying to build the trunk and see if I can chase the issue we
> see with the caches.
FYI, I'm trying to narrow the pb to the tests that make the server to 
freeze, by removing tests from the server-integ suite. It will take a bt 
of time, but as soon as I will have reduced the set of test to the 
minimum, I'll let you know.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Selcuk AYA <ay...@gmail.com>.
Hi,

On Sun, May 6, 2012 at 7:26 AM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
>
>> Ok, I think I have found the cursor that was not closed.
>
> Still having the same failure :/ All the cursors are correctly closed. If I
> disable the logs (that I use to check the pairing between open and close), I
> get a failure, and if I activate the log, then I get no failure.
>
> I will try to run all the server-integ tests with NOT_THEADSAFE to see if it
> works better.

I was trying to build the trunk and see if I can chase the issue we
see with the caches. However  I am getting the following error. Note
that I tried cleaning the repository.  Can you let me know if there is
workaround?

thanks
Selcuk

ERROR] Failed to execute goal on project apacheds-core-api: Could not
resolve dependencies for project
org.apache.directory.server:apacheds-core-api:bundle:2.0.0-M8-SNAPSHOT:
The following artifacts could not be resolved:
org.apache.directory.shared:shared-i18n:jar:1.0.0-M12,
org.apache.directory.shared:shared-asn1-api:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-client-api:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-codec-core:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-extras-aci:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-extras-util:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-model:jar:1.0.0-M12,
org.apache.directory.shared:shared-ldap-schema-data:jar:1.0.0-M12,
org.apache.directory.shared:shared-util:jar:1.0.0-M12: Could not find
artifact org.apache.directory.shared:shared-i18n:jar:1.0.0-M12 in
central (http://repo1.maven.org/maven2)



>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/6/12 12:07 PM, Emmanuel Lécharny a écrit :
> Ok, I think I have found the cursor that was not closed.
Still having the same failure :/ All the cursors are correctly closed. 
If I disable the logs (that I use to check the pairing between open and 
close), I get a failure, and if I activate the log, then I get no failure.

I will try to run all the server-integ tests with NOT_THEADSAFE to see 
if it works better.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Ok, I think I have found the cursor that was not closed.

I'll try to cut the release this week-end (if I get enough intenret 
connectivity).

Le 5/5/12 1:57 PM, Emmanuel Lécharny a écrit :
> Le 5/5/12 12:20 PM, Alex Karasulu a écrit :
>> On Sat, May 5, 2012 at 1:15 PM, Emmanuel 
>> Lécharny<el...@gmail.com>  wrote:
>>> Hi guys,
>>>
>>> I'm trying to get ADS 2.0.0-M7 release done today, but I'm having hard
>>> time... Appart a bug in maven release plugin (mvn release:rollback 
>>> does not
>>> remove the tags, which leads to some very tricky issues with the rat
>>> plugin...), we stll have some random failures in server-integ and
>>> client-api-tests.
>>>
>>> Those failures are well known : timeout in the LRUCache, most 
>>> certainly due
>>> to some cursors not being closed.
>> One way in which we can approach this for the sake of debugging for
>> unclosed cursors is to use bytecode splicing just for testing purposes
>> (not for production or normal development use).
>>
>> If we use something like aspectj to add logging statements to cursor
>> operations and regions of the code where operations occur we can run
>> your tests and then do the accounting on the log output. We should be
>> able to find mismatched opened/closed cursors this way.
>>
> Ok, I don't think it's worthful to continue the release unless the 
> failing tests are fixed. This is the 6th attempt that fails, I'll 
> rollback the ApacheDS releas and try to see why we have those failures.
>


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/5/12 12:20 PM, Alex Karasulu a écrit :
> On Sat, May 5, 2012 at 1:15 PM, Emmanuel Lécharny<el...@gmail.com>  wrote:
>> Hi guys,
>>
>> I'm trying to get ADS 2.0.0-M7 release done today, but I'm having hard
>> time... Appart a bug in maven release plugin (mvn release:rollback does not
>> remove the tags, which leads to some very tricky issues with the rat
>> plugin...), we stll have some random failures in server-integ and
>> client-api-tests.
>>
>> Those failures are well known : timeout in the LRUCache, most certainly due
>> to some cursors not being closed.
> One way in which we can approach this for the sake of debugging for
> unclosed cursors is to use bytecode splicing just for testing purposes
> (not for production or normal development use).
>
> If we use something like aspectj to add logging statements to cursor
> operations and regions of the code where operations occur we can run
> your tests and then do the accounting on the log output. We should be
> able to find mismatched opened/closed cursors this way.
>
Ok, I don't think it's worthful to continue the release unless the 
failing tests are fixed. This is the 6th attempt that fails, I'll 
rollback the ApacheDS releas and try to see why we have those failures.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Release troubles and failing tests

Posted by Alex Karasulu <ak...@apache.org>.
On Sat, May 5, 2012 at 1:15 PM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Hi guys,
>
> I'm trying to get ADS 2.0.0-M7 release done today, but I'm having hard
> time... Appart a bug in maven release plugin (mvn release:rollback does not
> remove the tags, which leads to some very tricky issues with the rat
> plugin...), we stll have some random failures in server-integ and
> client-api-tests.
>
> Those failures are well known : timeout in the LRUCache, most certainly due
> to some cursors not being closed.

One way in which we can approach this for the sake of debugging for
unclosed cursors is to use bytecode splicing just for testing purposes
(not for production or normal development use).

If we use something like aspectj to add logging statements to cursor
operations and regions of the code where operations occur we can run
your tests and then do the accounting on the log output. We should be
able to find mismatched opened/closed cursors this way.

-- 
Best Regards,
-- Alex