Posted to dev@bookkeeper.apache.org by "Matthieu Morel (JIRA)" <ji...@apache.org> on 2011/09/13 19:37:12 UTC

[jira] [Created] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

BookieReadWriteTest gets blocked and never finishes
---------------------------------------------------

                 Key: BOOKKEEPER-67
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-67
             Project: Bookkeeper
          Issue Type: Bug
         Environment: RHEL4.8 and Debian 6
            Reporter: Matthieu Morel


I can reproduce this behaviour systematically on every Linux box I have tested.

The test gets stuck acquiring permits from a semaphore, normally used for throttling:

"main" prio=10 tid=0x08058c00 nid=0x588d waiting on condition [0xf723c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0xb5619728> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
	at org.apache.bookkeeper.client.LedgerHandle.asyncAddEntry(LedgerHandle.java:394)
	at org.apache.bookkeeper.client.LedgerHandle.asyncAddEntry(LedgerHandle.java:366)
	at org.apache.bookkeeper.test.BookieReadWriteTest.testShutdown(BookieReadWriteTest.java:815)


The issue might come from the synchronization mechanism used in the test itself. 
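The trace shows the caller parked in Semaphore.acquire() inside asyncAddEntry. As a minimal sketch of that kind of throttling, assume each add takes one permit and its completion callback returns it. The class and method names below are hypothetical, not the LedgerHandle API. If completions stop arriving, acquire() never returns. A tryAcquire() with a timeout lets the caller fail instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of semaphore-based throttling; not the actual LedgerHandle code.
public class ThrottleSketch {
    // One permit per outstanding (not yet acknowledged) add.
    private final Semaphore permits;

    public ThrottleSketch(int maxOutstanding) {
        this.permits = new Semaphore(maxOutstanding);
    }

    // Blocking variant: parks forever if completions never return permits,
    // which matches the state shown in the thread dump above.
    public void addBlocking() throws InterruptedException {
        permits.acquire();
        // ... the entry would be sent asynchronously here ...
    }

    // Bounded variant: returns false after the timeout instead of hanging.
    public boolean addWithTimeout(long timeoutMs) throws InterruptedException {
        return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Would be invoked by the add-completion callback.
    public void onAddComplete() {
        permits.release();
    }

    public int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleSketch client = new ThrottleSketch(2);
        System.out.println(client.addWithTimeout(100)); // true
        System.out.println(client.addWithTimeout(100)); // true
        // No completion has arrived yet, so a third add cannot get a permit.
        System.out.println(client.addWithTimeout(100)); // false
        client.onAddComplete();
        System.out.println(client.addWithTimeout(100)); // true
    }
}
```

Running main prints true, true, false, true. The third add times out because no completion has released a permit yet.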

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Matthieu Morel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104426#comment-13104426 ] 

Matthieu Morel commented on BOOKKEEPER-67:
------------------------------------------

By systematically, I mean: on every run I did on _linux_ machines.

I'm using a vanilla bookkeeper version from trunk. 

I'm attaching logs. They contain the test output as well as a full thread dump taken when the test was stuck.

As you can see, the issue is most likely caused by reaching the maximum open files limit, which on Linux usually defaults to 1024.



[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104597#comment-13104597 ] 

Benjamin Reed commented on BOOKKEEPER-67:
-----------------------------------------

On my Linux machine I have to increase the number of open files:


ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15758
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15758
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Perhaps OS X ignores the open file limit.


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Matthieu Morel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105473#comment-13105473 ] 

Matthieu Morel commented on BOOKKEEPER-67:
------------------------------------------

It looks like the issue now boils down to one decision: should BookKeeper/Hedwig build out of the box on standard Linux systems?

I think they should. But they won't (and shouldn't) if tests fail. It's really annoying when one cannot build directly from source, especially for new or prospective BookKeeper/Hedwig users, and for users who cannot raise a low soft limit on open files.

Maybe we could compromise: the shutdown test could be enabled only on machines with a large soft limit for open files, and we would make sure our continuous integration host is configured that way, so that the related bug is still detected if a patch reintroduces it.
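Here is a minimal sketch of such a guard. It assumes the heavy test is simply skipped when the JVM reports a low descriptor limit. The class name and the 10240 threshold are hypothetical. The thread only shows the test passing with limits of 10240 (OS X JVM) and 65536, and hanging with 1024. The sketch reads the limit through com.sun.management.UnixOperatingSystemMXBean, which is specific to HotSpot/OpenJDK on Unix-like systems, and treats an unreadable limit as too low.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical guard for a descriptor-hungry test such as testShutdown.
public class ShutdownTestGuard {
    // Assumed threshold: the OS X JVM limit reported in this thread, under which the test passes.
    static final long REQUIRED_FDS = 10240;

    // Returns the JVM's max file descriptor count, or -1 if it cannot be read.
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    // Pure decision, easy to unit-test: run only if the limit is known and large enough.
    static boolean shouldRun(long maxFds, long requiredFds) {
        return maxFds > 0 && maxFds >= requiredFds;
    }

    public static void main(String[] args) {
        long max = maxFileDescriptors();
        if (shouldRun(max, REQUIRED_FDS)) {
            System.out.println("Running shutdown test (max fds = " + max + ")");
        } else {
            System.out.println("Skipping shutdown test: max fds = " + max + ", need " + REQUIRED_FDS);
        }
    }
}
```

With this guard, a CI host configured like Benjamin's machine (65536) would run the test, while a box with the default Linux limit of 1024 would skip it.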


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Matthieu Morel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104760#comment-13104760 ] 

Matthieu Morel commented on BOOKKEEPER-67:
------------------------------------------

Flavio, I suppose the output you show is from a Mac OS X system.

I can confirm that on Mac OS X the default JVM (1.6.0_26 on my machine) seems to _ignore_ the soft limit on open files and uses a hardcoded limit of 10240 instead. Some hints here: http://lists.apple.com/archives/Java-dev/2008/Aug/msg00212.html

Attached is a small program that outputs the current and maximum number of open files for a Java process, as seen by the JVM (pardon the convoluted way of fetching the info!). Run it as:

java -Dcom.sun.management.jmxremote.port=4086 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -cp . ShowFileDescriptorsInfo

Output on Mac OS X 10.6.8:

$ ulimit -a | grep files
open files                      (-n) 256
$ java -Dcom.sun.management.jmxremote.port=4086 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -cp . ShowFileDescriptorsInfo
OpenFileDescriptorCount -> 43
MaxFileDescriptorCount -> 10240


On Linux (RHEL4.8), the JVM (standard Oracle/Sun) honors the soft limit, so we hit the default soft limit of 1024 and things break. I tracked file creation in bookkeeper.bookie.FileInfo and can confirm this.

$ ulimit -a |grep files
open files                      (-n) 1024
$ java -Dcom.sun.management.jmxremote.port=4086 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -cp . ShowFileDescriptorsInfo
OpenFileDescriptorCount -> 15
MaxFileDescriptorCount -> 1024
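For reference, here is a sketch that reads the same two values in-process instead of through a remote JMX port. It assumes a HotSpot/OpenJDK JVM on a Unix-like OS, where ManagementFactory.getOperatingSystemMXBean() returns a com.sun.management.UnixOperatingSystemMXBean. The class name FdInfo is hypothetical. On other JVMs the bean may not implement that interface, and the sketch then reports that the values are unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical in-process alternative to querying the JVM over remote JMX.
public class FdInfo {
    // Returns {open, max}, or null if the Unix-specific bean is not available.
    static long[] fileDescriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof com.sun.management.UnixOperatingSystemMXBean)) {
            return null;
        }
        com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
        return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
    }

    public static void main(String[] args) {
        long[] counts = fileDescriptorCounts();
        if (counts == null) {
            System.out.println("UnixOperatingSystemMXBean not available on this JVM");
            return;
        }
        // Same attribute names as in the output above.
        System.out.println("OpenFileDescriptorCount -> " + counts[0]);
        System.out.println("MaxFileDescriptorCount -> " + counts[1]);
    }
}
```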



I'm not sure we need to create that many ledgers in this test. Couldn't the same point be made with 100 ledgers? The test would then pass out of the box on Linux as well.




[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104441#comment-13104441 ] 

Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------

Interestingly, when I run "ulimit -a", the maximum number of open file descriptors reported is 256:

{noformat}
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 6144
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 266
virtual memory          (kbytes, -v) unlimited
{noformat}

and yet I don't see the same problem. I'm puzzled...


[jira] [Updated] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Matthieu Morel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthieu Morel updated BOOKKEEPER-67:
-------------------------------------

    Attachment: ShowFileDescriptorsInfo.java

Program that displays file descriptor info as seen by the JVM.


[jira] [Updated] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Matthieu Morel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthieu Morel updated BOOKKEEPER-67:
-------------------------------------

    Attachment: BookieReadWriteTest-RHEL4.8.log

Test log on RHEL4.8, Oracle/Sun JDK 1.6.0_25.


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105219#comment-13105219 ] 

Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------

Let me give some context. Netty has a known issue at shutdown: it hangs if we call releaseExternalResources() on the channel factory while connections are still open. In the client, we open connections to bookies (if necessary) when we create ledgers. A race left pending connections behind, and these made the client hang while shutting down.
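One way to avoid that race can be sketched under the following assumptions. The client counts connection attempts that are still in flight and refuses new ones once shutdown starts. It releases shared resources only after the count drops to zero. ConnectTracker, beginConnect(), endConnect() and the releaseResources callback are hypothetical names. This is plain java.util.concurrent code, not BookKeeper's actual fix or Netty's API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: make shutdown wait for in-flight connection attempts.
public class ConnectTracker {
    private int pending = 0;
    private boolean closed = false;

    // Registers a new attempt; returns false once shutdown has started.
    public synchronized boolean beginConnect() {
        if (closed) {
            return false;
        }
        pending++;
        return true;
    }

    // Called from the connect-completion callback, on success or failure.
    public synchronized void endConnect() {
        pending--;
        if (pending == 0) {
            notifyAll();
        }
    }

    // Stops new attempts and waits up to timeoutMs for pending ones to settle.
    // Runs releaseResources only if nothing is left pending; returns whether it did.
    public boolean shutdown(long timeoutMs, Runnable releaseResources) throws InterruptedException {
        synchronized (this) {
            closed = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            long remaining = deadline - System.nanoTime();
            while (pending > 0 && remaining > 0) {
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
                remaining = deadline - System.nanoTime();
            }
            if (pending > 0) {
                return false;
            }
        }
        releaseResources.run();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectTracker tracker = new ConnectTracker();
        tracker.beginConnect();
        // Simulate the connect callback firing on another thread after 50 ms.
        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            tracker.endConnect();
        });
        callback.start();
        boolean drained = tracker.shutdown(1000, () -> System.out.println("releasing resources"));
        System.out.println("drained = " + drained);
        System.out.println("new connect accepted = " + tracker.beginConnect());
    }
}
```

Suppose pending attempts do not settle within the timeout. Then shutdown() skips the release and returns false, and the caller decides what to do. This avoids calling into a release that may hang.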

We have only been able to reproduce the problem reliably by creating a large number of ledgers, hence the magic number of 10k. I don't know where the sweet spot is, so we may well be able to reproduce it reliably with fewer ledgers. However, I'm not sure how to pick a smaller number that guarantees the problem shows up when it is not fixed.

One issue is that we have apparently fixed the problem, so the test should pass if you bring the number down to 100. But if we change the value, it would be good to make sure we can still catch the bug if some patch reintroduces it.


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104368#comment-13104368 ] 

Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------

Also, please upload the test logs.


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104434#comment-13104434 ] 

Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------

It looks like the culprit is testShutdown: it creates 10k ledgers...


[jira] [Commented] (BOOKKEEPER-67) BookieReadWriteTest gets blocked and never finishes

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104365#comment-13104365 ] 

Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------

Matthieu, does it get stuck on every run? I'm not sure what you mean by "systematically".

I have just run it a couple of times and confirmed that it works for me. Is there anything special about your environment? Also, just to confirm, are you waiting long enough? There are 24 tests to run in the ReadWrite test set.
