Posted to dev@pulsar.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2020/01/12 17:39:00 UTC

[jira] [Created] (PULSAR-4) Pulsar precommit Jenkins jobs consume too much resources and it leads to "unable to create native thread"

Vladimir Sitnikov created PULSAR-4:
--------------------------------------

             Summary: Pulsar precommit Jenkins jobs consume too much resources and it leads to "unable to create native thread"
                 Key: PULSAR-4
                 URL: https://issues.apache.org/jira/browse/PULSAR-4
             Project: Pulsar
          Issue Type: Bug
            Reporter: Vladimir Sitnikov
         Attachments: pulsar_threaddump.txt.gz

See https://lists.apache.org/thread.html/r9cb0772531814fdf10c82b61fb4bb8d3a187852ddf98ac84754bf778%40%3Cbuilds.apache.org%3E

The H23 node was unresponsive, and it turned out to be running many Pulsar Java processes (~14 processes with 9000+ threads in total):

{noformat}
22058 jenkins   20   0 19.514g 2.156g  33960 S  36.8  2.3   2032:55 /usr/local/asfpackages/java/jdk1.8.0_191/jre/bin/java -Xmx1G -XX:+UseG1GC -Dpulsar.allocator.pooled=false -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dlog4j.configurationFile=log4j2.xml -jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/surefire/surefirebooter5673414172185975509.jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/sur
{noformat}
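Per-name thread counts like the ones quoted below can be reproduced from the attached dump by grouping thread names after collapsing their numeric parts (e.g. "pulsar-9510-20" and "pulsar-12-3" both become "pulsar-N-N"). This is a minimal sketch, assuming a jstack-style dump where each thread section starts with the quoted thread name; the class and method names are hypothetical, and treating every digit run as a pool or thread counter is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

public class ThreadDumpSummary {

    // In a jstack-style dump each thread section starts with its quoted name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)\"");

    // Collapses digit runs so "pulsar-9510-20" and "pulsar-12-3" both become "pulsar-N-N".
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "N");
    }

    // Counts thread header lines per normalized thread name; stack frames are skipped.
    static Map<String, Integer> countByPattern(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(normalize(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reads a plain or gzip-compressed dump (e.g. pulsar_threaddump.txt.gz).
    static List<String> readLines(Path path) throws IOException {
        try (InputStream raw = Files.newInputStream(path);
             InputStream in = path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Usage: java ThreadDumpSummary pulsar_threaddump.txt.gz
    public static void main(String[] args) throws IOException {
        countByPattern(readLines(Paths.get(args[0]))).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

The output is sorted by count, largest group first, so the dominant leaked pools show up at the top.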


The thread dump includes 1020 threads like this one:
{noformat}
"pulsar-9510-20" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73 waiting on condition [0x00007fb8dd946000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000cd3bf4d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
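The trace above is what an idle ScheduledThreadPoolExecutor worker looks like: core threads do not time out by default, so a pool that is created and never shut down keeps its threads parked in DelayedWorkQueue.take() indefinitely. One plausible (unconfirmed) explanation for the 1020 threads is test code that creates such pools without calling shutdown(). The sketch below reproduces that pattern with plain JDK executors; it does not use Pulsar code, and the class name, thread-name prefixes and pool sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakedPoolDemo {

    // Counts live threads whose name starts with the given prefix.
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    // Hypothetical factory producing "<prefix><n>" names, loosely like "pulsar-9510-20".
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger next = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + next.incrementAndGet());
            t.setDaemon(true); // lets the demo JVM exit even if a pool is leaked
            return t;
        };
    }

    // Creates a pool and starts all core threads; with no tasks queued they park
    // in DelayedWorkQueue.take(), the same frame seen in the dump.
    static ScheduledThreadPoolExecutor startPool(String prefix, int threads) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(threads, namedFactory(prefix));
        pool.prestartAllCoreThreads();
        return pool;
    }

    // Shuts the pool down and waits for it to terminate.
    static boolean shutdownAndWait(ScheduledThreadPoolExecutor pool, long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Polls until no thread with the prefix is alive; a worker can still be
    // exiting for a moment after awaitTermination() returns.
    static boolean awaitNoThreads(String prefix, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (countThreads(prefix) > 0) {
            if (System.currentTimeMillis() > deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<ScheduledThreadPoolExecutor> pools = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            pools.add(startPool("demo-pool-" + i + "-", 4)); // ten "tests", none cleaned up yet
        }
        System.out.println("parked threads while pools are open: " + countThreads("demo-pool-"));
        for (ScheduledThreadPoolExecutor pool : pools) {
            shutdownAndWait(pool, 5_000);
        }
        System.out.println("all pool threads gone after shutdown: " + awaitNoThreads("demo-pool-", 5_000));
    }
}
```

Each pool that is never shut down keeps all of its core threads alive, so the thread count grows linearly with the number of pools created over a long test run.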

The dump also contains 733 threads like this one:
{noformat}
"bookkeeper-ml-cache-eviction-6747-1" #51441 prio=5 os_prio=0 tid=0x00007fbb1c31c000 nid=0x58be sleeping[0x00007fb8ea6e6000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.cacheEvictionTask(ManagedLedgerFactoryImpl.java:221)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$$Lambda$70/551994588.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

and so on (see the attached thread dump).
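Unlike the pulsar-* threads, the bookkeeper-ml-cache-eviction threads are not idle in an executor queue: they are sleeping inside a running task (Thread.sleep called from ManagedLedgerFactoryImpl.cacheEvictionTask). For a task shaped as a sleep-and-repeat loop, ExecutorService.shutdown() alone does not end the thread, because shutdown() lets running tasks finish; shutdownNow() interrupts them, and the loop exits on InterruptedException. The sketch below illustrates only that JDK behaviour under the assumption that the eviction task is such a loop; it does not reproduce ManagedLedgerFactoryImpl, and all names in it are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SleepLoopShutdownDemo {

    // Hypothetical stand-in for a periodic background task: do some work, sleep, repeat.
    // It only leaves the loop when its thread is interrupted.
    static Runnable sleepLoop(AtomicInteger rounds, long sleepMillis) {
        return () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    rounds.incrementAndGet(); // placeholder for the real work
                    Thread.sleep(sleepMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the flag; the task ends here
            }
        };
    }

    private static boolean await(ExecutorService ex, long timeoutMillis) {
        try {
            return ex.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // shutdown() does not interrupt a running task, so the loop keeps its thread alive.
    static boolean terminatesAfterShutdown(long timeoutMillis) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        ex.execute(sleepLoop(new AtomicInteger(), 10));
        ex.shutdown();
        boolean terminated = await(ex, timeoutMillis);
        ex.shutdownNow(); // clean up so the demo does not leak the thread itself
        return terminated;
    }

    // shutdownNow() interrupts the worker; Thread.sleep throws and the loop exits.
    static boolean terminatesAfterShutdownNow(long timeoutMillis) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        ex.execute(sleepLoop(new AtomicInteger(), 10));
        ex.shutdownNow();
        return await(ex, timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("terminated after shutdown(): " + terminatesAfterShutdown(500));
        System.out.println("terminated after shutdownNow(): " + terminatesAfterShutdownNow(5_000));
    }
}
```

If this assumption holds, whatever owns such a loop has to interrupt it (or signal it through a flag it checks) when it is closed; otherwise every instance left behind by a test keeps one sleeping thread.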



--
This message was sent by Atlassian Jira
(v8.3.4#803005)