Posted to dev@pulsar.apache.org by "Vladimir Sitnikov (Jira)" <ji...@apache.org> on 2020/01/12 17:39:00 UTC
[jira] [Created] (PULSAR-4) Pulsar precommit Jenkins jobs consume
too much resources and it leads to "unable to create native thread"
Vladimir Sitnikov created PULSAR-4:
--------------------------------------
Summary: Pulsar precommit Jenkins jobs consume too much resources and it leads to "unable to create native thread"
Key: PULSAR-4
URL: https://issues.apache.org/jira/browse/PULSAR-4
Project: Pulsar
Issue Type: Bug
Reporter: Vladimir Sitnikov
Attachments: pulsar_threaddump.txt.gz
See https://lists.apache.org/thread.html/r9cb0772531814fdf10c82b61fb4bb8d3a187852ddf98ac84754bf778%40%3Cbuilds.apache.org%3E
The H23 node was unresponsive, and it turned out to be running lots of Pulsar Java processes (~14 processes, 9000+ threads):
{noformat}
22058 jenkins 20 0 19.514g 2.156g 33960 S 36.8 2.3 2032:55 /usr/local/asfpackages/java/jdk1.8.0_191/jre/bin/java -Xmx1G -XX:+UseG1GC -Dpulsar.allocator.pooled=false -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dlog4j.configurationFile=log4j2.xml -jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/surefire/surefirebooter5673414172185975509.jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/sur
{noformat}
The thread dump includes 1020 threads like
{noformat}
"pulsar-9510-20" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73 waiting on condition [0x00007fb8dd946000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000cd3bf4d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{noformat}
733 threads like
{noformat}
"bookkeeper-ml-cache-eviction-6747-1" #51441 prio=5 os_prio=0 tid=0x00007fbb1c31c000 nid=0x58be sleeping[0x00007fb8ea6e6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.cacheEvictionTask(ManagedLedgerFactoryImpl.java:221)
at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$$Lambda$70/551994588.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{noformat}
and so on (see the attached threaddump)
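For anyone who wants to reproduce counts like the ones above from a dump, here is a minimal sketch. It assumes a HotSpot/jstack-style text dump in which each thread entry starts with a quoted name followed by `#<id>`, as in the excerpts above. The function names and the `SAMPLE` text are hypothetical and only for illustration. This is not necessarily how the numbers in this report were obtained. Thread names are grouped by replacing every run of digits with `N`, so that `pulsar-9510-20` and `pulsar-9511-3` fall into the same family.

```python
import re
from collections import Counter

# Header line of a thread entry, e.g.
#   "pulsar-9510-20" #73509 prio=5 os_prio=0 tid=... nid=... waiting on condition [...]
# Entries without a "#<id>" after the name (some JVM-internal threads) are skipped.
THREAD_HEADER = re.compile(r'^"([^"]+)"\s+#\d+')


def thread_name_family(name):
    """Collapse digit runs so threads from the same pool type group together."""
    return re.sub(r"\d+", "N", name)


def count_thread_families(dump_text):
    """Return a Counter mapping normalized thread-name families to thread counts."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            counts[thread_name_family(match.group(1))] += 1
    return counts


# Hypothetical two-thread sample built from the header lines quoted in this report.
SAMPLE = '''\
"pulsar-9510-20" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73 waiting on condition [0x00007fb8dd946000]
java.lang.Thread.State: WAITING (parking)
"bookkeeper-ml-cache-eviction-6747-1" #51441 prio=5 os_prio=0 tid=0x00007fbb1c31c000 nid=0x58be sleeping[0x00007fb8ea6e6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
'''

if __name__ == "__main__":
    # Print the largest families first.
    for family, count in count_thread_families(SAMPLE).most_common():
        print(count, family)
```

To run it against a real dump, read the decompressed file into a string and pass it to `count_thread_families` in place of `SAMPLE`.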