You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/09/17 07:46:02 UTC
[GitHub] [pulsar] bin-albin opened a new issue #8078: pulsar process shutdown , cpu useed load high
bin-albin opened a new issue #8078:
URL: https://github.com/apache/pulsar/issues/8078
**Describe the bug**
We monitored pulsar process shutdown and saw high CPU utilization. The specific log information before the outage is as follows.
09:54:02.461 [pulsar-zk-session-watcher-5-1] WARN org.apache.pulsar.zookeeper.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
09:54:02.645 [pulsar-web-42-17] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/harbor-pool-01/0x40000000_0x80000000?authoritative=false HTTP/1.1" 200 244 "-" "Pulsar-Java-v2.6.0" 1
09:54:02.645 [pulsar-web-42-6] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/harbor-pool-01/0x00000000_0x40000000?authoritative=false HTTP/1.1" 200 163 "-" "Pulsar-Java-v2.6.0" 1
09:54:02.742 [pulsar-web-42-31] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-test-project/pool1/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 79 "-" "Pulsar-Java-v2.6.0" 1
09:54:03.464 [pulsar-web-42-17] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:03 +0800] "GET /admin/v2/non-persistent/2000100247-kmse-test/test/0xc0000000_0xffffffff?authoritative=false HTTP/1.1" 200 75 "-" "Pulsar-Java-v2.6.0" 0
09:54:03.925 [pulsar-web-42-31] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:03 +0800] "GET /admin/v2/non-persistent/2000127883-klog_demo/nginx/0x40000000_0x80000000?authoritative=false HTTP/1.1" 200 76 "-" "Pulsar-Java-v2.6.0" 1
09:54:04.321 [pulsar-web-42-17] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000003485-wntest/test1/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 73 "-" "Pulsar-Java-v2.6.0" 0
09:54:04.420 [pulsar-web-42-31] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000003485-lctest/filebeat2/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 77 "-" "Pulsar-Java-v2.6.0" 1
09:54:04.425 [pulsar-web-42-17] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000020194-test/test/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 70 "-" "Pulsar-Java-v2.6.0" 0
09:54:04.427 [pulsar-web-42-31] INFO org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/pool-01/0xc0000000_0xffffffff?authoritative=false HTTP/1.1" 200 75 "-" "Pulsar-Java-v2.6.0" 1
09:54:04.462 [pulsar-zk-session-watcher-5-1] WARN org.apache.pulsar.zookeeper.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
09:54:04.885 [pulsar-web-42-17] INFO org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null][persistent://2000100247-kmse-test/test/bafb612941df43a69fbd5acd27ee0866] received reset cursor on subscription flink-pulsar-a0a931c9-5c13-4e53-b977-b05fe9e9c64e to position 73770:0:-1
09:54:04.886 [pulsar-web-42-17] WARN org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] [persistent://2000100247-kmse-test/test/bafb612941df43a69fbd5acd27ee0866] Failed to reset cursor on subscription flink-pulsar-a0a931c9-5c13-4e53-b977-b05fe9e9c64e to position 73770:0:-1
java.lang.NullPointerException: null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877) ~[com.google.guava-guava-25.1-jre.jar:?]
at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalResetCursorOnPosition(PersistentTopicsBase.java:1801) ~[org.apache.pulsar-pulsar-broker-2.6.0.jar:2.6.0]
at org.apache.pulsar.broker.admin.v2.PersistentTopics.resetCursorOnPosition(PersistentTopics.java:890) ~[org.apache.pulsar-pulsar-broker-2.6.0.jar:2.6.0]
at sun.reflect.GeneratedMethodAccessor94.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) ~[org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:183) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:316) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:298) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:268) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) [org.glassfish.jersey.containers-jersey-container-servlet-core-2.27.jar:?]
pulsar node cpu load:
![pulsar cpu](https://user-images.githubusercontent.com/13760768/93435500-2128f200-f8fc-11ea-9205-32d79388383f.jpg)
pulsar log info
The process stopped at about 9:54 this morning
[pulsar-broker-klog-broker-prod-3.ksyun.com-e.log.gz](https://github.com/apache/pulsar/files/5237125/pulsar-broker-klog-broker-prod-3.ksyun.com-e.log.gz)
**Desktop (please complete the following information):**
- OS: centos 7.6
16c64g 2T
**Additional context**
Add any other context about the problem here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] bin-albin commented on issue #8078: pulsar process shutdown , cpu useed load high
Posted by GitBox <gi...@apache.org>.
bin-albin commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694098652
After the first pulsar node-1 stopped, CPU utilization of the remaining two pulsar nodes began to increase.
This cluster has a three-node, with 20 tenants and 80 namespaces 200+topic. The back end use flink on Pulsar Consumer to write to ES.
The machine is configured as 16C64G2T
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] Technoboy- edited a comment on issue #8078: pulsar process shutdown , cpu useed load high
Posted by GitBox <gi...@apache.org>.
Technoboy- edited a comment on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694127156
Task a look at the relative code and find that should we be used if check to avoid throw NPE?
```
if(null == sub){
asyncResponse.resume(new RestException(Status.NOT_FOUND, "Subscription not found"));
return;
}
```
![image](https://user-images.githubusercontent.com/6297296/93455383-de244a00-f90e-11ea-9cb2-032cf60fdb93.png)
@sijie @jiazhai
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] Technoboy- commented on issue #8078: pulsar process shutdown , cpu useed load high
Posted by GitBox <gi...@apache.org>.
Technoboy- commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694127156
Task a look at the relative code and find that should be used if check to avoid throw NPE?
```
if(null == sub){
asyncResponse.resume(new RestException(Status.NOT_FOUND, "Subscription not found"));
return;
}
```
![image](https://user-images.githubusercontent.com/6297296/93455383-de244a00-f90e-11ea-9cb2-032cf60fdb93.png)
@sijie @jiazhai
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] lhotari commented on issue #8078: pulsar process shutdown , cpu useed load high
Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-706388385
I'm running into a similar problem. However I didn't see a NPE, but there is high CPU usage. If anyone is interested, I have shared the details on the Pulsar Slack at https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1602159831227900 .
[async-profiler](https://github.com/jvm-profiling-tools/async-profiler) seems to be the best way to observe what is going on when Pulsar is running on high CPU. It's possible to get it copied inside a k8s container and do profiling on-the-fly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] bin-albin commented on issue #8078: pulsar process shutdown , cpu useed load high
Posted by GitBox <gi...@apache.org>.
bin-albin commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-697356919
This is a screenshot of the high occupancy of those classes when there was a problem
![image](https://user-images.githubusercontent.com/13760768/94016891-4d071480-fde1-11ea-99d0-d53a33c2461f.png)
![image](https://user-images.githubusercontent.com/13760768/94016994-6f009700-fde1-11ea-859c-c78d1ecc7f1f.png)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org