You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/09/17 07:46:02 UTC

[GitHub] [pulsar] bin-albin opened a new issue #8078: pulsar process shutdown , cpu useed load high

bin-albin opened a new issue #8078:
URL: https://github.com/apache/pulsar/issues/8078


   **Describe the bug**
   We monitored pulsar process shutdown and saw high CPU utilization. The specific log information before the outage is as follows.
   09:54:02.461 [pulsar-zk-session-watcher-5-1] WARN  org.apache.pulsar.zookeeper.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
   09:54:02.645 [pulsar-web-42-17] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/harbor-pool-01/0x40000000_0x80000000?authoritative=false HTTP/1.1" 200 244 "-" "Pulsar-Java-v2.6.0" 1
   09:54:02.645 [pulsar-web-42-6] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/harbor-pool-01/0x00000000_0x40000000?authoritative=false HTTP/1.1" 200 163 "-" "Pulsar-Java-v2.6.0" 1
   09:54:02.742 [pulsar-web-42-31] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:02 +0800] "GET /admin/v2/non-persistent/2000087171-test-project/pool1/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 79 "-" "Pulsar-Java-v2.6.0" 1
   09:54:03.464 [pulsar-web-42-17] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:03 +0800] "GET /admin/v2/non-persistent/2000100247-kmse-test/test/0xc0000000_0xffffffff?authoritative=false HTTP/1.1" 200 75 "-" "Pulsar-Java-v2.6.0" 0
   09:54:03.925 [pulsar-web-42-31] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:03 +0800] "GET /admin/v2/non-persistent/2000127883-klog_demo/nginx/0x40000000_0x80000000?authoritative=false HTTP/1.1" 200 76 "-" "Pulsar-Java-v2.6.0" 1
   09:54:04.321 [pulsar-web-42-17] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000003485-wntest/test1/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 73 "-" "Pulsar-Java-v2.6.0" 0
   09:54:04.420 [pulsar-web-42-31] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000003485-lctest/filebeat2/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 77 "-" "Pulsar-Java-v2.6.0" 1
   09:54:04.425 [pulsar-web-42-17] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000020194-test/test/0x80000000_0xc0000000?authoritative=false HTTP/1.1" 200 70 "-" "Pulsar-Java-v2.6.0" 0
   09:54:04.427 [pulsar-web-42-31] INFO  org.eclipse.jetty.server.RequestLog - 10.69.65.74 - - [17/Sep/2020:09:54:04 +0800] "GET /admin/v2/non-persistent/2000087171-harbor/pool-01/0xc0000000_0xffffffff?authoritative=false HTTP/1.1" 200 75 "-" "Pulsar-Java-v2.6.0" 1
   09:54:04.462 [pulsar-zk-session-watcher-5-1] WARN  org.apache.pulsar.zookeeper.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
   09:54:04.885 [pulsar-web-42-17] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null][persistent://2000100247-kmse-test/test/bafb612941df43a69fbd5acd27ee0866] received reset cursor on subscription flink-pulsar-a0a931c9-5c13-4e53-b977-b05fe9e9c64e to position 73770:0:-1
   09:54:04.886 [pulsar-web-42-17] WARN  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] [persistent://2000100247-kmse-test/test/bafb612941df43a69fbd5acd27ee0866] Failed to reset cursor on subscription flink-pulsar-a0a931c9-5c13-4e53-b977-b05fe9e9c64e to position 73770:0:-1
   java.lang.NullPointerException: null
           at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877) ~[com.google.guava-guava-25.1-jre.jar:?]
           at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalResetCursorOnPosition(PersistentTopicsBase.java:1801) ~[org.apache.pulsar-pulsar-broker-2.6.0.jar:2.6.0]
           at org.apache.pulsar.broker.admin.v2.PersistentTopics.resetCursorOnPosition(PersistentTopics.java:890) ~[org.apache.pulsar-pulsar-broker-2.6.0.jar:2.6.0]
           at sun.reflect.GeneratedMethodAccessor94.invoke(Unknown Source) ~[?:?]
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
           at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
           at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) ~[org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:183) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:316) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:298) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.internal.Errors.process(Errors.java:268) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289) [org.glassfish.jersey.core-jersey-common-2.27.jar:?]
           at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703) [org.glassfish.jersey.core-jersey-server-2.27.jar:?]
           at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) [org.glassfish.jersey.containers-jersey-container-servlet-core-2.27.jar:?]
   
   pulsar node cpu load:
   
   ![pulsar cpu](https://user-images.githubusercontent.com/13760768/93435500-2128f200-f8fc-11ea-9205-32d79388383f.jpg)
   
   pulsar log info 
   The process stopped at about 9:54 this morning
   
   [pulsar-broker-klog-broker-prod-3.ksyun.com-e.log.gz](https://github.com/apache/pulsar/files/5237125/pulsar-broker-klog-broker-prod-3.ksyun.com-e.log.gz)
   
   
   **Desktop (please complete the following information):**
    - OS: centos 7.6
    16c64g 2T
   
   
   **Additional context**
   Add any other context about the problem here.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] bin-albin commented on issue #8078: pulsar process shutdown , cpu useed load high

Posted by GitBox <gi...@apache.org>.
bin-albin commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694098652


   After the first pulsar node-1 stopped, CPU utilization of the remaining two pulsar nodes began to increase.
   
   This cluster has a three-node, with 20 tenants and 80 namespaces 200+topic. The back end use flink on Pulsar Consumer to write to ES.
   
   The machine is configured as 16C64G2T 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Technoboy- edited a comment on issue #8078: pulsar process shutdown , cpu useed load high

Posted by GitBox <gi...@apache.org>.
Technoboy- edited a comment on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694127156


   Task a look at the relative code and find that should we be used if check to avoid throw NPE?
   ```
   if(null == sub){
      asyncResponse.resume(new RestException(Status.NOT_FOUND, "Subscription not found"));
      return;
   }
   ```
   ![image](https://user-images.githubusercontent.com/6297296/93455383-de244a00-f90e-11ea-9cb2-032cf60fdb93.png)
   @sijie @jiazhai 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Technoboy- commented on issue #8078: pulsar process shutdown , cpu useed load high

Posted by GitBox <gi...@apache.org>.
Technoboy- commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-694127156


   Task a look at the relative code and find that should be used if check to avoid throw NPE?
   ```
   if(null == sub){
      asyncResponse.resume(new RestException(Status.NOT_FOUND, "Subscription not found"));
      return;
   }
   ```
   ![image](https://user-images.githubusercontent.com/6297296/93455383-de244a00-f90e-11ea-9cb2-032cf60fdb93.png)
   @sijie @jiazhai 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #8078: pulsar process shutdown , cpu useed load high

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-706388385


   I'm running into a similar problem. However I didn't see a NPE, but there is high CPU usage. If anyone is interested, I have shared the details on the Pulsar Slack at https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1602159831227900 .
   [async-profiler](https://github.com/jvm-profiling-tools/async-profiler) seems to be the best way to observe what is going on when Pulsar is running on high CPU. It's possible to get it copied inside a k8s container and do profiling on-the-fly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] bin-albin commented on issue #8078: pulsar process shutdown , cpu useed load high

Posted by GitBox <gi...@apache.org>.
bin-albin commented on issue #8078:
URL: https://github.com/apache/pulsar/issues/8078#issuecomment-697356919


   This is a screenshot of the high occupancy of those classes when there was a problem
   
   ![image](https://user-images.githubusercontent.com/13760768/94016891-4d071480-fde1-11ea-99d0-d53a33c2461f.png)
   ![image](https://user-images.githubusercontent.com/13760768/94016994-6f009700-fde1-11ea-859c-c78d1ecc7f1f.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org