You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/06/02 09:23:34 UTC
[GitHub] [pulsar] hmoragrega opened a new issue #7140: Zombie producers shown as connected after a broker instance crashed
hmoragrega opened a new issue #7140:
URL: https://github.com/apache/pulsar/issues/7140
**Describe the bug**
A broker crash due to "out of memory" left zombie producers connected
Our setup:
- k8s cluster
- 3 pulsar brokers
- 3 pulsar proxies
- X Services publishing to the same topic using official CGo library
**Steps to reproduce the behavior:**
Force a OOM crash in a broker with connected producers, the producers connected to the broker are still listed as connected and have to be manually deleted
In our case producers tried to reconnect automatically to the cluster using the same producer name but received this error `Failed to create producer: ProducerBusy`.
Only a change in the producer name fixed the issue
The list of producers for the topic show old instances with prefix `74966b7f75` yet only 5 instances producing messages are running in our k8s cluster
```
"publishers" : [ {
"producerName" : "service-x-events-968974b58-7wwcm",
"connectedSince" : "2020-06-02T07:09:08.453Z",
"address" : "/10.1.30.2:57634"
}, {
"producerName" : "service-x-events-74966b7f75-x9cfq",
"connectedSince" : "2020-06-01T20:32:21.863Z",
"address" : "/10.1.39.51:49418"
}, {
"producerName" : "service-x-events-968974b58-8v679",
"connectedSince" : "2020-06-02T04:56:23.638Z",
"address" : "/10.1.16.2:59924"
}, {
"producerName" : "service-x-events-968974b58-99qpk",
"connectedSince" : "2020-06-02T04:56:25.752Z",
"address" : "/10.1.16.2:59920"
}, {
"producerName" : "service-x-events-74966b7f75-c6ljj",
"connectedSince" : "2020-06-01T20:32:21.863Z",
"address" : "/10.1.39.51:59820"
}, {
"producerName" : "service-x-events-74966b7f75-rvtjt",
"connectedSince" : "2020-06-01T20:32:21.863Z",
"address" : "/10.1.16.2:56450"
}, {
"producerName" : "service-x-events-968974b58-86p8p",
"connectedSince" : "2020-06-02T04:00:07.853Z",
"address" : "/10.1.39.51:52732"
}, {
"producerName" : "service-x-events-968974b58-9wp9m",
"connectedSince" : "2020-06-02T07:05:30.878Z",
"address" : "/10.1.30.2:57488"
} ],
```
```
service-x-968974b58-99qpk Running 0 Jun 2, 2020, 3:57:07 AM
service-x-968974b58-86p8p Running 0 Jun 2, 2020, 3:57:07 AM
service-x-968974b58-8v679 Running 0 Jun 2, 2020, 3:57:07 AM
service-x-968974b58-9wp9m Running 0 Jun 2, 2020, 7:00:26 AM
service-x-968974b58-7wwcm Running 0 Jun 2, 2020, 7:00:57 AM
```
**Expected behavior**
The old producers are removed after the broker crash
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] sijie commented on issue #7140: Zombie producers shown as connected after a broker instance crashed
Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7140:
URL: https://github.com/apache/pulsar/issues/7140#issuecomment-637902988
@hmoragrega
A couple of questions:
- which version of pulsar broker are you using?
- how did you cause broker crash? one thing I don't understand in the description is if the broker was crashed, the JVM process should be gone. Are you using a different approach to force broker crash?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] larslorenzen commented on issue #7140: Zombie producers shown as connected after a broker instance crashed
Posted by GitBox <gi...@apache.org>.
larslorenzen commented on issue #7140:
URL: https://github.com/apache/pulsar/issues/7140#issuecomment-648651707
Also happened to us after a broker restart: https://github.com/apache/pulsar/issues/7059#issuecomment-648041771. Producer could not reconnect.
Pulsar broker version: 2.5.2
Pulsar producer version: 2.5.2
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hmoragrega commented on issue #7140: Zombie producers shown as connected after a broker instance crashed
Posted by GitBox <gi...@apache.org>.
hmoragrega commented on issue #7140:
URL: https://github.com/apache/pulsar/issues/7140#issuecomment-638009352
thanks for checking out:
* We are running a docker image: `apachepulsar/pulsar-all:2.5.1`
* The broker process was killed by k8s:
```
2020-06-01 20:31:26.000 UTC
Memory cgroup out of memory: Kill process 281696 (java) score 1862 or sacrifice child Killed process 281696 (java) total-vm:6202804kB, anon-rss:2484912kB, file-rss:24084kB, shmem-rss:0kB
2020-06-01 20:31:26.152 UTC
Killed process 281696 (java) total-vm:6202804kB, anon-rss:2484912kB, file-rss:24084kB, shmem-rss:0kB}] Conditions:[{Type:KernelDeadlock Status:False Transition:2020-05-10 23:28:18.17330381 +0000 UTC m=+5.104380376 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2020-05-10 23:28:18.173304129 +0000 UTC m=+5.104380498 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]}
```
I've checked the pulsar logs around the event but I haven't found any outstanding message, just the application being restarted
```
I 2020-06-01T20:31:19.772967095Z 20:31:19.772 [pulsar-client-io-42-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck][reader-98d2f9bedb] Subscribed to topic on 10.1.2.182/10.1.2.182:6650 -- consumer: 3380
I 2020-06-01T20:31:19.785581683Z 20:31:19.785 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.ServerCnx - [PersistentTopic{topic=persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck}][pulsar-20-3385] Closing producer on cnx /10.1.2.182:33968
I 2020-06-01T20:31:19.785694992Z 20:31:19.785 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.ServerCnx - [PersistentTopic{topic=persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck}][pulsar-20-3385] Closed producer on cnx /10.1.2.182:33968
I 2020-06-01T20:31:19.785802934Z 20:31:19.785 [pulsar-client-io-42-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck] [pulsar-20-3385] Closed Producer
I 2020-06-01T20:31:19.785898196Z 20:31:19.785 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.1.2.182:33968] Closing consumer: 3380
I 2020-06-01T20:31:19.786763101Z 20:31:19.786 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer - Removing consumer Consumer{subscription=PersistentSubscription{topic=persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck, name=reader-98d2f9bedb}, consumerId=3380, consumerName=9574f, address=/10.1.2.182:33968}
I 2020-06-01T20:31:19.787468692Z 20:31:19.786 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck][reader-98d2f9bedb] Successfully closed subscription [NonDurableCursorImpl{ledger=pulsar/pulsar/10.1.2.182:8080/persistent/healthcheck, ackPos=38739:397, readPos=38739:398}]
I 2020-06-01T20:31:19.787644898Z 20:31:19.787 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck][reader-98d2f9bedb] Successfully closed dispatcher for reader
I 2020-06-01T20:31:19.787662374Z 20:31:19.787 [pulsar-external-listener-43-1] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [01/Jun/2020:20:31:19 +0000] "GET /admin/v2/brokers/health HTTP/1.1" 200 2 "-" "curl/7.64.0" 26
I 2020-06-01T20:31:19.787773157Z 20:31:19.787 [pulsar-io-22-3] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.1.2.182:33968] Closed consumer Consumer{subscription=PersistentSubscription{topic=persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck, name=reader-98d2f9bedb}, consumerId=3380, consumerName=9574f, address=/10.1.2.182:33968}
I 2020-06-01T20:31:19.787804660Z 20:31:19.787 [pulsar-client-io-42-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/pulsar/10.1.2.182:8080/healthcheck] [reader-98d2f9bedb] Closed consumer
I 2020-06-01T20:31:23.906091867Z 20:31:23.905 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x300369d33e0001b local:/10.1.2.182:33466 remoteserver:pulsar-zookeeper-2.pulsar-zookeeper/10.1.8.80:2181 lastZxid:25770466537 xid:107048 sent:107048 recv:124047 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/10.1.41.80:8080
I 2020-06-01T20:31:24.719776356Z 20:31:24.719 [pulsar-load-manager-3-1] INFO org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to ZooKeeper because maximum change 11.40127182006836% exceeded threshold 10%; time since last report written is 5.0 seconds
I 2020-06-01T20:31:24.725426343Z 20:31:24.725 [pulsar-ordered-OrderedExecutor-3-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x300369d33e0001b local:/10.1.2.182:33466 remoteserver:pulsar-zookeeper-2.pulsar-zookeeper/10.1.8.80:2181 lastZxid:25770466539 xid:107051 sent:107051 recv:124051 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/10.1.2.182:8080
// Restart
I 2020-06-01T20:31:27.095166566Z [conf/pulsar_env.sh] Applying config PULSAR_EXTRA_OPTS = -Dpulsar.log.root.level=info
I 2020-06-01T20:31:27.095312356Z [conf/pulsar_env.sh] Applying config PULSAR_GC = "-XX:+UseG1GC -XX:MaxGCPauseMillis=10"
I 2020-06-01T20:31:27.095324965Z [conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem"
I 2020-06-01T20:31:27.095333038Z [conf/broker.conf] Applying config advertisedAddress = 10.1.2.182
...
I 2020-06-01T20:31:27.095428042Z [conf/broker.conf] Applying config zookeeperServers = pulsar-zookeeper-0.pulsar-zookeeper,pulsar-zookeeper-1.pulsar-zookeeper,pulsar-zookeeper-2.pulsar-zookeeper
I 2020-06-01T20:31:27.118097760Z [conf/pulsar_env.sh] Applying config PULSAR_EXTRA_OPTS = -Dpulsar.log.root.level=info
I 2020-06-01T20:31:27.118143135Z [conf/pulsar_env.sh] Applying config PULSAR_GC = "-XX:+UseG1GC -XX:MaxGCPauseMillis=10"
I 2020-06-01T20:31:27.118149618Z [conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem"
I 2020-06-01T20:31:27.118155877Z [conf/pulsar_env.sh] Applying config PULSAR_EXTRA_OPTS = -Dpulsar.log.root.level=info
I 2020-06-01T20:31:27.118160698Z [conf/pulsar_env.sh] Applying config PULSAR_GC = "-XX:+UseG1GC -XX:MaxGCPauseMillis=10"
I 2020-06-01T20:31:27.118165936Z [conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem"
I 2020-06-01T20:31:27.141939511Z [conf/pulsar_env.sh] Applying config PULSAR_EXTRA_OPTS = -Dpulsar.log.root.level=info
I 2020-06-01T20:31:27.141971873Z [conf/pulsar_env.sh] Applying config PULSAR_GC = "-XX:+UseG1GC -XX:MaxGCPauseMillis=10"
I 2020-06-01T20:31:27.142012430Z [conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem"
E 2020-06-01T20:31:27.934534415Z [AppClassLoader@18b4aac2] info AspectJ Weaver Version 1.9.2 built on Wednesday Oct 24, 2018 at 15:43:33 GMT
E 2020-06-01T20:31:27.934578455Z [AppClassLoader@18b4aac2] info register classloader sun.misc.Launcher$AppClassLoader@18b4aac2
E 2020-06-01T20:31:27.934621976Z [AppClassLoader@18b4aac2] info using configuration file:/pulsar/lib/org.apache.pulsar-pulsar-zookeeper-utils-2.5.1.jar!/META-INF/aop.xml
E 2020-06-01T20:31:27.934630474Z [AppClassLoader@18b4aac2] info using configuration file:/pulsar/lib/org.apache.pulsar-pulsar-zookeeper-2.5.1.jar!/META-INF/aop.xml
E 2020-06-01T20:31:27.940125061Z [AppClassLoader@18b4aac2] info register aspect org.apache.pulsar.broker.zookeeper.aspectj.ClientCnxnAspect
E 2020-06-01T20:31:28.218736757Z [AppClassLoader@18b4aac2] info register aspect org.apache.pulsar.zookeeper.FinalRequestProcessorAspect
E 2020-06-01T20:31:28.223107367Z [AppClassLoader@18b4aac2] info register aspect org.apache.pulsar.zookeeper.ZooKeeperServerAspect
I 2020-06-01T20:31:30.492087527Z 20:31:30.488 [main] INFO org.apache.pulsar.broker.PulsarService - Starting Pulsar Broker service; version: '2.5.1'
I 2020-06-01T20:31:30.494428749Z 20:31:30.494 [main] INFO org.apache.pulsar.broker.PulsarService - Git Revision c36170c3acc0bd432036baf7d9471a4c34320395
I 2020-06-01T20:31:30.494766089Z 20:31:30.494 [main] INFO org.apache.pulsar.broker.PulsarService - Built by guangning <gu...@apache.org> on ubuntu-s-4vcpu-8gb-sfo1-01 at 2020-04-15T08:53:35+0000
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org