Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/07/29 13:49:58 UTC

[GitHub] [pulsar] rkaw92 opened a new issue, #16875: Broker exits with code 255

rkaw92 opened a new issue, #16875:
URL: https://github.com/apache/pulsar/issues/16875

   Hi,
   
   We're running a cluster of 3 brokers and 3 bookies on Pulsar 2.10.1. The issue is, after some time, the broker processes just stop working. They disappear, with no trace in the log (I'm running with `immediateFlush: true`).
   
   It's like this: one minute everything is working fine, daemons listen where they should, messages flow... and the next minute nothing is listening on the port and the process is just gone. It happens on all 3 hosts in the cluster, at different times, but the end result is the same: within a few hours, the entire 3-node cluster is down for no apparent reason.
   
   Last time, I attached `strace` to one of the processes, and this is all I saw:
   ```
   root@pulsar1:/opt/apache-pulsar-2.10.1# strace -p 714776
   strace: Process 714776 attached
   futex(0x7f1f4125b9d0, FUTEX_WAIT, 714831, NULL) = ?
   +++ exited with 255 +++
   ```
   
   Apparently, the process is quitting with error code 255. Why? I have no idea. I checked dmesg looking for an Out-of-Memory message, but found nothing. Any hints?
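
Note on the 255: the thread does not establish where this value comes from. One possible reading, offered purely as an assumption, is that the JVM exited with a status of -1. A POSIX parent process only observes the low 8 bits of an exit status, so -1 would be reported as 255. The function name below is hypothetical and the sketch only illustrates that truncation:

```python
def reported_exit_status(code: int) -> int:
    """Return the exit status a POSIX parent observes for a given exit code.

    Only the low 8 bits survive, so negative or large codes wrap around.
    """
    return code & 0xFF


if __name__ == "__main__":
    # -1 is the assumed value here; the thread never confirms what the JVM passed.
    for code in (0, 1, -1, 255, 256):
        print(f"exit({code}) is reported as {reported_exit_status(code)}")
```

If that reading holds, a 255 from `strace` says only that the process chose to exit with a failure status; the reason still has to come from the broker log.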


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Technoboy- commented on issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
Technoboy- commented on issue #16875:
URL: https://github.com/apache/pulsar/issues/16875#issuecomment-1199408126

   Are there any broker logs?



[GitHub] [pulsar] github-actions[bot] commented on issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #16875:
URL: https://github.com/apache/pulsar/issues/16875#issuecomment-1237595609

   This issue has had no activity for 30 days; marking it with the Stale label.



[GitHub] [pulsar] rkaw92 commented on issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
rkaw92 commented on issue #16875:
URL: https://github.com/apache/pulsar/issues/16875#issuecomment-1199533228

   > Are there any broker logs?
   
   Before your reply, there were no logs that would look relevant to this - it'd just stop and print literally nothing to the log. Now, after you wrote, a log entry has miraculously appeared after the latest crash:
   
   ```
   2022-07-29T15:05:46,317+0000 [pulsar-2-1] ERROR org.apache.pulsar.PulsarBrokerStarter - -- Shutting down - Received OOM exception: failed to allocate 4194304 byte(s) of direct memory (used: 10733223943, max: 10737418240)
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 10733223943, max: 10737418240)
           at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:806) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:735) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:649) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:624) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:203) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:187) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:136) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:126) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:163) ~[org.apache.bookkeeper-bookkeeper-common-allocator-4.14.5.jar:4.14.5]
           at org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:157) ~[org.apache.bookkeeper-bookkeeper-common-allocator-4.14.5.jar:4.14.5]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
           at io.streamnative.pulsar.handlers.kop.format.DirectBufferOutputStream.<init>(DirectBufferOutputStream.java:40) ~[?:?]
           at io.streamnative.pulsar.handlers.kop.utils.ByteBufUtils.decodePulsarEntryToKafkaRecords(ByteBufUtils.java:130) ~[?:?]
           at io.streamnative.pulsar.handlers.kop.format.AbstractEntryFormatter.decode(AbstractEntryFormatter.java:90) ~[?:?]
           at io.streamnative.pulsar.handlers.kop.format.PulsarEntryFormatter.decode(PulsarEntryFormatter.java:118) ~[?:?]
           at io.streamnative.pulsar.handlers.kop.MessageFetchContext.lambda$handleEntries$10(MessageFetchContext.java:483) ~[?:?]
           at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
           at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
           at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
           at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   
   So, it looks like the broker used up its entire direct memory limit of 10 GiB (max: 10737418240 bytes) and shut down.
   
   How do I know the appropriate direct memory size required to run a stable broker?
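
Note on the numbers in the log line: the short sketch below only does the arithmetic on the OOM message quoted above. It does not answer the sizing question, and it cannot tell a too-small limit apart from a buffer leak. The function name and the returned keys are hypothetical; the message text and the three numbers are copied from the log.

```python
import re

# Message text copied from the broker log entry quoted in this thread.
OOM_LINE = (
    "failed to allocate 4194304 byte(s) of direct memory "
    "(used: 10733223943, max: 10737418240)"
)

OOM_PATTERN = re.compile(
    r"failed to allocate (\d+) byte\(s\) of direct memory "
    r"\(used: (\d+), max: (\d+)\)"
)


def parse_direct_memory_oom(message: str) -> dict:
    """Extract the requested, used, and max byte counts and derive headroom."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a direct-memory OOM message in the expected format")
    requested, used, limit = (int(group) for group in match.groups())
    free = limit - used
    return {
        "requested": requested,
        "used": used,
        "max": limit,
        "free": free,
        # Positive shortfall means the request could not fit in what was left.
        "shortfall": requested - free,
    }


info = parse_direct_memory_oom(OOM_LINE)

if __name__ == "__main__":
    mib = 1024 ** 2
    print(f"limit:     {info['max'] / 1024 ** 3:.0f} GiB")
    print(f"requested: {info['requested'] / mib:.0f} MiB")
    print(f"free:      {info['free']} bytes")
    print(f"shortfall: {info['shortfall']} bytes")
```

Read this way, the broker was at its 10 GiB limit and a 4 MiB request missed by 7 bytes. Whether usage should ever get that high is the open question, which is why the stack frames leading to the allocation matter more than the limit itself.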



[GitHub] [pulsar] leizhiyuan commented on issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
leizhiyuan commented on issue #16875:
URL: https://github.com/apache/pulsar/issues/16875#issuecomment-1207305112

           at io.streamnative.pulsar.handlers.kop.utils.ByteBufUtils.decodePulsarEntryToKafkaRecords(ByteBufUtils.java:130) ~[?:?]
   
   It seems you are using KoP. Check which KoP version you are running; it may not be releasing the direct buffers it allocates.
   
   



[GitHub] [pulsar] tisonkun commented on issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
tisonkun commented on issue #16875:
URL: https://github.com/apache/pulsar/issues/16875#issuecomment-1309639644

   This seems to be a KoP-specific issue. @rkaw92, you may want to open an issue against https://github.com/streamnative/kop to find out how KoP uses direct memory.
   
   Closed as unactionable. Feel free to open a new issue if KoP developers find a root cause in the Pulsar scope.



[GitHub] [pulsar] tisonkun closed issue #16875: Broker exits with code 255

Posted by GitBox <gi...@apache.org>.
tisonkun closed issue #16875: Broker exits with code 255
URL: https://github.com/apache/pulsar/issues/16875

