You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/10/22 20:11:52 UTC

[GitHub] [pulsar] rdhabalia opened a new pull request #5443: [pulsar-client] Fix message corruption on OOM for batch messages

rdhabalia opened a new pull request #5443: [pulsar-client] Fix message corruption on OOM for batch messages
URL: https://github.com/apache/pulsar/pull/5443
 
 
   ### Motivation
   
   With below test setup, we can see message corruption and incorrect batch-size for a batch message topic and consumer will see below exception:
   
   It's also related to similar issue mentioned: #1201 
   
   ```
   17:47:55.973 [pulsar-client-io-31-1:org.apache.pulsar.client.impl.ClientCnx@257] WARN  org.apache.pulsar.client.impl.ClientCnx - [localhost/127.0.0.1:15119] Got exception IndexOutOfBoundsException : readerIndex: 85, writerIndex: 1702065078 (expected: 0 <= readerIndex <= writerIndex <= capacity(183))
   java.lang.IndexOutOfBoundsException: readerIndex: 85, writerIndex: 1702065078 (expected: 0 <= readerIndex <= writerIndex <= capacity(183))
   	at io.netty.buffer.AbstractByteBuf.checkIndexBounds(AbstractByteBuf.java:112) ~[netty-all-4.1.32.Final.jar:4.1.32.Final]
   	at io.netty.buffer.AbstractByteBuf.writerIndex(AbstractByteBuf.java:135) ~[netty-all-4.1.32.Final.jar:4.1.32.Final]
   	at org.apache.pulsar.common.protocol.Commands.deSerializeSingleMessageInBatch(Commands.java:1518) ~[classes/:?]
   	at org.apache.pulsar.client.impl.ConsumerImpl.receiveIndividualMessagesFromBatch(ConsumerImpl.java:950) ~[classes/:?]
   	at org.apache.pulsar.client.impl.ConsumerImpl.messageReceived(ConsumerImpl.java:835) ~[classes/:?]
   	at org.apache.pulsar.client.impl.ClientCnx.handleMessage(ClientCnx.java:375) ~[classes/:?]
   ```
   
   
   
   1. Create a partition topic with 400 partitions
   2. Restrict direct memory to 512m `-XX:MaxDirectMemorySize=512m` 
   3. Publish messages with large producer queue size which can cause OOM 
   `./pulsar-perf produce persistent://sample/standalone/batch/part -r 10000 -s 10240 -o 200000 -p 100000000`
   4. unload namespace which will make client to full producer queue and cause OOM 
   
   With above setup, producer will have OOM while serializing individual batch message
   ```
   18:08:56.738 [pulsar-timer-5-1] WARN  org.apache.pulsar.client.impl.ProducerImpl - [persistent://sample/standalone/batch/t2-partition-151] [standalone-7-3208] error while create opSendMsg by batch message container -- java.lang.RuntimeException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 20971520 byte(s) of direct memory (used: 134217728, max: 134217728)
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 134217728, max: 134217728)
   	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655)
   	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610)
   	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769)
   	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745)
   	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
   	at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
   	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:397)
   	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
   	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:299)
   	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:278)
   	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
   	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1096)
   	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1087)
   	at org.apache.pulsar.common.protocol.Commands.serializeSingleMessageInBatchWithPayload(Commands.java:1471)
   	at org.apache.pulsar.common.protocol.Commands.serializeSingleMessageInBatchWithPayload(Commands.java:1501)
   	at org.apache.pulsar.client.impl.BatchMessageContainerImpl.getCompressedBatchMetadataAndPayload(BatchMessageContainerImpl.java:92)
   	at org.apache.pulsar.client.impl.BatchMessageContainerImpl.createOpSendMsg(BatchMessageContainerImpl.java:160)
   	at org.apache.pulsar.client.impl.ProducerImpl.batchMessageAndSend(ProducerImpl.java:1296)
   	at org.apache.pulsar.client.impl.ProducerImpl.access$500(ProducerImpl.java:76)
   	at org.apache.pulsar.client.impl.ProducerImpl$2.run(ProducerImpl.java:1253)
   	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
   	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
   	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:745)
   ```
   
   This error will corrupt 
   1. [batch-message buffer](https://github.com/apache/pulsar/blob/branch-2.4/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1166) and 
   2. individual message buffer's index and also [recycles messageMetadata](https://github.com/apache/pulsar/blob/branch-2.4/pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageContainerImpl.java#L89)
   
   It can be also reproduced with change mentioned into this [commit](https://github.com/apache/pulsar/commit/1411ff3e55470f3253c118f816b4cb23f14a84dc). It's hard to reproduce with unit-test but I will see if we can add it.
   
   ### Modification
   Reset batch-message buffer index and individual message's buffer index in case of failure in message serialization.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services