Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/09/29 04:11:11 UTC

[GitHub] [pulsar] lhotari edited a comment on issue #8138: [Pulsar Java Client] Reader API usage with a lot of shortly used Reader instances (created-used-closed) under heavy load causes the client's memory consumption to grow until there is an OOM

lhotari edited a comment on issue #8138:
URL: https://github.com/apache/pulsar/issues/8138#issuecomment-700413950


   I have re-tested my application, in which the memory leak occurs in a certain load test scenario, after applying the #8149 changes on top of a custom Pulsar Client build based on version 2.6.1. The memory leak remains. I can see that the CompletableFuture instances no longer keep references to ConsumerImpl instances, so that aspect is fixed.
   The ConsumerImpl instances are still retained through the consumers field of ClientCnx. I guess this reference also existed before the fix, and it may be that the majority of the instances were already retained by the ClientCnx consumers field.
    
   There appears to be a race condition in the way a ConsumerImpl is registered with and de-registered from the ClientCnx instance when the ClientCnx gets switched. The race is probably very likely to occur because "Currently the seek() operation will first disconnect all the connected consumers and then reset the cursor." (#5278).
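
   To make the suspected failure mode concrete, here is a minimal, deterministic sketch of one ordering that would leave a stale entry behind. It is a simplified model and not the actual Pulsar code: ModelCnx, ModelConsumer, connect and closeLeaky are hypothetical names, and the real registration logic may differ. It only assumes that each connection keeps its own map of consumers and that a close de-registers from the connection that is current at that moment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model. These are NOT the real Pulsar classes.
class ModelCnx {
    // Stands in for the per-connection consumers map discussed above.
    final Map<Long, ModelConsumer> consumers = new ConcurrentHashMap<>();
}

class ModelConsumer {
    final long id;
    private volatile ModelCnx currentCnx;

    ModelConsumer(long id) {
        this.id = id;
    }

    // Register on a connection and remember it as the current one.
    // Assumption: nothing removes the entry from the previous connection.
    void connect(ModelCnx cnx) {
        cnx.consumers.put(id, this);
        currentCnx = cnx;
    }

    // Only de-registers from whichever connection is current right now.
    void closeLeaky() {
        ModelCnx cnx = currentCnx;
        if (cnx != null) {
            cnx.consumers.remove(id);
        }
    }
}

public class CnxSwitchLeakModel {
    // Returns how many consumers are still registered on the old connection
    // after a connection switch followed by a close.
    static int staleAfterSwitch() {
        ModelCnx oldCnx = new ModelCnx();
        ModelCnx newCnx = new ModelCnx();
        ModelConsumer consumer = new ModelConsumer(1L);

        consumer.connect(oldCnx); // initial registration
        consumer.connect(newCnx); // re-registration after a disconnect, e.g. one triggered by seek
        consumer.closeLeaky();    // removes the entry from newCnx only

        return oldCnx.consumers.size();
    }

    public static void main(String[] args) {
        System.out.println("stale entries on old connection: " + staleAfterSwitch());
    }
}
```

   In this model the old connection still references the closed consumer, which is the same shape as the retention seen in the heap. Whether the real client follows this exact ordering is not confirmed.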
   
   I created a local repro using the Pulsar Client Reactive Streams / Project Reactor wrappers (an internal library) that the application uses, and I am minimizing the repro step by step.
   The current version reproduces the memory leak by creating 100 topics with 3 messages in each. The test case then reads the last message of a topic using a reader created with startMessageIdInclusive() and startMessageId(MessageId.latest). The async API is used, and hasMessageAvailableAsync is called before readNextAsync to read the last message. Running this 1000 times, each time randomly picking one of the 100 topics and reading its last message as described, reproduces the issue. The reads run concurrently with a concurrency level of 16.
   I'd have to port the repro to plain Pulsar Client Java code before I could share it.
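
   Until that port exists, the load pattern itself can be sketched with plain JDK classes. This is not the repro: ReaderLoadPattern, run and the readLastMessage function are hypothetical names, and the Pulsar calls are deliberately left out. In a real port, readLastMessage would create the reader as described above, call hasMessageAvailableAsync, then readNextAsync, and finally close the reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ReaderLoadPattern {

    // Starts `iterations` async reads on randomly picked topics, keeping at
    // most `concurrency` of them in flight. Returns the number of reads that
    // completed, successfully or exceptionally.
    static int run(List<String> topics, int iterations, int concurrency,
                   Function<String, CompletableFuture<Void>> readLastMessage) {
        Semaphore permits = new Semaphore(concurrency);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < iterations; i++) {
            permits.acquireUninterruptibly();
            String topic = topics.get(ThreadLocalRandom.current().nextInt(topics.size()));
            readLastMessage.apply(topic).whenComplete((ignored, error) -> {
                completed.incrementAndGet();
                permits.release();
            });
        }

        // Taking every permit back means all in-flight reads have finished.
        permits.acquireUninterruptibly(concurrency);
        return completed.get();
    }

    public static void main(String[] args) {
        List<String> topics = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            topics.add("topic-" + i); // hypothetical topic names
        }
        // Stand-in for the real read; it completes asynchronously and does nothing.
        int done = run(topics, 1000, 16, topic -> CompletableFuture.runAsync(() -> { }));
        System.out.println("completed reads: " + done);
    }
}
```

   The semaphore is just one simple way to cap the number of in-flight reads at 16. The internal Reactor-based wrappers presumably bound concurrency through their own operators.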


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org