Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/24 07:02:17 UTC

[GitHub] [pulsar] wenbingshen opened a new issue #12169: [BUG] Questions about pulsar broker direct OOM

wenbingshen opened a new issue #12169:
URL: https://github.com/apache/pulsar/issues/12169


   **Describe the bug**
   
   Pulsar and BookKeeper version:
   Pulsar 2.8.0 with the BookKeeper bundled in Pulsar 2.8.0
   
   To find the cause of direct memory OOM in the Pulsar broker, I tested several scenarios and got different results.
   
   Analysis of the broker heap dump shows a large number of PendingAddOp instances that have not been recycled or destroyed.
   
   As shown in the figure below, I suspect that many entries written to the bookies have not yet received responses from all WQ (write quorum) bookies, which prevents their PendingAddOp instances from being recycled or destroyed.
   
   ![image](https://user-images.githubusercontent.com/35599757/134630405-64fec4d3-2714-4c8a-968b-bf65e6bd75bd.png)
   
   Therefore, I used maxMessagePublishBufferSizeInMB to limit the traffic handled by the broker, following https://github.com/apache/pulsar/pull/7406 and https://github.com/apache/pulsar/pull/6178.
   
   Here are my test results (E:W:A is ensemble size : write quorum : ack quorum):
   1. With maxMessagePublishBufferSizeInMB=512 and E:W:A=3:3:2, OOM still occurs after the pressure test.
   2. With maxMessagePublishBufferSizeInMB=512 and E:W:A=3:3:3, 3:2:2, and 2:2:2 tested in turn, direct memory is normal after the pressure test.
   3. With maxMessagePublishBufferSizeInMB=2048 and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the pressure test.
   4. With maxMessagePublishBufferSizeInMB left at its default, which is 1/2 of the maximum direct memory (8/2=4GB in this test), and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the pressure test.
   5. With maxMessagePublishBufferSizeInMB=-1, which disables the limit, and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the pressure test.
   6. With maxMessagePublishBufferSizeInMB=-1, which disables the limit, and E:W:A=3:3:2, OOM occurs after the pressure test.
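
As a reading aid, the six results above can be written down as a small table and checked mechanically. The sketch below only restates the reported outcomes; the names `Run`, `RUNS`, and the helper functions are illustrative and not part of Pulsar. The default-size helper assumes the "half of maximum direct memory" rule stated in item 4.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    # maxMessagePublishBufferSizeInMB: None = left at default, -1 = limit disabled
    buffer_mb: Optional[int]
    ensemble: int
    write_quorum: int
    ack_quorum: int
    oom: bool  # outcome reported in the issue


# One row per (setting, E:W:A) combination listed in the six results.
RUNS = [
    Run(512, 3, 3, 2, True),
    Run(512, 3, 3, 3, False),
    Run(512, 3, 2, 2, False),
    Run(512, 2, 2, 2, False),
    Run(2048, 3, 3, 3, False),
    Run(2048, 3, 2, 2, False),
    Run(None, 3, 3, 3, False),
    Run(None, 3, 2, 2, False),
    Run(-1, 3, 3, 3, False),
    Run(-1, 3, 2, 2, False),
    Run(-1, 3, 3, 2, True),
]


def default_publish_buffer_mb(max_direct_memory_mb: int) -> int:
    """Default described in item 4: half of the maximum direct memory."""
    return max_direct_memory_mb // 2


def oom_tracks_quorum_gap(runs) -> bool:
    """True if OOM was reported exactly in the runs where AQ < WQ."""
    return all(r.oom == (r.ack_quorum < r.write_quorum) for r in runs)


def outcomes_by_buffer_setting(runs) -> dict:
    """Collect the set of outcomes seen under each buffer setting."""
    seen = {}
    for r in runs:
        seen.setdefault(r.buffer_mb, set()).add(r.oom)
    return seen
```

Grouping by buffer setting shows that both 512 and -1 produced OOM and non-OOM runs, while the AQ versus WQ comparison separates the outcomes cleanly in every reported run.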
   
   The following question is also related to #9562.
   
   My question is: regardless of how maxMessagePublishBufferSizeInMB is configured,
   whenever AQ=WQ, direct memory stays normal,
   and whenever AQ<WQ, the broker runs out of direct memory.
   So how does maxMessagePublishBufferSizeInMB actually work?
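
One way to picture the suspicion stated earlier (entries stay in memory until all WQ responses arrive) is a toy model. Everything in the sketch below is an assumption made for illustration, not Pulsar or BookKeeper behavior confirmed by this thread: one bookie in every write set is slow, an entry is acknowledged after AQ responses, its memory is held until all WQ responses arrive, and `ack_based_limit_mb` is a hypothetical stand-in for any limit that only counts unacknowledged entries.

```python
from collections import deque


def peak_retained_mb(write_quorum, ack_quorum, ack_based_limit_mb,
                     ticks=100, slow_every=4, entry_mb=1):
    """Toy model: return the peak memory (MB) held by in-flight entries.

    Assumptions (hypothetical, for illustration only):
    - exactly one bookie in every write set is slow and answers one
      outstanding write every `slow_every` ticks; the others answer at once;
    - an entry is acknowledged after `ack_quorum` responses;
    - an entry's memory is held until all `write_quorum` responses arrive;
    - the limit counts only entries that are not yet acknowledged.
    """
    if not 1 <= ack_quorum <= write_quorum:
        raise ValueError("need 1 <= ack_quorum <= write_quorum")

    # With AQ == WQ the ack must wait for the slow bookie; with AQ < WQ
    # the fast bookies alone are enough to acknowledge the entry.
    ack_waits_for_slow = ack_quorum == write_quorum

    awaiting_slow = deque()  # entries still waiting for the slow bookie
    unacked_mb = 0           # what the ack-based limit can see
    retained_mb = 0          # what is actually held in memory
    peak = 0

    for tick in range(ticks):
        # The slow bookie answers one outstanding write now and then.
        if awaiting_slow and tick % slow_every == 0:
            awaiting_slow.popleft()
            retained_mb -= entry_mb
            if ack_waits_for_slow:
                unacked_mb -= entry_mb

        # The producer publishes one entry per tick if the limit allows it.
        if unacked_mb + entry_mb <= ack_based_limit_mb:
            awaiting_slow.append(tick)
            retained_mb += entry_mb
            if ack_waits_for_slow:
                unacked_mb += entry_mb

        peak = max(peak, retained_mb)

    return peak
```

Under these assumptions, `peak_retained_mb(3, 3, 8)` stays within the 8 MB limit, while `peak_retained_mb(3, 2, 8)` grows past it, because acknowledged entries no longer count against the limit yet still hold memory. Whether the broker actually behaves this way is exactly what the question above asks.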
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wenbingshen edited a comment on issue #12169: [BUG] Questions about pulsar broker direct OOM

Posted by GitBox <gi...@apache.org>.
wenbingshen edited a comment on issue #12169:
URL: https://github.com/apache/pulsar/issues/12169#issuecomment-926602296


   > @wenbingshen Do you have a chance to test with 2.8.1 ? That contains quite a few fixes, just to see if there's a difference.
   > 
   > > producerRate: 880000000
   > 
   > I assume you are intentionally testing an overload situation?
   > 
   > > After analyzing the pulsar broker heap dump, a large number of PendingAddOp instances have not been recycled or destroyed.
   > 
   > That is probably expected if there's such a high load on the system.
   > 
   > One possibility to protect from overload is to configure rate limiters on the system.
   > However, it would be good if the Pulsar system would have backpressure (even without rate limiting configured) to prevent the system getting into a state where it breaks because of OOM. One such improvement suggestion is documented in #10439 .
   
   @lhotari Thank you very much for your reply. I am not very familiar with BookKeeper's backpressure mechanism and its related parameters; I will study them later. What I actually want to understand here is this.
   I kept producerRate at 880000000 and ran the following four tests with the same traffic:
   1. With maxMessagePublishBufferSizeInMB>0, OOM occurs when E:W:A=3:3:2.
   2. With maxMessagePublishBufferSizeInMB>0, OOM does not occur when E:W:A=3:3:3, 3:2:2, or 2:2:2.
   3. With maxMessagePublishBufferSizeInMB=-1, which turns the limit off, OOM occurs when E:W:A=3:3:2.
   4. With maxMessagePublishBufferSizeInMB=-1, which turns the limit off, OOM does not occur when E:W:A=3:3:3, 3:2:2, or 2:2:2.
   Comparing tests 1 and 3, and tests 2 and 4, turning the limit on or off through maxMessagePublishBufferSizeInMB makes no difference: the results are the same in both cases. So what does maxMessagePublishBufferSizeInMB actually do?
   





[GitHub] [pulsar] lhotari commented on issue #12169: [BUG] Questions about pulsar broker direct OOM

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #12169:
URL: https://github.com/apache/pulsar/issues/12169#issuecomment-926592230


   @wenbingshen Do you have a chance to test with 2.8.1 ? That contains quite a few fixes, just to see if there's a difference.
   
   > producerRate: 880000000
   
   I assume you are intentionally testing an overload situation? 
   
   > After analyzing the pulsar broker heap dump, a large number of PendingAddOp instances have not been recycled or destroyed.
   
   That is probably expected if there's such a high load on the system.
   
   One possibility to protect from overload is to configure rate limiters on the system.
   However, it would be good if the Pulsar system would have backpressure (even without rate limiting configured) to prevent the system getting into a state where it breaks because of OOM. One such improvement suggestion is documented in #10439.
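
For readers unfamiliar with the idea of a rate limiter mentioned above, here is a minimal, generic token-bucket sketch. It is not Pulsar's implementation and does not use any Pulsar API; the class name `TokenBucket` and its parameters are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Generic token bucket: allows bursts up to `burst`, refills at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the last refill, in seconds

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then take `cost` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should delay or reject the publish
```

A limiter like this caps the publish rate regardless of what the downstream storage is doing, which is why it is described above as protection from overload rather than as backpressure.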
   
   
   
   
   
   





[GitHub] [pulsar] wenbingshen commented on issue #12169: [BUG] Questions about pulsar broker direct OOM

Posted by GitBox <gi...@apache.org>.
wenbingshen commented on issue #12169:
URL: https://github.com/apache/pulsar/issues/12169#issuecomment-926602296


   > @wenbingshen Do you have a chance to test with 2.8.1 ? That contains quite a few fixes, just to see if there's a difference.
   > 
   > > producerRate: 880000000
   > 
   > I assume you are intentionally testing an overload situation?
   > 
   > > After analyzing the pulsar broker heap dump, a large number of PendingAddOp instances have not been recycled or destroyed.
   > 
   > That is probably expected if there's such a high load on the system.
   > 
   > One possibility to protect from overload is to configure rate limiters on the system.
   > However, it would be good if the Pulsar system would have backpressure (even without rate limiting configured) to prevent the system getting into a state where it breaks because of OOM. One such improvement suggestion is documented in #10439 .
   
   @lhotari Thank you very much for your reply. I am not very familiar with BookKeeper's backpressure mechanism and its related parameters; I will study them later. What I actually want to understand here is this.
   I kept producerRate at 880000000 and ran the following four tests with the same traffic:
   1. With maxMessagePublishBufferSizeInMB configured, OOM occurs when E:W:A=3:3:2.
   2. With maxMessagePublishBufferSizeInMB configured, OOM does not occur when E:W:A=3:3:3, 3:2:2, or 2:2:2.
   3. With maxMessagePublishBufferSizeInMB=-1, which turns the limit off, OOM occurs when E:W:A=3:3:2.
   4. With maxMessagePublishBufferSizeInMB=-1, which turns the limit off, OOM does not occur when E:W:A=3:3:3, 3:2:2, or 2:2:2.
   Comparing tests 1 and 3, and tests 2 and 4, turning the limit on or off through maxMessagePublishBufferSizeInMB makes no difference: the results are the same in both cases. So what does maxMessagePublishBufferSizeInMB actually do?
   





[GitHub] [pulsar] wenbingshen commented on issue #12169: [BUG] Questions about pulsar broker direct OOM

Posted by GitBox <gi...@apache.org>.
wenbingshen commented on issue #12169:
URL: https://github.com/apache/pulsar/issues/12169#issuecomment-926578428


   ping @merlimat @codelipenghui @lhotari PTAL





[GitHub] [pulsar] github-actions[bot] commented on issue #12169: [BUG] Questions about pulsar broker direct OOM

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #12169:
URL: https://github.com/apache/pulsar/issues/12169#issuecomment-1054902777


   The issue had no activity for 30 days, mark with Stale label.

