Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/11 07:33:00 UTC

[GitHub] [pulsar] danielsinai opened a new issue #9562: maxMessageBufferSizeInMB is not working?

danielsinai opened a new issue #9562:
URL: https://github.com/apache/pulsar/issues/9562


   After some load testing on my Pulsar cluster, the maxMessageBufferSizeInMB configuration does not protect the broker from being OOMed. I am publishing about 150 MB/s to a single topic (5 MB message size), and the broker's direct memory keeps growing until the broker crashes, triggered by bookie disk latencies. I would expect some throttling on the producer side from this configuration, which was released in 2.5.1: https://github.com/apache/pulsar/pull/6178
    
   I allocate 8 GB of direct memory to the broker, and I have tried configuring maxMessageBufferSizeInMB to values ranging from 1 to 4096.
   
   Steps to reproduce the behavior:
   1. Use the default maxMessageBufferSizeInMB (1/2 of direct memory)
   2. Load the cluster with 5 MB messages until the bookies' disks have high latencies
   3. Wait for the broker to crash
   
   I would expect producers to get throttled until the broker successfully gets acks from the ack quorum (Qa) of bookies.
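   
   For context, a minimal sketch (not the exact load generator used here; the service URL, topic, and rate are placeholders) of the kind of producer that drives this test with the Pulsar Java client. Note that maxPendingMessages and blockIfQueueFull only bound memory on the client side; broker-side protection is what maxMessageBufferSizeInMB is expected to provide:
   
   ```
   import org.apache.pulsar.client.api.Producer;
   import org.apache.pulsar.client.api.PulsarClient;
   
   public class LargeMessageLoad {
       public static void main(String[] args) throws Exception {
           PulsarClient client = PulsarClient.builder()
                   .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                   .build();
   
           Producer<byte[]> producer = client.newProducer()
                   .topic("persistent://public/default/load-test") // placeholder topic
                   .enableBatching(false)      // one large payload per message
                   .maxPendingMessages(30)     // bounds memory on the client only
                   .blockIfQueueFull(true)     // blocks the sender thread instead of failing
                   .create();
   
           // ~5 MB payload (kept just under the broker's default maxMessageSize);
           // roughly 30 of these per second gives ~150 MB/s to a single topic.
           byte[] payload = new byte[5 * 1000 * 1000];
           while (true) {
               producer.sendAsync(payload);
           }
       }
   }
   ```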
   
   ![IMG_20210210_175934.jpg](https://user-images.githubusercontent.com/51213812/107611005-a619fc80-6c4b-11eb-933a-ba666e6da655.jpg)
   
   The gaps in the graphs show when the broker failed and also when I tried to tune this configuration.
   
    - OS: CentOS 7.8
   
   You can see more details about it in this Slack thread: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1612886327337200?thread_ts=1612886327.337200&cid=C5Z4T36F7
   
    By the way, it can also happen with small messages.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-902159988


   @danielsinai Did this problem get resolved by upgrading to a specific Pulsar version?





[GitHub] [pulsar] dlg99 commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
dlg99 commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-789119252


   Another thing to look at is the "because of bookie disk latencies" part.
   Large messages are part of the problem (unless you have Pulsar chunk them).
   You need to review the bookie configuration really carefully to tune it for large messages:
   https://github.com/apache/bookkeeper/blob/master/conf/bk_server.conf
   At least:
   ```
   byteBufAllocatorSize...
   nettyMaxFrameSizeBytes
   journalPreAllocSizeMB
   journalWriteBufferSizeKB
   journalBufferedWritesThreshold
   skipList...
   readBufferSizeBytes
   writeBufferSizeBytes
   ```
   and probably others.
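   
   A hedged sketch only (the property names come from bk_server.conf; the values below are placeholders, not recommendations) of what adjusting these looks like through the bookie's ServerConfiguration; the same keys can simply be set in conf/bk_server.conf instead:
   
   ```
   import org.apache.bookkeeper.conf.ServerConfiguration;
   
   public class BookieTuningSketch {
       // Illustrative values only; size them for the actual entry size and disks.
       static ServerConfiguration largeEntryTuning() {
           ServerConfiguration conf = new ServerConfiguration();
           conf.setProperty("nettyMaxFrameSizeBytes", 10 * 1024 * 1024); // must exceed the largest entry
           conf.setProperty("journalPreAllocSizeMB", 16);
           conf.setProperty("journalWriteBufferSizeKB", 1024);
           conf.setProperty("journalBufferedWritesThreshold", 1024 * 1024);
           conf.setProperty("readBufferSizeBytes", 4096);
           conf.setProperty("writeBufferSizeBytes", 128 * 1024);
           return conf;
       }
   }
   ```
   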
   Can you move the journal to a dedicated fast disk?
   
   Increase the skip list size, especially if you have tailing reads.
   
   Check the Linux configuration for the disks, e.g. read-ahead, the I/O scheduler for SSDs, etc., though it sounds like rotational disks ("publishing to a single topic about 150 MB/s"). I have had clients writing to bookies bottlenecked by a 20 Gbps NIC, though the entries were smaller (~64 KB).
   
   Consider increasing the ensemble size for the ledger; this will spread I/O across more bookies.
   E.g. if you are running with ES/Wq/Aq of 3/3/2, increase ES to 7, 11 or more.
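   
   For illustration, a hedged sketch of widening the ensemble at the namespace level with the Pulsar Java admin client (the admin URL, namespace, and 7/3/2 values are placeholders); the cluster-wide defaults (managedLedgerDefaultEnsembleSize / WriteQuorum / AckQuorum in broker.conf) are the alternative place to change this:
   
   ```
   import org.apache.pulsar.client.admin.PulsarAdmin;
   import org.apache.pulsar.common.policies.data.PersistencePolicies;
   
   public class WidenEnsembleSketch {
       public static void main(String[] args) throws Exception {
           PulsarAdmin admin = PulsarAdmin.builder()
                   .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                   .build();
   
           // Ensemble 7, write quorum 3, ack quorum 2: each entry still gets 3 copies,
           // but consecutive entries are striped across 7 bookies instead of 3.
           admin.namespaces().setPersistence("public/default",
                   new PersistencePolicies(7, 3, 2, 0.0));
   
           admin.close();
       }
   }
   ```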





[GitHub] [pulsar] danielsinai commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
danielsinai commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-798965007


   > @danielsinai Have you tried enabling backpressure?
   > 
   > There is a discussion on this topic in the bookkeeper dev mailing list (subject: "Unbounded memory usage for WQ > AQ ?"); I'll copy my comment from there with some ideas for you to experiment with:
   > 
   > ```
   > I remember issues with bookies OOMing/slowing down due to memory pressure
   > under load.
   > https://github.com/apache/bookkeeper/issues/1409
   > https://github.com/apache/bookkeeper/pull/1410
   > 
   > IIRC, there were a couple of problems:
   > 
   > - Slow bookie kept on accepting data that it could not process (netty kept
   > on reading it and throwing it into the queue)
   > AQ < WQ means that the client does not wait after AQ acks received and
   > keeps on throwing data to the slow bookie and ensemble change did not
   > happen (or did not happen fast enough?)
   > 
   > - client submitted a lot of requests but was too slow to process responses
   > (network capacity, NIC bandwidth, something else), and the bookie kept the data
   > 
   > It's been a while and I don't recall all the details but the PR is merged.
   > Have you played with these settings:
   > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
   >     // backpressure control
   >     protected static final String MAX_ADDS_IN_PROGRESS_LIMIT = "maxAddsInProgressLimit";
   >     protected static final String MAX_READS_IN_PROGRESS_LIMIT = "maxReadsInProgressLimit";
   >     protected static final String CLOSE_CHANNEL_ON_RESPONSE_TIMEOUT = "closeChannelOnResponseTimeout";
   >     protected static final String WAIT_TIMEOUT_ON_RESPONSE_BACKPRESSURE = "waitTimeoutOnResponseBackpressureMs";
   > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ClientConfiguration.java
   >     // backpressure configuration
   >     protected static final String WAIT_TIMEOUT_ON_BACKPRESSURE = "waitTimeoutOnBackpressureMs";
   > ```
   
   Thanks for your answer, and I am really sorry for the delay.
   
   I will try to tune this stuff later on, but isn't it strange that the broker is getting OOMed? Will backpressure from the bookies to the broker solve the problem?
   
   I mean, I would expect the Pulsar broker to throttle the producer before its memory fills up.
   
   
   
   





[GitHub] [pulsar] dlg99 edited a comment on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
dlg99 edited a comment on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-789101618


   @danielsinai Have you tried enabling backpressure?
   There is a discussion on this topic in the bookkeeper dev mailing list (subject: "Unbounded memory usage for WQ > AQ ?"); I'll copy my comment from there with some ideas for you to experiment with:
   ```
   I remember issues with bookies OOMing/slowing down due to memory pressure
   under load.
   https://github.com/apache/bookkeeper/issues/1409
   https://github.com/apache/bookkeeper/pull/1410
   
   IIRC, there were a couple of problems:
   
   - Slow bookie kept on accepting data that it could not process (netty kept
   on reading it and throwing it into the queue)
   AQ < WQ means that the client does not wait after AQ acks received and
   keeps on throwing data to the slow bookie and ensemble change did not
   happen (or did not happen fast enough?)
   
   - client submitted a lot of requests but was too slow to process responses
   (network capacity, NIC bandwidth, something else), and the bookie kept the data
   
   It's been a while and I don't recall all the details but the PR is merged.
   Have you played with these settings:
   https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
    // backpressure control
    protected static final String MAX_ADDS_IN_PROGRESS_LIMIT = "maxAddsInProgressLimit";
    protected static final String MAX_READS_IN_PROGRESS_LIMIT = "maxReadsInProgressLimit";
    protected static final String CLOSE_CHANNEL_ON_RESPONSE_TIMEOUT = "closeChannelOnResponseTimeout";
    protected static final String WAIT_TIMEOUT_ON_RESPONSE_BACKPRESSURE = "waitTimeoutOnResponseBackpressureMs";
   https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ClientConfiguration.java
    // backpressure configuration
    protected static final String WAIT_TIMEOUT_ON_BACKPRESSURE = "waitTimeoutOnBackpressureMs";
   ```
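   
   Purely as a hedged illustration of where those knobs live (the values are placeholders, not recommendations): the bookie-side limits belong in bk_server.conf / ServerConfiguration, and the timeout-based backpressure on the writer side in the BookKeeper ClientConfiguration. In a Pulsar deployment the broker is the BookKeeper client, so the client-side property would have to be surfaced through the broker's BookKeeper client settings rather than set directly like this:
   
   ```
   import org.apache.bookkeeper.conf.ClientConfiguration;
   import org.apache.bookkeeper.conf.ServerConfiguration;
   
   public class BackpressureSketch {
       // Bookie side: cap in-flight add/read requests so a slow bookie stops pulling
       // more work off the wire instead of buffering it in memory.
       static ServerConfiguration bookieBackpressure() {
           ServerConfiguration conf = new ServerConfiguration();
           conf.setProperty("maxAddsInProgressLimit", 10_000);             // placeholder limit
           conf.setProperty("maxReadsInProgressLimit", 10_000);            // placeholder limit
           conf.setProperty("closeChannelOnResponseTimeout", true);
           conf.setProperty("waitTimeoutOnResponseBackpressureMs", 1_000); // placeholder timeout
           return conf;
       }
   
       // Writer side: wait (bounded) when the bookie pushes back instead of queueing
       // an unbounded number of outstanding requests.
       static ClientConfiguration writerBackpressure() {
           ClientConfiguration conf = new ClientConfiguration();
           conf.setProperty("waitTimeoutOnBackpressureMs", 1_000);         // placeholder timeout
           return conf;
       }
   }
   ```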
   





[GitHub] [pulsar] danielsinai commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
danielsinai commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-800142058


   It also seems like there is a memory leak.
   
   After stopping the writes, the direct memory doesn't decrease.
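   
   As a hedged aside (not something done in this issue): Netty's pooled allocator can retain pooled chunks and thread-local caches after the buffers themselves are released, so a JVM-level direct memory figure that stays flat after the load stops does not by itself distinguish a leak from pooling; the allocator's own metrics are more telling. A minimal sketch of reading them in-process (the broker may use its own allocator instance rather than the default one, but the metric interface is the same; the broker's Prometheus /metrics endpoint also exposes direct-memory gauges without attaching anything to the JVM):
   
   ```
   import io.netty.buffer.PooledByteBufAllocator;
   import io.netty.buffer.PooledByteBufAllocatorMetric;
   
   public class DirectMemoryProbe {
       public static void main(String[] args) {
           PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
   
           // Direct memory currently held by the pooled allocator's arenas.
           System.out.println("used direct memory: " + metric.usedDirectMemory());
           System.out.println("used heap memory:   " + metric.usedHeapMemory());
           System.out.println("direct arenas:      " + metric.numDirectArenas());
       }
   }
   ```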





[GitHub] [pulsar] codelipenghui commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-788976688


   @danielsinai What persistence policy is set? If you are using a write quorum greater than the ack quorum, I think you can try keeping them equal and retrying.
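   
   A hedged sketch of what that change looks like with the Pulsar Java admin client (the admin URL and namespace are placeholders), first reading the current policy and then making the write and ack quorums equal:
   
   ```
   import org.apache.pulsar.client.admin.PulsarAdmin;
   import org.apache.pulsar.common.policies.data.PersistencePolicies;
   
   public class EqualQuorumsSketch {
       public static void main(String[] args) throws Exception {
           PulsarAdmin admin = PulsarAdmin.builder()
                   .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                   .build();
   
           PersistencePolicies current = admin.namespaces().getPersistence("public/default");
           System.out.println("current persistence policy: " + current);
   
           // With Qw > Qa the broker acks after Qa responses while the slowest bookie
           // can keep falling behind; Qw == Qa closes that gap.
           admin.namespaces().setPersistence("public/default",
                   new PersistencePolicies(3, 3, 3, 0.0));
   
           admin.close();
       }
   }
   ```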








[GitHub] [pulsar] danielsinai edited a comment on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
danielsinai edited a comment on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-798965007


   
   Thanks for your answer, and I am really sorry for the delay.
   
   I am using Qw > Qa; I didn't think about making them equal, but I will try that out.
   
   I will try to tune my bookies for larger messages / use chunking, but isn't it strange that the broker is getting OOMed? I would expect the producer to be throttled instead of this behavior. Am I missing something?
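   
   On the chunking option, a hedged sketch with the Pulsar Java client (the service URL and topic are placeholders; chunking was added around 2.6 and requires batching to be disabled). The producer splits an oversized payload into chunks no larger than the broker's max message size, and the consumer reassembles them:
   
   ```
   import org.apache.pulsar.client.api.Producer;
   import org.apache.pulsar.client.api.PulsarClient;
   
   public class ChunkedProducerSketch {
       public static void main(String[] args) throws Exception {
           PulsarClient client = PulsarClient.builder()
                   .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                   .build();
   
           Producer<byte[]> producer = client.newProducer()
                   .topic("persistent://public/default/large-payloads") // placeholder topic
                   .enableChunking(true)   // split payloads that exceed the max message size
                   .enableBatching(false)  // chunking cannot be combined with batching
                   .create();
   
           producer.send(new byte[8 * 1024 * 1024]); // e.g. an 8 MB payload, sent as chunks
   
           producer.close();
           client.close();
       }
   }
   ```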
   
   
   
   
   





[GitHub] [pulsar] danielsinai commented on issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
danielsinai commented on issue #9562:
URL: https://github.com/apache/pulsar/issues/9562#issuecomment-800015508


   Tried WQ = AQ and it helped, but it didn't prevent the problem.
   
   It just made the brokers last longer: ![image](https://user-images.githubusercontent.com/51213812/111269902-da604e80-8637-11eb-9e04-12d2b070b13a.jpeg)
   
   Any ideas?





[GitHub] [pulsar] danielsinai closed issue #9562: maxMessageBufferSizeInMB is not working?

Posted by GitBox <gi...@apache.org>.
danielsinai closed issue #9562:
URL: https://github.com/apache/pulsar/issues/9562


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org