Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/05/15 15:33:07 UTC

[GitHub] [pulsar] trexinc opened a new issue #6969: Brokers crash if all bookies are full

trexinc opened a new issue #6969:
URL: https://github.com/apache/pulsar/issues/6969


   This happens with both 2.5.0 and 2.5.1.
   We run a distributed Pulsar cluster on k8s with several bookies, brokers, function workers, and proxies.
   If the bookies become completely full (because of a retention bug, #6935), the brokers go into a crash loop, which makes it impossible to remove large topics or troubleshoot.
   As a workaround we add another bookie and then clear the large topics, but I would expect the brokers not to crash, or perhaps to enter an emergency mode where only the admin API is available.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] trexinc commented on issue #6969: Brokers crash if all bookies are full

Posted by GitBox <gi...@apache.org>.
trexinc commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-630564614


   @sijie our function workers run on separate pods, not alongside the brokers.
   
   @jiazhai unfortunately the log of the first crash wasn't saved. All subsequent logs showed the crash was caused by "Broker-znode owned by different zk-session", even when I stopped all brokers but one. I didn't see any other interesting logs.


[GitHub] [pulsar] sijie commented on issue #6969: Brokers crash if all bookies are full

sijie commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-630461037


   @trexinc if you have function workers running alongside the brokers: function workers use Pulsar topics for metadata management, so if the BookKeeper cluster is not writable, the function workers cannot produce messages, and that prevents the brokers from starting up. We could consider adding retry logic to the function worker so that it keeps retrying until it is able to produce the messages.
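A minimal sketch of the retry idea, written in Python for illustration only; it is not Pulsar's function-worker code. `retry_until_success` is a hypothetical helper: `operation` stands in for whatever call publishes the worker's metadata message, and the exponential backoff (doubling from `initial_delay_s` up to `max_delay_s`) is an assumed policy, not something stated in this thread.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_until_success(
    operation: Callable[[], T],
    initial_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    max_attempts: Optional[int] = None,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `operation` until it succeeds.

    The wait between attempts doubles after every failure, capped at
    `max_delay_s`. With `max_attempts=None` it retries forever, matching
    the "retry until it is able to produce" idea; otherwise the last
    error is re-raised once the attempt budget is spent.
    """
    delay = initial_delay_s
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except retry_on:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

Injecting `sleep` keeps the helper testable. A real worker would probably also log each failed attempt, so operators can tell it is waiting on non-writable bookies rather than hanging.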


[GitHub] [pulsar] jiazhai commented on issue #6969: Brokers crash if all bookies are full

jiazhai commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-629908155


   @trexinc Thanks for reporting this issue. Could you please collect the broker logs when this error happens?


[GitHub] [pulsar] sijie commented on issue #6969: Brokers crash if all bookies are full

sijie commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-630574890


   @trexinc Interesting. It would be good to get the logs so we can help you analyze them.


[GitHub] [pulsar] ckdarby commented on issue #6969: Brokers crash if all bookies are full

ckdarby commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-633729334


   @trexinc Were you using "small volumes" with large ingestion? Put differently, could you fill ~10% of your total bookie capacity in under 10 seconds?
   
   We faced a similar issue: the cluster filled all the bookies before the read-only safety check could even run, and the cluster ended up partially unusable for some functions.
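The race described above can be illustrated with a toy model, sketched in Python under simplifying assumptions: writes arrive at a constant rate, and disk usage is compared against a read-only threshold only at fixed check intervals. The defaults (threshold 0.95, 10 s interval) are loosely modeled on BookKeeper's `diskUsageThreshold` and `diskCheckInterval` settings; `simulate` and `Outcome` are hypothetical names, and the model ignores everything else a real bookie does.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    went_read_only: bool   # a periodic check saw usage >= threshold
    disk_full: bool        # usage reached capacity
    seconds_elapsed: float


def simulate(
    capacity_gb: float,
    used_gb: float,
    ingest_gb_per_s: float,
    check_interval_s: float = 10.0,
    threshold: float = 0.95,
    horizon_s: float = 3600.0,
) -> Outcome:
    """Toy model: is the disk full before a periodic check flips read-only?"""
    t = 0.0
    used = used_gb
    while t < horizon_s:
        # Periodic check: switch to read-only once usage crosses the threshold.
        if used >= threshold * capacity_gb:
            return Outcome(True, used >= capacity_gb, t)
        if ingest_gb_per_s <= 0:
            break  # nothing is being written, so usage never grows
        # Between checks writes keep landing; if the remaining space runs out
        # before the next check, the disk fills with no read-only switch.
        time_to_full = (capacity_gb - used) / ingest_gb_per_s
        if time_to_full <= check_interval_s:
            return Outcome(False, True, t + time_to_full)
        used += ingest_gb_per_s * check_interval_s
        t += check_interval_s
    return Outcome(False, False, t)
```

For example, with a 100 GB disk at 90% usage, ingesting 0.1 GB/s leaves time for a check to flip the bookie read-only at 95% (after 50 s), while ingesting 2 GB/s fills the remaining 10 GB in 5 s, before the next check runs.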


[GitHub] [pulsar] trexinc commented on issue #6969: Brokers crash if all bookies are full

trexinc commented on issue #6969:
URL: https://github.com/apache/pulsar/issues/6969#issuecomment-630565386


   We will try to set up a separate environment where we can reproduce this issue on demand without affecting others. It reproduces easily in our active environment; hopefully it will reproduce on a dedicated one as well.

