Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/06/24 20:06:16 UTC

[GitHub] [pulsar] wmccarley commented on issue #7328: Bookie cluster gradually fails when one of the bookie node goes down

wmccarley commented on issue #7328:
URL: https://github.com/apache/pulsar/issues/7328#issuecomment-649042142


   @pushkar-engagio You mention you are using Amazon Linux 2, but you don't say which EC2 instance type you are using. I have noticed that when running a BookKeeper cluster in AWS, if the instance type and configuration are *just barely* adequate for the I/O throughput of normal operation, sudden spikes can crash the bookie process. Once a bookie crashes, auto-recovery generates additional I/O on its peers, and you get cascading failures. Even if you have the bookie process set up as a service and it comes back up, the repeated bouncing will create tons of under-replicated ledgers and you will be hosed. The easiest solution is to make sure your bookies are powerful enough to handle whatever you can throw at them, and then some.
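   If you want to check whether you are already in that state, the BookKeeper shell can list the ledgers auto-recovery has flagged. A minimal sketch, assuming you run it from the bookkeeper installation directory:
   
       # List ledgers currently marked under-replicated by the auto-recovery auditor
       bin/bookkeeper shell listunderreplicated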
   
   You also mention you are using 500 GB for your journal and 1 TB for your ledgers, which I assume are EBS volumes. 500 GB is far more than you need for the journal: with the BookKeeper defaults of journalMaxSizeMB=2048 and journalMaxBackups=5, the journal directory tops out at roughly 12 GB (five 2 GB backup journals plus the current 2 GB journal).
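   For reference, here is a minimal sketch of the relevant bookkeeper.conf entries; the values shown are the stock defaults mentioned above, and the directory path is a placeholder:
   
       # bookkeeper.conf (defaults shown; journalDirectory path is hypothetical)
       journalDirectory=/mnt/journal/bk-journal
       journalMaxSizeMB=2048    # roll the journal file at 2 GB
       journalMaxBackups=5      # keep 5 rolled journals => ~12 GB worst case on disk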
   
   FWIW, i3.xlarge instances work well as bookies; they come with a 950 GB attached NVMe SSD. You can put the journal directory and the ledger directory on the same device (use two separate partitions, as sketched below). That config gets you _close_ to the 1 TB-per-bookie setup you are testing with, but with p99 write latencies of 3 ms or less.
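   A sketch of how you might carve the i3.xlarge's NVMe device into those two partitions; the device name and the 10/90 split are assumptions, not measured recommendations:
   
       # Partition the local NVMe SSD: a small journal partition, the rest for ledgers
       sudo parted /dev/nvme0n1 --script \
           mklabel gpt \
           mkpart journal ext4 0% 10% \
           mkpart ledgers ext4 10% 100%
       sudo mkfs.ext4 /dev/nvme0n1p1
       sudo mkfs.ext4 /dev/nvme0n1p2
       sudo mkdir -p /mnt/journal /mnt/ledgers
       sudo mount /dev/nvme0n1p1 /mnt/journal
       sudo mount /dev/nvme0n1p2 /mnt/ledgers
       # Then point bookkeeper.conf at the two mounts:
       #   journalDirectory=/mnt/journal/bk-journal
       #   ledgerDirectories=/mnt/ledgers/bk-data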
   
   Also, I have noticed that the default setting for _compactionRate_ (1000) seems way too low. If you run a moderately sized bookie cluster with that setting, compactions take a really long time and old data hangs around much longer than it needs to. Personally I run at 10x that (10000) and it works fine. Finally, if you do use an i3.xlarge, it comes with 30 GB of RAM, so you can tune the bookie JVM settings in bkenv.sh to make better use of those additional resources.
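   A sketch of both changes; the JVM sizes are hypothetical starting points to illustrate the idea, not tested numbers:
   
       # bookkeeper.conf: raise the compaction rate 10x over the default of 1000
       compactionRate=10000
   
       # bkenv.sh: give the bookie JVM more of the instance's 30 GB of RAM
       # (the heap / direct-memory split below is an assumption)
       BOOKIE_MEM="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=12g"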
   
   Hope that helps.

