Posted to user@hadoop.apache.org by Ben Podgursky <bp...@gmail.com> on 2015/09/14 05:27:42 UTC

ResourceManager HA + ZooKeeper performance questions

Hi all,

I'm looking into turning on High Availability ResourceManagers in our YARN
cluster (we run a single ResourceManager at the moment).  We plan on using
ZooKeeper for fencing and state storage, and my main concern is around ZK
scalability and performance.  We run a fairly active YARN cluster:

- ~550 NodeManagers
- ~5,000 applications/day
- ~15,000 active containers at any given time
- usually ~100 applications running at any given time
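
For a rough sense of scale, the figures above can be turned into average
rates.  This is only a back-of-envelope sketch: it assumes submissions are
spread evenly over the day (real load is burstier), and the variable names
are just illustrative.

```python
# Back-of-envelope rates derived from the cluster figures listed above.
# Assumption: applications arrive evenly over 24 hours; peaks will be higher.
SECONDS_PER_DAY = 24 * 60 * 60

apps_per_day = 5_000
node_managers = 550
active_containers = 15_000
running_apps = 100

apps_per_second = apps_per_day / SECONDS_PER_DAY
seconds_between_apps = SECONDS_PER_DAY / apps_per_day
containers_per_node = active_containers / node_managers
containers_per_app = active_containers / running_apps

print(f"average application submissions/sec: {apps_per_second:.3f}")
print(f"average seconds between submissions: {seconds_between_apps:.2f}")
print(f"average active containers per NodeManager: {containers_per_node:.1f}")
print(f"average active containers per running app: {containers_per_app:.0f}")
```

On average that is roughly one application submission every 17 seconds and
about 27 active containers per NodeManager.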

Since we can't really load-test our setup before turning HA on in
production, I was hoping someone who had run a cluster at similar scale
could give advice on their ZK environment; specifically:

- What ZK heap size did you need?
- How many nodes in your ensemble?
- What kind of disks?  Are spinning disks OK, or do you use SSDs?
- Did you need any special configurations around timeouts, etc.?

Basically, I'm looking for any horror stories, or hoping that someone can
confirm that RM HA will be A-OK at this throughput.


Thanks,

Ben