Posted to dev@zookeeper.apache.org by Crowder Tim <ti...@yahoo.com.INVALID> on 2015/03/26 02:00:41 UTC

Zookeeper ensemble in Docker

Hi All-

A few notes on running a ZooKeeper ensemble on Docker...
(Apologies in advance for any mistakes/confusion.)

By default, Docker uses "bridge" networking, where it creates
a virtual IP address (10.x.x.x) for each container. You can
force it to use the host IP with --net=host, but this opens
up some security vulnerabilities (like a container shutting
down your host by sending commands to D-Bus).

By default, ZooKeeper expects each quorum member to have the
full list of ensemble-member addresses. Unfortunately, this
list performs double duty. For the server itself, its entry
represents the *interface* to bind to. For its peers in the
ensemble, it represents the *IP* address to connect to.
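To make the double duty concrete, here is a minimal Python sketch (not ZooKeeper code) that reads quorum entries in the address:quorumPort:electionPort:role;clientPort form used below and reports which addresses a given server would bind to and which it would dial. The names parse_entry and endpoints_for are hypothetical, and the parser only handles IPv4 addresses and hostnames with an explicit role.

```python
from typing import Dict, List, NamedTuple


class QuorumEntry(NamedTuple):
    """One entry of the form address:quorumPort:electionPort:role;clientPort."""
    address: str
    quorum_port: int
    election_port: int
    role: str
    client_port: int


def parse_entry(spec: str) -> QuorumEntry:
    # Split off the client port after ';', then the colon-separated fields.
    # Assumes IPv4/hostname addresses and an explicit role (no IPv6 brackets).
    server_part, client_port = spec.split(";")
    address, quorum_port, election_port, role = server_part.split(":")
    return QuorumEntry(address, int(quorum_port), int(election_port), role, int(client_port))


def endpoints_for(my_id: int, quorum: Dict[int, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: what server `my_id` binds to vs. what it dials.

    The *same* address field is a bind interface for your own entry and a
    connect target for every other entry.
    """
    binds: List[str] = []
    connects: List[str] = []
    for server_id, spec in sorted(quorum.items()):
        entry = parse_entry(spec)
        if server_id == my_id:
            # Own entry: listen here for quorum and leader election traffic.
            binds.append(f"{entry.address}:{entry.quorum_port}")
            binds.append(f"{entry.address}:{entry.election_port}")
        else:
            # Peer entry: leader election connections dial the peer's election port.
            connects.append(f"{entry.address}:{entry.election_port}")
    return {"bind": binds, "connect": connects}


quorum = {
    1: "1.2.3.4:3181:4181:participant;2181",
    2: "1.2.3.4:3182:4182:participant;2182",
}
print(endpoints_for(1, quorum))
# {'bind': ['1.2.3.4:3181', '1.2.3.4:4181'], 'connect': ['1.2.3.4:4182']}
```

The one address string has to be both something the server owns and something its peers can reach, which is exactly what bridge networking breaks.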

To see why this is a problem, consider the following scenario:

We want to run two servers "1" and "2" in separate containers
on the same host, whose address is 1.2.3.4. Suppose we set the
quorum list to:
  - 1.2.3.4:3181:4181:participant;2181
  - 1.2.3.4:3182:4182:participant;2182
Server "1" tries to bind to address 1.2.3.4:3181, and fails,
because it actually has an address like 10.3.4.5.
Similarly, if we use the hostname:
  - foo1.bar.com:3181:4181:participant;2181
  - foo2.bar.com:3182:4182:participant;2182
it will resolve to 1.2.3.4 and fail.
Using Docker's --hostname option also fails, either because
we use IPs in the quorum list (so the hostname is never
consulted), or because we end up with the wrong (10.x.x.x)
address for our peer.
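The underlying failure is simply the kernel refusing to bind a socket to an address the container doesn't own. A small Python check, assuming the machine running it does not have 1.2.3.4 on any interface (can_bind is a hypothetical helper):

```python
import errno
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Try to bind a TCP socket to (address, port) and report the result.

    Port 0 lets the OS pick any free port, so only the *address* is tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True
    except OSError as exc:
        # EADDRNOTAVAIL: the address is not assigned to any local interface,
        # which is what a bridge-networked container sees for the host's IP.
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        sock.close()


for addr in ("0.0.0.0", "127.0.0.1", "1.2.3.4"):
    print(addr, can_bind(addr))
```

Inside a bridge-networked container the host's address behaves like 1.2.3.4 here, while the wildcard 0.0.0.0 binds fine.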

Now suppose we get clever and pass different quorum lists to 
each server. 
So, server "1" gets:
  - 0.0.0.0:3181:4181:participant;2181
  - 1.2.3.4:3182:4182:participant;2182
And server "2" gets:
  - 1.2.3.4:3181:4181:participant;2181
  - 0.0.0.0:3182:4182:participant;2182
It turns out that both servers will be able to bind to their 
local server address, *and* connect to their peer.
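A sketch of generating those per-server lists, written in the server.N=... form ZooKeeper's configuration uses. The PORTS layout and quorum_list_for are hypothetical and just mirror the example above:

```python
from typing import Dict, List, Tuple

WILDCARD = "0.0.0.0"

# Hypothetical layout from the example: id -> (quorum, election, client) ports.
PORTS: Dict[int, Tuple[int, int, int]] = {1: (3181, 4181, 2181), 2: (3182, 4182, 2182)}


def quorum_list_for(my_id: int, host_ip: str, ports: Dict[int, Tuple[int, int, int]]) -> List[str]:
    """Per-server list: wildcard for our own entry, the host IP for peers."""
    lines = []
    for server_id, (quorum, election, client) in sorted(ports.items()):
        address = WILDCARD if server_id == my_id else host_ip
        lines.append(f"server.{server_id}={address}:{quorum}:{election}:participant;{client}")
    return lines


for my_id in PORTS:
    print(f"# config for server {my_id}")
    print("\n".join(quorum_list_for(my_id, "1.2.3.4", PORTS)))
```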

So, now we're totally good, right?...

Nope. It turns out that once the servers connect, they synchronize
their quorum lists, and *restart* their quorum and leader election
ports and services.
Then one of 3 things happens:
  a) The server has 0.0.0.0 for its peer and can't connect.
  b) The server has 1.2.3.4 for its own address and can't bind.
  c) Unexpected exceptions (restart race condition?) and repeated failures.
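A toy model of why a single synchronized list breaks things: whichever server's list both end up sharing, the *other* server hits cases a) and b). This only models a) and b); it ignores ports, Docker port publishing, and ZooKeeper's actual reconfiguration code, and the container addresses in LOCAL are made up.

```python
from typing import Dict, List, Set

WILDCARD = "0.0.0.0"


def problems(my_id: int, shared: Dict[int, str], local_addrs: Set[str]) -> List[str]:
    """Check one server's view of a single, synchronized quorum list.

    `shared` maps server id -> address; `local_addrs` is what this server's
    container can actually bind to.
    """
    issues = []
    for server_id, address in sorted(shared.items()):
        if server_id == my_id:
            # Own entry: must be the wildcard or an address we really own.
            if address != WILDCARD and address not in local_addrs:
                issues.append(f"cannot bind own address {address}")  # case b)
        elif address == WILDCARD:
            # Peer entry: the wildcard is not a usable address for a remote peer.
            issues.append(f"cannot connect to peer {server_id} at {address}")  # case a)
    return issues


# Hypothetical bridge addresses for each container.
LOCAL: Dict[int, Set[str]] = {1: {"10.3.4.5"}, 2: {"10.3.4.6"}}

# The two per-server lists from above; after sync, both servers share one of them.
CANDIDATES: Dict[str, Dict[int, str]] = {
    "server 1's list": {1: WILDCARD, 2: "1.2.3.4"},
    "server 2's list": {1: "1.2.3.4", 2: WILDCARD},
}

for name, shared in CANDIDATES.items():
    for my_id in (1, 2):
        print(f"{name}, seen by server {my_id}:", problems(my_id, shared, LOCAL[my_id]) or "ok")
```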

So, unless you're willing to use --net=host, there's no correct way
to specify the quorum with a default ZK setup.

So, we're doomed, right?...

It turns out that there's a (slightly buried) feature that lets you
bind to 0.0.0.0 for the leader and quorum interfaces:
  https://issues.apache.org/jira/browse/ZOOKEEPER-1096
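The ZooKeeper Administrator's Guide documents this as the quorumListenOnAllIPs option (as far as I can tell, this is what came out of ZOOKEEPER-1096): when true, a server listens for peer connections on all of its IP addresses rather than only the one in its own server.N entry. A sketch that renders one config every server can share, assuming the host IP 1.2.3.4 and the ports from the example; zoo_cfg is a hypothetical helper and the dataDir value is a placeholder:

```python
from typing import Dict, Tuple

# Hypothetical layout: id -> (quorum port, election port, client port).
PORTS: Dict[int, Tuple[int, int, int]] = {1: (3181, 4181, 2181), 2: (3182, 4182, 2182)}


def zoo_cfg(host_ip: str, ports: Dict[int, Tuple[int, int, int]]) -> str:
    """Render one config shared by every server.

    Every entry carries the routable host IP (what peers connect to), and
    quorumListenOnAllIPs=true makes each server listen on all of its own
    interfaces instead of trying to bind to that IP.
    """
    lines = [
        "dataDir=/var/lib/zookeeper",  # placeholder path
        "quorumListenOnAllIPs=true",
    ]
    for sid, (quorum, election, client) in sorted(ports.items()):
        lines.append(f"server.{sid}={host_ip}:{quorum}:{election}:participant;{client}")
    return "\n".join(lines) + "\n"


print(zoo_cfg("1.2.3.4", PORTS), end="")
```

Because every server gets the same list, synchronizing quorum lists no longer swaps a wildcard into a peer entry or a foreign IP into a bind address.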

But really, it seems unfortunate that:
  a) The quorum list from peers can re-configure your local bindings
      (Denial of service on port 80, anyone?)
  b) ZK conflates binding *interfaces* and peer *IP* addresses
  c) This doesn't work by default, and takes some digging to find the fix.

Thoughts?

Thanks!
.timrc