Posted to user@zookeeper.apache.org by Paul Carey <pa...@gmail.com> on 2017/08/21 02:18:36 UTC

Transaction logs and hierarchical quorums

Hi

In order to simplify handling of data center failover, I wanted to create a
ZooKeeper ensemble where writes were synchronously replicated to at least
one node in each DC before returning to the client.

The section on hierarchical quorums [1] says that
  "we are able to form a quorum once we have a majority of votes from a
majority of non-zero-weight groups"
I understand from this that if I have 2 non-zero-weight groups of 3 nodes,
then a quorum must be formed from both groups, with at least 2 nodes from
each. That is at least 4 nodes, and hence at least one node in each DC must
be part of the quorum, thus ensuring each write is replicated to at least
one node in each DC.
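
As a rough sketch of the rule quoted above (this is not ZooKeeper's own
implementation), the check can be written as a small function. The function
name and data layout are assumptions made for illustration; the group and
weight values are copied from the Config section of this message.

```python
def has_hierarchical_quorum(acked, groups, weights):
    """Return True if `acked` holds a strict majority of the weight
    in a strict majority of the non-zero-weight groups."""
    groups_won = 0
    groups_counted = 0
    for members in groups.values():
        total = sum(weights[s] for s in members)
        if total == 0:
            continue  # zero-weight groups do not take part
        groups_counted += 1
        got = sum(weights[s] for s in members if s in acked)
        if 2 * got > total:
            groups_won += 1
    return 2 * groups_won > groups_counted


# Values taken from the Config section (server ids 1..6, all weight 1).
groups = {1: [1, 2, 4], 2: [3, 5, 6]}
weights = {server_id: 1 for server_id in range(1, 7)}

# Two servers from each group: satisfies the rule.
print(has_hierarchical_quorum({1, 2, 3, 5}, groups, weights))  # True
# Servers 1, 3 and 5 (hosts 3a, 4a, 7a): only group 2 has a majority.
print(has_hierarchical_quorum({1, 3, 5}, groups, weights))     # False
```

With two groups, "a majority of groups" means both of them, so any set that
passes this check has at least two servers from each group.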

I also understand from this line in the Programmer's Guide [2], and from
various other places in the docs, that the transaction log will reflect
every change applied to the znode tree.
  "The most performance-critical part of ZooKeeper is the transaction log.
ZooKeeper must sync transactions to media before it returns a response. "

Given the two points above, I would expect to see every zxid in at least
four transaction logs. But under failure conditions, this is not what I
see. I used `tc netem` to simulate a network split by progressively:
  - increasing inter-DC latency from an average of 0.7ms (the DCs are 30km
apart) to 10ms
  - dropping 50% of packets between DCs
I see the last zxid before total failure of the ensemble, 0x1b0002dfa5, in
only 3 of the 6 transaction logs, suggesting to me that the hierarchical
quorum was not correctly established before the write was accepted.

But maybe I'm misunderstanding:
  - maybe the presence of an entry in the transaction log is not the same
as saying that the change will be applied to the in-memory state
  - the zxids refer to createSession; perhaps quorum rules are not enforced
for such calls

Anyway, I'd be very grateful if someone could help me understand what I'm
seeing here. Log snippets and config follow below. I'm running ZooKeeper
3.4.6 on RHEL 6.8.

Many thanks

Paul

== Transaction Logs ==

Host 3a
8/17/17 6:01:44 AM UTC session 0x35dee3497560039 cxid 0x0 zxid 0x1b0002dfa5
createSession 10000
8/17/17 6:02:13 AM UTC session 0x35deec8e21b0000 cxid 0x0 zxid 0x1c00000001
createSession 10000

Host 3b
8/17/17 6:00:54 AM GMT session 0x15dee2f698e000d cxid 0x0 zxid 0x1b0002df44
closeSession null
EOF reached after 10556 txns.

Host 4a
8/17/17 6:01:44 AM GMT session 0x35dee3497560039 cxid 0x0 zxid 0x1b0002dfa5
createSession 10000
8/17/17 6:02:13 AM GMT session 0x35deec8e21b0000 cxid 0x0 zxid 0x1c00000001
createSession 10000

Host 4b
8/17/17 6:01:25 AM GMT session 0x35dee3497560032 cxid 0x0 zxid 0x1b0002df88
createSession 10000
8/17/17 6:02:13 AM GMT session 0x35deec8e21b0000 cxid 0x0 zxid 0x1c00000001
createSession 10000
EOF reached after 6619 txns.

Host 7a
8/17/17 6:01:44 AM GMT session 0x35dee3497560039 cxid 0x0 zxid 0x1b0002dfa5
createSession 10000
8/17/17 6:02:13 AM GMT session 0x35deec8e21b0000 cxid 0x0 zxid 0x1c00000001
createSession 10000

Host 7b
8/17/17 6:01:25 AM GMT session 0x35dee3497560032 cxid 0x0 zxid 0x1b0002df88
createSession 10000
EOF reached after 49071 txns.
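
For reading the snippets above, a minimal sketch follows. It relies on the
zxid layout in which the high 32 bits hold the leader epoch and the low 32
bits hold a counter within that epoch. The dictionary is hand-copied from
the log lines shown here, and the helper name is an assumption. The holder
list also assumes that each log is written in zxid order, so a log whose
last epoch-0x1b entry is lower than the target does not contain it.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Last zxid from epoch 0x1b seen in each host's log snippet.
last_seen = {
    "3a": 0x1B0002DFA5,
    "3b": 0x1B0002DF44,
    "4a": 0x1B0002DFA5,
    "4b": 0x1B0002DF88,
    "7a": 0x1B0002DFA5,
    "7b": 0x1B0002DF88,
}

target = 0x1B0002DFA5
epoch, counter = split_zxid(target)
print(hex(epoch), hex(counter))  # 0x1b 0x2dfa5

# Hosts whose log reached the target zxid.
holders = sorted(host for host, zxid in last_seen.items() if zxid >= target)
print(holders)  # ['3a', '4a', '7a']

# The following entry, 0x1c00000001, is the first counter of the next epoch.
print(tuple(hex(part) for part in split_zxid(0x1C00000001)))  # ('0x1c', '0x1')
```

Per the Config section, hosts 3a, 4a and 7a are servers 1, 3 and 5, that is,
one server from group 1 and two from group 2.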


== Config ==

server.1=3a:2888:3888
server.2=3b:2888:3888
server.3=4a:2888:3888
server.4=4b:2888:3888
server.5=7a:2888:3888
server.6=7b:2888:3888

group.1=1:2:4
group.2=3:5:6

weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
weight.6=1

[1] http://zookeeper.apache.org/doc/r3.4.6/zookeeperHierarchicalQuorums.html
[2] http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html

Re: Transaction logs and hierarchical quorums

Posted by Paul Carey <pa...@gmail.com>.
Just bumping this thread. I'm still interested to know if I'm
misunderstanding expected behaviour, or if something else is happening.
Thanks.