Posted to dev@zookeeper.apache.org by Huizhi Lu <ih...@gmail.com> on 2020/07/27 20:23:02 UTC

ZK Transaction API multi()

Hi Zookeeper Devs,

Hope this email finds you well!

I am working on some stuff that needs ZK multi(). I would like to confirm a
few things about this API.

1. Is this a real transaction operation in ZK? My understanding is that it
is. If I put 3 write operations in one transaction request, these 3 write
operations are committed in 1 transaction with the same zxid and 1
proposal. Observers should see either all of the updates or none of them,
never a partial update, e.g. only 1 of the 3 updates.

2. We have a case where we write multiple znodes. Currently the requests
are sent one by one. With a transaction, I believe we could batch the
writes in 1 transaction request to reduce ZK write pressure. E.g., if we
are writing 100 znodes, putting them in one transaction request would
reduce the write requests from 100 (single write requests using create() or
set()) to 1 write request (the transaction), right? And does this reduce
write request pressure on the ZK server?

Could you help explain? I am looking forward to your reply. Thank you very
much!

Best,
-Huizhi
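
For readers following along, the two questions above can be made concrete
with a toy model. The sketch below is plain Java with no ZooKeeper
dependency; ToyStore, Write, and chunk are hypothetical names invented for
illustration and are not part of any ZooKeeper API. It models only the
all-or-nothing behavior asked about in question 1 and the request-count
arithmetic from question 2, and it says nothing about server-side cost. In
the real Java client, the corresponding calls would be ZooKeeper.multi()
with Op.create() and Op.setData().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of an all-or-nothing batch. This is NOT ZooKeeper code.
final class ToyStore {
    /** One write in a batch: create a new path, or set data on an existing one. */
    static final class Write {
        final boolean create;
        final String path;
        final String data;

        private Write(boolean create, String path, String data) {
            this.create = create;
            this.path = path;
            this.data = data;
        }

        static Write create(String path, String data) { return new Write(true, path, data); }
        static Write set(String path, String data) { return new Write(false, path, data); }
    }

    private Map<String, String> nodes = new HashMap<>();

    /** Applies every write or none. Returns null on success, else the failure reason. */
    String multi(List<Write> batch) {
        Map<String, String> staged = new HashMap<>(nodes); // validate against a copy
        for (Write w : batch) {
            boolean exists = staged.containsKey(w.path);
            if (w.create && exists) return "node exists: " + w.path;
            if (!w.create && !exists) return "no node: " + w.path;
            staged.put(w.path, w.data);
        }
        nodes = staged; // single swap: a reader sees all writes or none
        return null;
    }

    String get(String path) { return nodes.get(path); }

    /** Splits pending writes into batches of at most batchSize items. */
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        // The second write fails, so the first one must not become visible.
        String err = store.multi(Arrays.asList(Write.create("/a", "1"), Write.set("/missing", "x")));
        System.out.println("failed batch: " + err + ", /a = " + store.get("/a"));
    }
}
```

A failed batch leaves the store untouched and hands the reason back to the
caller. With chunk(), 100 pending writes and a batch size of 100 yield a
single batch, which is the 100-to-1 request reduction the question
describes; whether that helps the server is a separate matter.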

Re: ZK Transaction API multi()

Posted by Huizhi Lu <ih...@gmail.com>.

Hi Enrico,

Thanks for pointing me to BookKeeper. I've heard about it but haven't had a
chance to really try it yet. It looks very promising. We will definitely
evaluate this direction.


-Huizhi

On Mon, Jul 27, 2020 at 10:53 PM Enrico Olivelli <eo...@gmail.com>
wrote:

> Huizhi,
> If you want to achieve total atomic broadcast and greater throughput, you
> can consider using ZooKeeper's sibling, Apache BookKeeper, which is built
> on top of ZK. It is very lightweight and scalable (no central coordination
> servers).
>
> https://bookkeeper.apache.org
>
> Hope that helps
> Enrico
>
> On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ih...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thank you so much for the reply. Your suggestion is very valuable. I do
> > agree that we should migrate from ZK to a distributed DB for this high
> > number of writes. Due to legacy codebase and usage, it may not be that
> > easy for us to do that. So we are considering multi() as a short/mid term
> > solution. Eventually we will move the excessive number of writes out of
> > ZK to achieve higher scalability.
> >
> > Lastly, I greatly appreciate your insightful explanation! FYI, I am very
> > happy to receive prompt replies from you, Ted!
> >
> > Best,
> > -Huizhi
> >
> > On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <te...@gmail.com> wrote:
> >
> > >
> > > > This sounds like you are using ZK outside of the intended design. The
> > > > idea is that ZK is a coordination engine. If you have such high write
> > > > rates that ZK is dropping connections, you probably want a distributed
> > > > database of some kind, perhaps one that uses ZK to coordinate itself.
> > > > ZK is a form of replicated database, not a distributed one and, as
> > > > such, the write rate doesn't scale and that is intentional.
> > > >
> > > > Even if multi() solves your immediate problem, it leaves the same
> > > > problem in place at just a slightly higher scale. My own philosophy of
> > > > scaling is that when you hit a problem, you should increase your scale
> > > > by a large enough factor to give you time to solve some other problems
> > > > or build new stuff before you have to fix your scaling problem again.
> > > > Increasing scale by a factor of 2 rarely does this. I prefer to
> > > > increase my scaling bounds by a factor of 10 or more so that I have
> > > > some breathing space. I remember a time in one startup where our
> > > > system was on the edge of breaking and our traffic was doubling
> > > > roughly every week. We had to improve our situation by at least a
> > > > factor of 10 each time we upgraded our systems just to stay in the
> > > > same place. I can only hope you will have similar problems.
> > >
> > >
> > >
> > > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ih...@gmail.com> wrote:
> > >
> > >> Hi Ted,
> > >>
> > > >> Thank you very much for the reply! I didn't receive the reply in my
> > > >> email but I found it in ZK dev mail thread. So I could not reply
> > > >> directly to the thread.
> > > >>
> > > >> I really appreciate a reply from the original author of multi()! And
> > > >> your blog (A Tour of the Multi-update For Zookeeper) is very helpful
> > > >> with understanding of multi(). Your reply helps convince my team that
> > > >> it is a real transaction.
> > > >>
> > > >> Regarding my 2nd question, maybe I should have described a bit of our
> > > >> challenge. When we have a large number of ZK write requests that cause
> > > >> high ZK write QPS, ZK sessions are expired by ZK. And this affects the
> > > >> application's connection to ZK. We wonder if we could apply multi() to
> > > >> batch the ZK write requests to reduce ZK write QPS so ZK wouldn't
> > > >> expire sessions. So in this case, do you think we could still not
> > > >> apply multi() to achieve the purpose?
> > >>
> > >> Thank you, Ted!!
> > >>
> > > >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ih...@gmail.com> wrote:
> > >>
> > > >> > Hi Zookeeper Devs,
> > > >> >
> > > >> > Hope this email finds you well!
> > > >> >
> > > >> > I am working on some stuff that needs ZK multi(). I would like to
> > > >> > confirm a few things about this API.
> > > >> >
> > > >> > 1. Is this a real transaction operation in ZK? My understanding is,
> > > >> > it is a real transaction. If I put 3 write operations in this
> > > >> > transaction request, these 3 write operations are committed in 1
> > > >> > transaction with the same zxid and 1 proposal. Observers should
> > > >> > either see all the updates or none of the updates. Observers should
> > > >> > not see partial updates, eg. only 1 of the 3 updates.
> > > >> >
> > > >>
> > > >> Yes. The multi() is atomic. It will happen or not and the program
> > > >> invoking the operation will be told why or why not.
> > > >>
> > > >> > 2. We have a case to write multiple znodes. Currently it is sending
> > > >> > the requests one by one. With transaction, I believe we could batch
> > > >> > writes in 1 transaction request. This is intended to reduce ZK write
> > > >> > pressure. Eg. we are writing 100 znodes, putting it in one
> > > >> > transaction request would reduce write requests from 100 (single
> > > >> > write request using create() or set()) to 1 write request
> > > >> > (transaction), right? And does this reduce ZK server write request
> > > >> > pressure?
> > >> >
> > >>
> > > >> Not really. The way that multi() works is that it does a group commit.
> > > >> There may be some economies in terms of number of network exchanges,
> > > >> but the internal work of testing whether the operations will succeed
> > > >> is the same. The point of multi() is to make use of and provide
> > > >> nuanced control over the normal group commit that Zookeeper is doing
> > > >> anyway. It should not generally be viewed as an efficiency improvement.
> > >>
> > >> Hopefully this helps.

Re: ZK Transaction API multi()

Posted by Huizhi Lu <ih...@gmail.com>.

Hi Ted,

Again, I greatly appreciate the insightful GC tuning tips! Although we've
done some GC tuning, I believe there is more we can do based on profiling.

I am so glad I've got so many valuable suggestions from the ZK community!

-Huizhi

On Tue, Jul 28, 2020 at 6:27 PM Ted Dunning <te...@gmail.com> wrote:

> Michael's suggestions are excellent, particularly the use of observers and
> the general warning to measure first.
>
> To expand his point about GC, consider moving to a more recent JVM if GC is
> demonstrated to be the problem. To find out if it is a problem, turn on GC
> logging and see if GC delays correspond with long latency. If you are using
> an ancient JVM and you have GC delays, try moving to a more modern JVM and
> make sure that there is plenty of memory for ZK.
>
> Also, check to see if the machine that is running ZK might be
> oversubscribed. Noisy neighbors doing something intense can be nearly as
> bad as GC.
>
> To repeat and slightly rephrase Michael's (and old carpenters everywhere)
> advice, however, measure twice, cut once.
>
> On Tue, Jul 28, 2020 at 4:42 PM Michael Han <ha...@apache.org> wrote:
>
> > I agree with Ted's comments on the philosophy of scaling and the need to
> > recheck your use case to justify if ZooKeeper is the long term solution
> > or not.
> >
> > That said, I was in a similar position and had gone through similar
> > scaling challenges for ZooKeeper so I could probably provide some
> > suggestions which might serve as a short term solution.
> >
> > * Obvious ones - more powerful hardware with better IOPS and bigger
> > memory.
> > * Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
> > * Don't use participants to serve traffic. Use observers only.
> > * Tune SyncRequestProcessor to allow more throughput at a cost of higher
> > latency - specifically max batch size and flush delay.
> > * Tune CommitProcessor to favor more writes instead of reads (depends on
> > your actual workload).
> > * Consider using response cache to reduce pressure on JVM Eden space.
> > * JVM tuning - hard to provide concrete advice but the session expiration
> > is likely caused by JVM GC. Try different options based on profiling and
> > workload characteristics.
> > * Client auditing - making sure all traffic from your client is
> > legitimate. This is often overlooked, but surprisingly prevalent as a
> > root cause of ZK meltdowns in practice.
> >
> > These are some general guidelines that might help. As with any
> > performance tuning, the general approach should be to scope your
> > workload, do some profiling, identify bottleneck(s), and apply tunings
> > accordingly. Good luck.

Re: ZK Transaction API multi()

Posted by Ted Dunning <te...@gmail.com>.

Michael's suggestions are excellent, particularly the use of observers and
the general warning to measure first.

To expand his point about GC, consider moving to a more recent JVM if GC is
demonstrated to be the problem. To find out if it is a problem, turn on GC
logging and see if GC delays correspond with long latency. If you are using
an ancient JVM and you have GC delays, try moving to a more modern JVM and
make sure that there is plenty of memory for ZK.

Also, check to see if the machine that is running ZK might be
oversubscribed. Noisy neighbors doing something intense can be nearly as
bad as GC.

To repeat and slightly rephrase Michael's (and old carpenters everywhere)
advice, however, measure twice, cut once.
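
One rough way to check whether GC activity lines up with slow periods,
assuming you can run code inside the JVM in question (for example, the
client application), is to sample the standard java.lang.management GC
beans around a suspect operation. GcSnapshot is a hypothetical helper name
used only for this sketch. The figures are cumulative collector times as
reported by the JVM, which only approximate pause time with concurrent
collectors. For the ZK server itself, GC logging enabled through JVM flags
remains the route described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: cumulative GC counters for the current JVM.
final class GcSnapshot {
    final long collections; // total collections across all collectors
    final long millis;      // total accumulated collection time in ms

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    /** Reads the counters from every registered garbage collector bean. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined; treat that as 0.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    /** GC time accumulated since an earlier snapshot, in milliseconds. */
    long millisSince(GcSnapshot earlier) {
        return millis - earlier.millis;
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        long start = System.nanoTime();
        byte[][] junk = new byte[1024][];
        for (int i = 0; i < junk.length; i++) junk[i] = new byte[64 * 1024]; // stand-in workload
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        GcSnapshot after = take();
        System.out.println("elapsed ms: " + elapsedMs + ", GC ms in window: " + after.millisSince(before));
    }
}
```

If the GC time in a window is a large share of the elapsed time for the
slow operation, GC is worth investigating further; if it is near zero, the
cause is more likely elsewhere, such as an oversubscribed machine.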

Re: ZK Transaction API multi()

Posted by Huizhi Lu <ih...@gmail.com>.

Hi Michael,

Thanks very much for the awesome suggestions! These are brilliant!
Personally, I've learned a lot from your experience.
We've done some tuning, e.g. GC tuning and adding observers, but we haven't
touched the internal Processors.

And now we are in the process of upgrading ZK from 3.4+ to 3.6.0+, which
has already shown significant performance and stability gains for us.
As Ted also mentioned, although we are relieving the pain in the short term
for now, we should target a long term solution that scales better.


-Huizhi

On Tue, Jul 28, 2020 at 4:42 PM Michael Han <ha...@apache.org> wrote:

> I agree with Ted's comments on the philosophy of scaling and the need to
> recheck your use case to justify if ZooKeeper is the long term solution or
> not.
>
> That said, I was in a similar position and had gone through similar scaling
> challenges for ZooKeeper so I could probably provide some suggestions which
> might serve as a short term solution.
>
> * Obvious ones - more powerful hardware with better IOPS and bigger memory.
> * Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
> * Don't use participants to serve traffic. Use observers only.
> * Tune SyncRequestProcessor to allow more throughput at a cost of higher
> latency - specifically max batch size and flush delay.
> * Tune CommitProcessor to favor more writes instead of reads (depends on
> your actual workload).
> * Consider using response cache to reduce pressure on JVM Eden space.
> * JVM tuning - hard to provide concrete advice but the session expiration
> is likely caused by JVM GC. try different options based on profiling and
> workload characteristics.
> * Client auditing - making sure all traffic from your client is legitimate.
> This is often overlooked, but surprisingly prevalent as root causes of ZK
> meltdown in practice from time to time.
>
> These are some general guidelines that might help. As with any performance
> tuning, the general approach should scope your workload, do some profiling,
> identify bottleneck(s), and apply tunings accordingly. Good luck.
>

Re: ZK Transaction API multi()

Posted by Michael Han <ha...@apache.org>.

I agree with Ted's comments on the philosophy of scaling and on the need to
re-examine your use case to decide whether ZooKeeper is the right long-term
solution.

That said, I have been in a similar position and gone through similar scaling
challenges with ZooKeeper, so I can offer some suggestions that might serve
as a short-term solution.

* The obvious ones: more powerful hardware with better IOPS and more memory.
* Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
* Don't use participants to serve client traffic. Use observers only.
* Tune SyncRequestProcessor to allow more throughput at the cost of higher
latency, specifically max batch size and flush delay.
* Tune CommitProcessor to favor writes over reads (depends on
your actual workload).
* Consider using the response cache to reduce pressure on the JVM Eden space.
* JVM tuning: it is hard to give concrete advice, but the session expiration
is likely caused by JVM GC pauses. Try different options based on profiling
and workload characteristics.
* Client auditing: make sure all traffic from your clients is legitimate.
This is often overlooked, yet it is a surprisingly common root cause of ZK
meltdowns in practice.

These are some general guidelines that might help. As with any performance
tuning, the general approach should be to scope your workload, do some
profiling, identify the bottleneck(s), and apply tunings accordingly. Good luck.
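
To make the SyncRequestProcessor suggestion a bit more concrete, here is a rough
sketch of why a larger batch buys throughput. It is a toy model, not ZooKeeper
code: the helper names and the 5 ms fsync cost are assumptions made up for
illustration. In 3.6.0+ the real knobs are, as far as I know, the
zookeeper.maxBatchSize and zookeeper.flushDelay system properties; please check
the Administrator's Guide for your version before relying on that.

```python
import math


def flushes_needed(num_writes: int, max_batch_size: int) -> int:
    """Number of disk flushes if each flush carries up to max_batch_size writes."""
    return math.ceil(num_writes / max_batch_size)


def total_flush_time_ms(num_writes: int, max_batch_size: int, fsync_ms: float) -> float:
    """Total time spent in fsync, assuming a fixed (hypothetical) cost per flush."""
    return flushes_needed(num_writes, max_batch_size) * fsync_ms


# 10,000 queued writes with an assumed 5 ms per fsync (illustrative numbers only).
for batch in (1, 100, 1000):
    flushes = flushes_needed(10_000, batch)
    print(batch, flushes, total_flush_time_ms(10_000, batch, 5.0))
```

The latency cost is the other side of the same trade: a write that arrives at
the start of a batch has to wait until the batch fills up or the flush delay
expires before it is acknowledged.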

Re: ZK Transaction API multi()

Posted by Enrico Olivelli <eo...@gmail.com>.

Huizhi,
If you want to achieve total atomic broadcast and greater throughput,
you can consider using ZooKeeper's sibling project, Apache BookKeeper, which
is built on top of ZK. It is very lightweight and scalable (no central
coordination servers).

https://bookkeeper.apache.org

Hope that helps
Enrico

Re: ZK Transaction API multi()

Posted by Huizhi Lu <ih...@gmail.com>.

Hi Ted,

Thank you so much for the reply. Your suggestion is very valuable. I do
agree that we should migrate from ZK to a distributed DB for this high
number of writes. Due to our legacy codebase and usage, that may not be
easy for us to do, so we are considering multi() as a short- to mid-term
solution. Eventually we will move the excessive writes out of ZK to
achieve higher scalability.

Lastly, I greatly appreciate your insightful explanation! I am very
happy to receive such prompt replies from you, Ted!

Best,
-Huizhi

Re: ZK Transaction API multi()

Posted by Ted Dunning <te...@gmail.com>.

This sounds like you are using ZK outside of the intended design. The idea
is that ZK is a coordination engine. If you have such high write rates that
ZK is dropping connections, you probably want a distributed database of
some kind, perhaps one that uses ZK to coordinate itself. ZK is a form of
replicated database, not a distributed one and, as such, the write rate
doesn't scale and that is intentional.

Even if multi() solves your immediate problem, it leaves the same problem
in place at just a slightly higher scale. My own philosophy of scaling is
that when you hit a problem, you should increase your scale by a large
enough factor to give you time to solve some other problems or build new
stuff before you have to fix your scaling problem again. Increasing scale
by a factor of 2 rarely does this. I prefer to increase my scaling bounds
by a factor of 10 or more so that I have some breathing space. I remember a
time in one startup where our system was on the edge of breaking and our
traffic was doubling roughly every week. We had to improve our situation by
at least a factor of 10 each time we upgraded our systems just to stay in
the same place. I can only hope you will have similar problems.
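
The arithmetic behind the factor-of-10 rule can be sketched as follows. This is
only an illustration: weeks_of_headroom is a hypothetical helper, and it assumes
traffic doubles at a steady rate, as in the startup anecdote above.

```python
import math


def weeks_of_headroom(capacity_factor: float, doubling_period_weeks: float = 1.0) -> float:
    """Weeks until steadily doubling traffic uses up a capacity increase."""
    return math.log2(capacity_factor) * doubling_period_weeks


for factor in (2, 10, 100):
    print(f"{factor}x capacity -> {weeks_of_headroom(factor):.2f} weeks")
```

With weekly doubling, a 2x improvement is used up in one week, while a 10x
improvement lasts a little over three weeks. That is why a factor of 2 rarely
leaves time to work on anything else.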



On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ih...@gmail.com> wrote:

> Hi Ted,
>
> Thank you very much for the reply! I didn't receive the reply in my email
> but I found it in ZK dev mail thread. So I could not reply directly to the
> thread.
>
> I really appreciate a reply from the original author of multi()! And your
> blog (A Tour of the Multi-update For Zookeeper) is very helpful with
> understanding of multi(). Your reply helps convince my team that it is a
> real transaction.
>
> Regarding my 2nd question, maybe I should have described a bit of our
> challenge. When we have a large number of ZK write requests that cause high
> ZK write QPS, ZK sessions are expired by ZK. And this affects the
> application's connection to ZK. We wonder if we could apply multi() to
> batch the ZK write requests to reduce ZK write QPS so ZK wouldn't expire
> sessions. So in this case, do you think we could still not apply multi() to
> achieve the purpose?
>
> Thank you, Ted!!
>
> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ih...@gmail.com> wrote:
>
> > Hi Zookeeper Devs,
> >
> > Hope this email finds you well!
> >
> > I am working on some stuff that needs ZK multi(). I would like to confirm a
> > few things about this API.
> >
> > 1. Is this a real transaction operation in ZK? My understanding is, it is a
> > real transaction. If I put 3 write operations in this transaction request,
> > these 3 write operations are committed in 1 transaction with the same zxid
> > and 1 proposal. Observers should either see all the updates or none of the
> > updates. Observers should not see partial updates, eg. only 1 of the 3
> > updates.
> >
>
> Yes. The multi() is atomic. It will happen or not and the program invoking
> the operation will be told why or why not.
>
> > 2. We have a case to write multiple znodes. Currently it is sending the
> > requests one by one. With transaction, I believe we could batch writes in 1
> > transaction request. This is intended to reduce ZK write pressure. Eg. we
> > are writing 100 znodes, putting it in one transaction request would reduce
> > write requests from 100 (single write request using create() or set()) to 1
> > write request (transaction), right? And does this reduce ZK server write
> > request pressure?
> >
>
> Not really. The way that multi() works is that it does a group commit.
> There may be some economies in terms of number of network exchanges, but
> the internal work of testing whether the operations will succeed is the
> same. The point of multi() is to make use of and provide nuanced control
> over the normal group commit that Zookeeper is doing anyway. It should not
> generally be viewed as an efficiency improvement.
>
> Hopefully this helps.
>
>
>
>
> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ih...@gmail.com> wrote:
>
>> Hi Zookeeper Devs,
>>
>> Hope this email finds you well!
>>
>> I am working on some stuff that needs ZK multi(). I would like to confirm
>> a few things about this API.
>>
>> 1. Is this a real transaction operation in ZK? My understanding is, it is
>> a real transaction. If I put 3 write operations in this transaction
>> request, these 3 write operations are committed in 1 transaction with the
>> same zxid and 1 proposal. Observers should either see all the updates or
>> none of the updates. Observers should not see partial updates, eg. only 1
>> of the 3 updates.
>>
>> 2. We have a case to write multiple znodes. Currently it is sending the
>> requests one by one. With transaction, I believe we could batch writes in 1
>> transaction request. This is intended to reduce ZK write pressure. Eg. we
>> are writing 100 znodes, putting it in one transaction request would reduce
>> write requests from 100 (single write request using create() or set()) to 1
>> write request (transaction), right? And does this reduce ZK server write
>> request pressure?
>>
>> Could you help explain? I am looking forward to your reply. Thank you
>> very much!
>>
>> Best,
>> -Huizhi
>>
>

Re: ZK Transaction API multi()

Posted by Huizhi Lu <ih...@gmail.com>.

Hi Ted,

Thank you very much for the reply! I didn't receive it in my inbox, but I
found it in the ZK dev mailing list archive, so I could not reply directly
to the thread.

I really appreciate a reply from the original author of multi()! Your blog
post (A Tour of the Multi-update For Zookeeper) is very helpful for
understanding multi(), and your reply helps convince my team that it is a
real transaction.

Regarding my second question, I should have described our challenge. When a
large number of write requests drives ZK write QPS high, ZK expires
sessions, which affects the application's connection to ZK. We wonder
whether we could use multi() to batch the write requests and reduce write
QPS so that ZK would not expire sessions. In this case, do you think
multi() still would not help?

Thank you, Ted!!

On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ih...@gmail.com> wrote:

> Hi Zookeeper Devs,
>
> Hope this email finds you well!
>
> I am working on some stuff that needs ZK multi(). I would like to confirm a
> few things about this API.
>
> 1. Is this a real transaction operation in ZK? My understanding is, it is a
> real transaction. If I put 3 write operations in this transaction request,
> these 3 write operations are committed in 1 transaction with the same zxid
> and 1 proposal. Observers should either see all the updates or none of the
> updates. Observers should not see partial updates, eg. only 1 of the 3
> updates.
>

Yes. The multi() is atomic. It will happen or not and the program invoking
the operation will be told why or why not.

> 2. We have a case to write multiple znodes. Currently it is sending the
> requests one by one. With transaction, I believe we could batch writes in 1
> transaction request. This is intended to reduce ZK write pressure. Eg. we
> are writing 100 znodes, putting it in one transaction request would reduce
> write requests from 100 (single write request using create() or set()) to 1
> write request (transaction), right? And does this reduce ZK server write
> request pressure?
>

Not really. The way that multi() works is that it does a group commit.
There may be some economies in terms of number of network exchanges, but
the internal work of testing whether the operations will succeed is the
same. The point of multi() is to make use of and provide nuanced control
over the normal group commit that Zookeeper is doing anyway. It should not
generally be viewed as an efficiency improvement.

Hopefully this helps.

On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ih...@gmail.com> wrote:

> Hi Zookeeper Devs,
>
> Hope this email finds you well!
>
> I am working on some stuff that needs ZK multi(). I would like to confirm
> a few things about this API.
>
> 1. Is this a real transaction operation in ZK? My understanding is, it is
> a real transaction. If I put 3 write operations in this transaction
> request, these 3 write operations are committed in 1 transaction with the
> same zxid and 1 proposal. Observers should either see all the updates or
> none of the updates. Observers should not see partial updates, eg. only 1
> of the 3 updates.
>
> 2. We have a case to write multiple znodes. Currently it is sending the
> requests one by one. With transaction, I believe we could batch writes in 1
> transaction request. This is intended to reduce ZK write pressure. Eg. we
> are writing 100 znodes, putting it in one transaction request would reduce
> write requests from 100 (single write request using create() or set()) to 1
> write request (transaction), right? And does this reduce ZK server write
> request pressure?
>
> Could you help explain? I am looking forward to your reply. Thank you very
> much!
>
> Best,
> -Huizhi
>

Re: ZK Transaction API multi()

Posted by Ted Dunning <te...@gmail.com>.

On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ih...@gmail.com> wrote:

> Hi Zookeeper Devs,
>
> Hope this email finds you well!
>
> I am working on some stuff that needs ZK multi(). I would like to confirm a
> few things about this API.
>
> 1. Is this a real transaction operation in ZK? My understanding is, it is a
> real transaction. If I put 3 write operations in this transaction request,
> these 3 write operations are committed in 1 transaction with the same zxid
> and 1 proposal. Observers should either see all the updates or none of the
> updates. Observers should not see partial updates, eg. only 1 of the 3
> updates.
>

Yes. The multi() is atomic. It will happen or not and the program invoking
the operation will be told why or why not.
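
To make the all-or-nothing behaviour concrete, here is a minimal sketch of
it as a toy in-memory model. It is not the ZooKeeper client API or the
server implementation: ToyStore, its multi() method, the tuple-shaped ops,
and the reason strings are all hypothetical names made up for illustration.
The only convention borrowed from ZooKeeper is that an expected version of
-1 matches any version.

```python
class ToyStore:
    """Toy stand-in for a znode tree: path -> (data, version)."""

    def __init__(self):
        self.nodes = {}

    def multi(self, ops):
        """Apply every op or none of them.

        Each op is a tuple (hypothetical format):
          ("create", path, data)
          ("set", path, data, expected_version)
          ("delete", path, expected_version)
        Returns (True, None) on success, or (False, (index, reason))
        naming the first op that failed.
        """
        staged = dict(self.nodes)  # work on a copy; commit only at the end
        for i, op in enumerate(ops):
            kind, path = op[0], op[1]
            if kind == "create":
                if path in staged:
                    return False, (i, "node exists")
                staged[path] = (op[2], 0)
            elif kind == "set":
                if path not in staged:
                    return False, (i, "no node")
                version = staged[path][1]
                if op[3] != -1 and op[3] != version:
                    return False, (i, "bad version")
                staged[path] = (op[2], version + 1)
            elif kind == "delete":
                if path not in staged:
                    return False, (i, "no node")
                if op[2] != -1 and op[2] != staged[path][1]:
                    return False, (i, "bad version")
                del staged[path]
            else:
                return False, (i, "unknown op")
        self.nodes = staged  # single commit point: readers see all or nothing
        return True, None


store = ToyStore()
print(store.multi([("create", "/a", b"1"), ("create", "/b", b"2")]))
# The second op fails, so the first op's write to /a is not applied either.
print(store.multi([("set", "/a", b"x", 0), ("create", "/b", b"dup")]))
print(store.nodes["/a"])
```

The failed call returns the index of the failing op and a reason, which
mirrors the point above that the caller is told why the operation did or
did not happen, and /a keeps its original data and version.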

> 2. We have a case to write multiple znodes. Currently it is sending the
> requests one by one. With transaction, I believe we could batch writes in 1
> transaction request. This is intended to reduce ZK write pressure. Eg. we
> are writing 100 znodes, putting it in one transaction request would reduce
> write requests from 100 (single write request using create() or set()) to 1
> write request (transaction), right? And does this reduce ZK server write
> request pressure?
>

Not really. The way that multi() works is that it does a group commit.
There may be some economies in terms of number of network exchanges, but
the internal work of testing whether the operations will succeed is the
same. The point of multi() is to make use of and provide nuanced control
over the normal group commit that Zookeeper is doing anyway. It should not
generally be viewed as an efficiency improvement.
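
As a back-of-envelope illustration of this point, the sketch below counts
client requests and per-operation checks for the 100-znode example. It is a
counting model only, not a measurement: the function name is hypothetical,
and it assumes exactly one validation per write, which is a simplification
of "the internal work of testing whether the operations will succeed is the
same".

```python
def request_and_check_counts(num_writes, batch_size):
    """Return (client_requests, per_op_checks) for a toy cost model.

    Assumption: each write is validated once by the server whether it
    arrives alone or inside a batch of batch_size operations.
    """
    if num_writes < 0 or batch_size < 1:
        raise ValueError("num_writes must be >= 0 and batch_size >= 1")
    client_requests = -(-num_writes // batch_size)  # ceiling division
    per_op_checks = num_writes  # batching does not remove any checks
    return client_requests, per_op_checks


print(request_and_check_counts(100, 1))    # (100, 100): one request per write
print(request_and_check_counts(100, 100))  # (1, 100): one batched request
```

In this model batching changes only the first number, the count of network
exchanges, while the number of per-operation checks stays at 100.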

Hopefully this helps.