Posted to dev@zookeeper.apache.org by Michael Han <ha...@apache.org> on 2021/05/03 22:13:09 UTC

Re: write performance issue in 3.6.2

Hi Li,

Thanks for following up.

>> write_commitproc_time_ms were large

This measures how long it takes a local write op to hear back from the
leader. If it's large, then either the leader is very busy acking the
request, or your network RTT is high.

How does the local fsync time (fsynctime) compare between the two tests?
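
(For reference, a quick way to eyeball these server-side metrics is the
'mntr' four-letter-word command, assuming it is whitelisted on your 3.6
servers. The exact metric key names may differ slightly from what I
remember, so grep loosely:

    # zoo.cfg: 3.6 requires whitelisting four-letter-word commands
    4lw.commands.whitelist=mntr,ruok,stat

    # then, against each server:
    echo mntr | nc <server-host> 2181 | grep -E 'fsync|commitproc'
)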

>> We've found that increasing the maxCommitBatchSize

Are you able to successfully tune this value so your benchmark against 3.6
is on par with 3.4 now? The original report mentioned lots of session
timeouts and connection losses. I am wondering if we can fix those first by
tuning this size.
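
(For anyone following along: maxCommitBatchSize is a CommitProcessor knob
passed as a JVM system property. The property name below is what I recall
from the 3.6 code and admin guide, so please verify against your version:

    # e.g. in conf/java.env, or however you pass server JVM flags
    SERVER_JVMFLAGS="-Dzookeeper.commitProcessor.maxCommitBatchSize=500 $SERVER_JVMFLAGS"
)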

The major difference in the CommitProcessor between 3.4.14 and 3.6.2 is the
newly added per-session commit queue, which lets reads and a write from
different sessions be processed concurrently. This works best for mixed
read/write workloads; for pure write workloads the new CommitProcessor is
not superior, as all writes still have to be serialized due to global
ordering, and the per-session queues add overhead: in this test, for
example, ZK has to manage 500 queues (and the enqueue/dequeue operations
cost cycles). That said, I didn't expect this overhead to create such a big
difference in your test.

Also, this is obvious, but just to confirm: was the benchmark for the two
versions of ZK run on the exact same test environment, including the same
OS and networking configuration?

On Mon, Apr 26, 2021 at 7:35 PM Li Wang <li...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your reply.
>
> 1. The workload is 500 concurrent users creating nodes with data size of 4
> bytes.
> 2. It's pure write
> 3. The perf issue is that under the same load, there were many session
> expired and connection loss errors when using ZK 3.6.2 but no such errors
> in ZK 3.4.14.
>
> The following are some updates on the issue.
>
> 1. We've checked the fine grained metrics and found that the
> CommitProcessor was the bottleneck. The commit_commit_proc_req_queued and
> the write_commitproc_time_ms were large.
> The errors were caused by too many commit requests queued up in the
> CommitProcessor and waiting to be processed.
> 2. We've found that increasing the maxCommitBatchSize can reduce both the
> session expired and connection loss errors.
> 3. We didn't observe any significant perf impact from the RequestThrottler.
>
>
> Please let me know if you or anyone has any questions.
>
> Thanks,
>
> Li
>
>
>
> On Tue, Apr 20, 2021 at 8:03 PM Michael Han <ha...@apache.org> wrote:
>
> > What is the workload looking like? Is it pure write, or mixed read write?
> >
> > A couple of ideas to move this forward:
> > * Publish the performance benchmark so the community can help.
> > * Bisect git commit and find the bad commit that caused the regression.
> > * Use the fine grained metrics introduced in 3.6 (e.g per processor stage
> > metrics) to measure where time spends during writes. We might have to add
> > these metrics on 3.4 to get a fair comparison.
> >
> > For the throttling - the RequestThrottler introduced in 3.6 does
> introduce
> > latency, but should not impact throughput this much.
> >
> > On Thu, Mar 11, 2021 at 11:46 AM Li Wang <li...@gmail.com> wrote:
> >
> > > The CPU usage of both server and client are normal (< 50%) during the
> > test.
> > >
> > > Based on the investigation, the server is too busy with the load.
> > >
> > > The issue doesn't exist in 3.4.14. I wonder why there is a significant
> > > write performance degradation from 3.4.14 to 3.6.2 and how we can
> address
> > > the issue.
> > >
> > > Best,
> > >
> > > Li
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Mar 11, 2021 at 11:25 AM Andor Molnar <an...@apache.org>
> wrote:
> > >
> > > > What is the CPU usage of both server and client during the test?
> > > >
> > > > Looks like server is dropping the clients because either the server
> or
> > > > both are too busy to deal with the load.
> > > > This log line is also concerning: "Too busy to snap, skipping”
> > > >
> > > > If that’s the case I believe you'll have to profile the server
> process
> > to
> > > > figure out where the perf bottleneck is.
> > > >
> > > > Andor
> > > >
> > > >
> > > >
> > > >
> > > > > On 2021. Feb 22., at 5:31, Li Wang <li...@gmail.com> wrote:
> > > > >
> > > > > Thanks, Patrick.
> > > > >
> > > > > Yes, we are using the same JVM version and GC configurations when
> > > > > running the two tests. I have checked the GC metrics and also the
> > heap
> > > > dump
> > > > > of the 3.6, the GC pause and the memory usage look okay.
> > > > >
> > > > > Best,
> > > > >
> > > > > Li
> > > > >
> > > > > On Sun, Feb 21, 2021 at 3:34 PM Patrick Hunt <ph...@apache.org>
> > wrote:
> > > > >
> > > > >> On Sun, Feb 21, 2021 at 3:28 PM Li Wang <li...@gmail.com>
> wrote:
> > > > >>
> > > > >>> Hi Enrico, Sushant,
> > > > >>>
> > > > >>> I re-run the perf test with the data consistency check feature
> > > disabled
> > > > >>> (i.e. -Dzookeeper.digest.enabled=false), the write performance
> > issue
> > > of
> > > > >> 3.6
> > > > >>> is still there.
> > > > >>>
> > > > >>> With everything exactly the same, the throughput of 3.6 was only
> > 1/2
> > > of
> > > > >> 3.4
> > > > >>> and the max latency was more than 8 times.
> > > > >>>
> > > > >>> Any other points or thoughts?
> > > > >>>
> > > > >>>
> > > > >> In the past I've noticed a big impact of GC when doing certain
> > > > performance
> > > > >> measurements. I assume you are using the same JVM version and GC
> > when
> > > > >> running the two tests? Perhaps our memory footprint has expanded
> > over
> > > > time.
> > > > >> You should rule out GC by running with gc logging turned on with
> > both
> > > > >> versions and compare the impact.
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Patrick
> > > > >>
> > > > >>
> > > > >>> Cheers,
> > > > >>>
> > > > >>> Li
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Sat, Feb 20, 2021 at 9:04 PM Li Wang <li...@gmail.com>
> wrote:
> > > > >>>
> > > > >>>> Thanks Sushant and Enrico!
> > > > >>>>
> > > > >>>> This is a really good point.  According to the 3.6
> documentation,
> > > the
> > > > >>>> feature is disabled by default.
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration
> > > > >>> .
> > > > >>>> However, checking the code, the default is enabled.
> > > > >>>>
> > > > >>>> Let me set the zookeeper.digest.enabled to false and see how the
> > > write
> > > > >>>> operation performs.
> > > > >>>>
> > > > >>>> Best,
> > > > >>>>
> > > > >>>> Li
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Feb 19, 2021 at 1:32 PM Sushant Mane <
> > > sushantmane7@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hi Li,
> > > > >>>>>
> > > > >>>>> On 3.6.2 consistency checker (adhash based) is enabled by
> > default:
> > > > >>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136
> > > > >>>>> .
> > > > >>>>> It is not present in ZK 3.4.14.
> > > > >>>>>
> > > > >>>>> This feature does have some impact on write performance.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Sushant
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Fri, Feb 19, 2021 at 12:50 PM Enrico Olivelli <
> > > > eolivelli@gmail.com
> > > > >>>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Li,
> > > > >>>>>> I wonder of we have some new throttling/back pressure
> mechanisms
> > > > >> that
> > > > >>> is
> > > > >>>>>> enabled by default.
> > > > >>>>>>
> > > > >>>>>> Does anyone has some pointer to relevant implementations?
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> Enrico
> > > > >>>>>>
> > > > >>>>>> Il Ven 19 Feb 2021, 19:46 Li Wang <li...@gmail.com> ha
> > scritto:
> > > > >>>>>>
> > > > >>>>>>> Hi,
> > > > >>>>>>>
> > > > >>>>>>> We switched to Netty on both client side and server side and
> > the
> > > > >>>>>>> performance issue is still there.  Anyone has any insights on
> > > what
> > > > >>>>> could
> > > > >>>>>> be
> > > > >>>>>>> the cause of higher latency?
> > > > >>>>>>>
> > > > >>>>>>> Thanks,
> > > > >>>>>>>
> > > > >>>>>>> Li
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Feb 15, 2021 at 2:17 PM Li Wang <li...@gmail.com>
> > > > >> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi Enrico,
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks for the reply.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 1. We are using NIO based stack, not Netty based yet.
> > > > >>>>>>>>
> > > > >>>>>>>> 2. Yes, here are some metrics on the client side.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 3.6: throughput: 7K, failure: 81215228, Avg Latency: 57ms,
> > Max
> > > > >>>>> Latency
> > > > >>>>>>> 31s
> > > > >>>>>>>>
> > > > >>>>>>>> 3.4: throughput: 15k, failure: 0,  Avg Latency: 30ms,  Max
> > > > >>> Latency:
> > > > >>>>>> 1.6s
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 3. Yes, the JVM and zoo.cfg config are the exact same
> > > > >>>>>>>>
> > > > >>>>>>>> 10G of Heap
> > > > >>>>>>>>
> > > > >>>>>>>> 13G of Memory
> > > > >>>>>>>>
> > > > >>>>>>>> 5 Participante
> > > > >>>>>>>>
> > > > >>>>>>>> 5 Observere
> > > > >>>>>>>>
> > > > >>>>>>>> Client session timeout: 3000ms
> > > > >>>>>>>>
> > > > >>>>>>>> Server min session time: 4000ms
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 4. Yes, there are two types of  WARN logs and many “Expiring
> > > > >>>>> session”
> > > > >>>>>>>> INFO log
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 2021-02-15 22:04:36,506 [myid:4] - WARN
> > > > >>>>>>>> [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected
> exception
> > > > >>>>>>>>
> > > > >>>>>>>> EndOfStreamException: Unable to read additional data from
> > > > >> client,
> > > > >>> it
> > > > >>>>>>>> probably closed the socket: address = /100.108.63.116:43366
> ,
> > > > >>>>> session =
> > > > >>>>>>>> 0x400189fee9a000b
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>
> > > > >>
> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > >>>>>>>>
> > > > >>>>>>>> at
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > >>>>>>>>
> > > > >>>>>>>> at java.base/java.lang.Thread.run(Thread.java:834)
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 2021-02-15 22:05:14,428 [myid:4] - WARN
> > > > >>>>>>>> [SyncThread:4:SyncRequestProcessor@188] - Too busy to snap,
> > > > >>>>> skipping
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 2021-02-15 22:01:51,427 [myid:4] - INFO
> > > > >>>>>>>> [SessionTracker:ZooKeeperServer@610] - Expiring session
> > > > >>>>>>>> 0x400189fee9a001e, timeout of 4000ms exceeded
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 5. Yes we upgrade both the client and the server to 3.6.
> > > > >> Actually,
> > > > >>>>> the
> > > > >>>>>>>> issue happened with the combinations of
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 3.4 client and 3.6 server
> > > > >>>>>>>>
> > > > >>>>>>>> 3.6 client and 3.6 server
> > > > >>>>>>>>
> > > > >>>>>>>> Please let me know if you need any additional info.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>>
> > > > >>>>>>>> Li
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, Feb 15, 2021 at 1:44 PM Li Wang <li...@gmail.com>
> > > > >>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Enrico,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks for the reply.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 1. We are using direct NIO based stack, not Netty based
> yet.
> > > > >>>>>>>>> 2. Yes, on the client side, here are the metrics
> > > > >>>>>>>>>
> > > > >>>>>>>>> 3.6:
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <
> > > > >>>>> eolivelli@gmail.com
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> IIRC The main difference is about the switch to Netty 4
> and
> > > > >>> about
> > > > >>>>>> using
> > > > >>>>>>>>>> more DirectMemory. Are you using the Netty based stack?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Apart from that macro difference there have been many many
> > > > >>> changes
> > > > >>>>>>> since
> > > > >>>>>>>>>> 3.4.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Do you have some metrics to share?
> > > > >>>>>>>>>> Are the  JVM configurations and zoo.cfg configuration
> equals
> > > > >> to
> > > > >>>>> each
> > > > >>>>>>>>>> other?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Do you see warnings on the server logs?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Did you upgrade both the client and the server or only the
> > > > >>> server?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Enrico
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Il Lun 15 Feb 2021, 18:30 Li Wang <li...@gmail.com> ha
> > > > >>> scritto:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> We want to upgrade from 3.4.14 to 3.6.2.  During the
> > > > >>>>> perform/load
> > > > >>>>>>>>>>> comparison test,  it was found that the performance of
> 3.6
> > > > >> has
> > > > >>>>> been
> > > > >>>>>>>>>>> significantly degraded compared to 3.4 for the write
> > > > >>> operation.
> > > > >>>>>> Under
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>> same load, there was a huge number of SessionExpired and
> > > > >>>>>>> ConnectionLoss
> > > > >>>>>>>>>>> errors in 3.6 while no such errors in 3.4.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> The load testing is 500 concurrent users with a cluster
> of
> > 5
> > > > >>>>>>>>>> participants
> > > > >>>>>>>>>>> and 5 observers. The min session timeout on the server
> side
> > > > >> is
> > > > >>>>>>> 4000ms.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I wonder if anyone has seen the same issue and has any
> > > > >>> insights
> > > > >>>>> on
> > > > >>>>>>> what
> > > > >>>>>>>>>>> could be the cause of the performance degradation.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Li
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: write performance issue in 3.6.2

Posted by Michael Han <ha...@apache.org>.
>> because the tests were run with Prometheus enabled,  which is new in 3.6
and has significant negative perf impact.

Interesting, let's see what the numbers are without Prometheus involved. It
could be that the increased latency we observed in CommitProcessor is just
a symptom rather than the cause if Prometheus is the culprit.

In my production environment we use an in-house, in-process metrics library,
which hasn't caused any performance trouble so far.
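
(If you want to rule Prometheus out quickly: in 3.6 it is only active when
the metrics provider is configured in zoo.cfg, roughly as below; commenting
these lines out falls back to the default no-op provider. The property names
are from my memory of the 3.6 admin guide, so double-check them:

    metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
    metricsProvider.httpPort=7000
)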

>> Does it measure how long a local write op takes in the commit processor
phase?

Yes.

>> I don't think the writes can be processed concurrently.

Correct. There can only be a single inflight write operation that's
committing at any given time. The concurrency is between a single write and
multiple reads where these reads belong to different sessions. Previously,
in 3.4, a single write will block processing of every other operation
including both writes and reads, and in 3.6, a single write will only block
reads that's from the same session; reads from other sessions can proceed
concurrently.
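
To make that rule concrete, here is a minimal Java sketch (my own
illustration, not the actual CommitProcessor code) of the per-session
blocking condition described above:

    // Sketch only: a read may be handed to a worker thread unless a write from
    // the SAME session is currently committing; writes are always serialized.
    final class DispatchRuleSketch {
        static boolean canDispatchRead(long readSessionId, Long committingWriteSessionId) {
            return committingWriteSessionId == null           // no write in flight
                || committingWriteSessionId != readSessionId; // write is from another session
        }

        public static void main(String[] args) {
            Long committing = 0x100L; // a write from session 0x100 is committing
            System.out.println(canDispatchRead(0x200L, committing)); // true: different session
            System.out.println(canDispatchRead(0x100L, committing)); // false: same session waits
        }
    }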

>> Is it correct to say that the new CommitProcessor works best for the
reads in the read/write workloads scenario,

Yes, as previously mentioned, a single write now only conditionally blocks
other reads, as opposed to unconditionally blocking all reads, which
increases overall throughput.


On Mon, May 3, 2021 at 5:06 PM Li Wang <li...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your additional inputs.
>
> On Mon, May 3, 2021 at 3:13 PM Michael Han <ha...@apache.org> wrote:
>
> > Hi Li,
> >
> > Thanks for following up.
> >
> > >> write_commitproc_time_ms were large
> >
> > This measures how long a local write op hears back from the leader. If
> it's
> > big, then either the leader is very busy acking the request, or your
> > network RTT is high.
> >
>
> Does it measure how long a local write op takes in the commit processor
> phase?
>
> ServerMetrics.getMetrics().WRITE_COMMITPROC_TIME.add(currentTime -
> request.commitProcQueueStartTime);
>
>
> > How does the local fsync time (fsynctime) look like between two tests?
> >
>
> The fsync time looks similar between two tests.
>
> >
> > >> We've found that increasing the maxCommitBatchSize
> >
> > Are you able to successfully tune this value so your benchmark against
> 3.6
> > is on par with 3.4 now? The original report mentioned lots of session
> > timeout and con loss. I am wondering if we can fix this first by tuning
> > this size.
> >
>
> We ran load tests with maxCommitBatchSize as 500 vs 1. 500 was used as we
> have 500 concurrent users in the load test.
> The connection loss error count was reduced about 40% and the session
> expired error was reduced about 45%.
> Tuning maxCommitBatchSize can significantly reduce the errors. We can not
> say the benchmark is on par with 3.4 (i.e. no errors)
> because the tests were run with Prometheus enabled,  which is new in 3.6
> and has significant negative perf impact.
> We will run tests with Prometheus disabled and maxCommitBatchSize as 500
> when we get a chance.
>
> >
> > The major difference of CommitProcessor between 3.4.14 and 3.6.2 is the
> > newly added per session commit queue such that reads and a write from
> > different sessions can be processed concurrently.
>
> Yes, I noticed that there is a queue per session in 3.6, but I don't think
> the writes can be processed concurrently.
> The CommitProcessor is single threaded and the CommitProc worker threads
> are only for reads. Did I miss anything?
>
> This works best for mixed
> > read / write workloads, but for pure write workloads, the new
> > CommitProcessor is not superior, as all writes still have to be
> serialized
> > due to global ordering, plus the per session queue has the overhead for
> > example in this test ZK has to manage 500 queues (and enqueue / dequeue
> and
> > so on cost cycles). Though, I didn't expect this overhead can create
> such a
> > big difference in your test..
> >
>
> Is it correct to say that the new CommitProcessor works best for the reads
> in the read/write workloads scenario,
> as only the reads can be processed concurrently?
>
>
> > Also this is obvious but just want to confirm if the benchmark for two
> > versions of ZK was done on exact same test environment including same OS
> > and networking configuration?
> >
>
> Yes, the benchmark for the two versions was done on the same test
> environment and configuration.
>
> >
> > On Mon, Apr 26, 2021 at 7:35 PM Li Wang <li...@gmail.com> wrote:
> >
> > > Hi Michael,
> > >
> > > Thanks for your reply.
> > >
> > > 1. The workload is 500 concurrent users creating nodes with data size
> of
> > 4
> > > bytes.
> > > 2. It's pure write
> > > 3. The perf issue is that under the same load, there were many session
> > > expired and connection loss errors when using ZK 3.6.2 but no such
> errors
> > > in ZK 3.4.14.
> > >
> > > The following are some updates on the issue.
> > >
> > > 1. We've checked the fine grained metrics and found that the
> > > CommitProcessor was the bottleneck. The commit_commit_proc_req_queued
> and
> > > the write_commitproc_time_ms were large.
> > > The errors were caused by too many commit requests queued up in the
> > > CommitProcessor and waiting to be processed.
> > > 2. We've found that increasing the maxCommitBatchSize can reduce both
> the
> > > session expired and connection loss errors.
> > > 3. We didn't observe any significant perf impact from the
> > RequestThrottler.
> > >
> > >
> > > Please let me know if you or anyone has any questions.
> > >
> > > Thanks,
> > >
> > > Li
> > >
> > >
> > >
> > > On Tue, Apr 20, 2021 at 8:03 PM Michael Han <ha...@apache.org> wrote:
> > >
> > > > What is the workload looking like? Is it pure write, or mixed read
> > write?
> > > >
> > > > A couple of ideas to move this forward:
> > > > * Publish the performance benchmark so the community can help.
> > > > * Bisect git commit and find the bad commit that caused the
> regression.
> > > > * Use the fine grained metrics introduced in 3.6 (e.g per processor
> > stage
> > > > metrics) to measure where time spends during writes. We might have to
> > add
> > > > these metrics on 3.4 to get a fair comparison.
> > > >
> > > > For the throttling - the RequestThrottler introduced in 3.6 does
> > > introduce
> > > > latency, but should not impact throughput this much.
> > > >
> > > > On Thu, Mar 11, 2021 at 11:46 AM Li Wang <li...@gmail.com> wrote:
> > > >
> > > > > The CPU usage of both server and client are normal (< 50%) during
> the
> > > > test.
> > > > >
> > > > > Based on the investigation, the server is too busy with the load.
> > > > >
> > > > > The issue doesn't exist in 3.4.14. I wonder why there is a
> > significant
> > > > > write performance degradation from 3.4.14 to 3.6.2 and how we can
> > > address
> > > > > the issue.
> > > > >
> > > > > Best,
> > > > >
> > > > > Li
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Mar 11, 2021 at 11:25 AM Andor Molnar <an...@apache.org>
> > > wrote:
> > > > >
> > > > > > What is the CPU usage of both server and client during the test?
> > > > > >
> > > > > > Looks like server is dropping the clients because either the
> server
> > > or
> > > > > > both are too busy to deal with the load.
> > > > > > This log line is also concerning: "Too busy to snap, skipping”
> > > > > >
> > > > > > If that’s the case I believe you'll have to profile the server
> > > process
> > > > to
> > > > > > figure out where the perf bottleneck is.
> > > > > >
> > > > > > Andor
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On 2021. Feb 22., at 5:31, Li Wang <li...@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks, Patrick.
> > > > > > >
> > > > > > > Yes, we are using the same JVM version and GC configurations
> when
> > > > > > > running the two tests. I have checked the GC metrics and also
> the
> > > > heap
> > > > > > dump
> > > > > > > of the 3.6, the GC pause and the memory usage look okay.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Li
> > > > > > >
> > > > > > > On Sun, Feb 21, 2021 at 3:34 PM Patrick Hunt <phunt@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > >> On Sun, Feb 21, 2021 at 3:28 PM Li Wang <li...@gmail.com>
> > > wrote:
> > > > > > >>
> > > > > > >>> Hi Enrico, Sushant,
> > > > > > >>>
> > > > > > >>> I re-run the perf test with the data consistency check
> feature
> > > > > disabled
> > > > > > >>> (i.e. -Dzookeeper.digest.enabled=false), the write
> performance
> > > > issue
> > > > > of
> > > > > > >> 3.6
> > > > > > >>> is still there.
> > > > > > >>>
> > > > > > >>> With everything exactly the same, the throughput of 3.6 was
> > only
> > > > 1/2
> > > > > of
> > > > > > >> 3.4
> > > > > > >>> and the max latency was more than 8 times.
> > > > > > >>>
> > > > > > >>> Any other points or thoughts?
> > > > > > >>>
> > > > > > >>>
> > > > > > >> In the past I've noticed a big impact of GC when doing certain
> > > > > > performance
> > > > > > >> measurements. I assume you are using the same JVM version and
> GC
> > > > when
> > > > > > >> running the two tests? Perhaps our memory footprint has
> expanded
> > > > over
> > > > > > time.
> > > > > > >> You should rule out GC by running with gc logging turned on
> with
> > > > both
> > > > > > >> versions and compare the impact.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >>
> > > > > > >> Patrick
> > > > > > >>
> > > > > > >>
> > > > > > >>> Cheers,
> > > > > > >>>
> > > > > > >>> Li
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On Sat, Feb 20, 2021 at 9:04 PM Li Wang <li...@gmail.com>
> > > wrote:
> > > > > > >>>
> > > > > > >>>> Thanks Sushant and Enrico!
> > > > > > >>>>
> > > > > > >>>> This is a really good point.  According to the 3.6
> > > documentation,
> > > > > the
> > > > > > >>>> feature is disabled by default.
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration
> > > > > > >>> .
> > > > > > >>>> However, checking the code, the default is enabled.
> > > > > > >>>>
> > > > > > >>>> Let me set the zookeeper.digest.enabled to false and see how
> > the
> > > > > write
> > > > > > >>>> operation performs.
> > > > > > >>>>
> > > > > > >>>> Best,
> > > > > > >>>>
> > > > > > >>>> Li
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On Fri, Feb 19, 2021 at 1:32 PM Sushant Mane <
> > > > > sushantmane7@gmail.com>
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > >>>>> Hi Li,
> > > > > > >>>>>
> > > > > > >>>>> On 3.6.2 consistency checker (adhash based) is enabled by
> > > > default:
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136
> > > > > > >>>>> .
> > > > > > >>>>> It is not present in ZK 3.4.14.
> > > > > > >>>>>
> > > > > > >>>>> This feature does have some impact on write performance.
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Sushant
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Fri, Feb 19, 2021 at 12:50 PM Enrico Olivelli <
> > > > > > eolivelli@gmail.com
> > > > > > >>>
> > > > > > >>>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Li,
> > > > > > >>>>>> I wonder of we have some new throttling/back pressure
> > > mechanisms
> > > > > > >> that
> > > > > > >>> is
> > > > > > >>>>>> enabled by default.
> > > > > > >>>>>>
> > > > > > >>>>>> Does anyone has some pointer to relevant implementations?
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> Enrico
> > > > > > >>>>>>
> > > > > > >>>>>> Il Ven 19 Feb 2021, 19:46 Li Wang <li...@gmail.com> ha
> > > > scritto:
> > > > > > >>>>>>
> > > > > > >>>>>>> Hi,
> > > > > > >>>>>>>
> > > > > > >>>>>>> We switched to Netty on both client side and server side
> > and
> > > > the
> > > > > > >>>>>>> performance issue is still there.  Anyone has any
> insights
> > on
> > > > > what
> > > > > > >>>>> could
> > > > > > >>>>>> be
> > > > > > >>>>>>> the cause of higher latency?
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thanks,
> > > > > > >>>>>>>
> > > > > > >>>>>>> Li
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Mon, Feb 15, 2021 at 2:17 PM Li Wang <
> li4wang@gmail.com
> > >
> > > > > > >> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Hi Enrico,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Thanks for the reply.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 1. We are using NIO based stack, not Netty based yet.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 2. Yes, here are some metrics on the client side.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3.6: throughput: 7K, failure: 81215228, Avg Latency:
> 57ms,
> > > > Max
> > > > > > >>>>> Latency
> > > > > > >>>>>>> 31s
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3.4: throughput: 15k, failure: 0,  Avg Latency: 30ms,
> Max
> > > > > > >>> Latency:
> > > > > > >>>>>> 1.6s
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3. Yes, the JVM and zoo.cfg config are the exact same
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 10G of Heap
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 13G of Memory
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 5 Participante
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 5 Observere
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Client session timeout: 3000ms
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Server min session time: 4000ms
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 4. Yes, there are two types of  WARN logs and many
> > “Expiring
> > > > > > >>>>> session”
> > > > > > >>>>>>>> INFO log
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 2021-02-15 22:04:36,506 [myid:4] - WARN
> > > > > > >>>>>>>> [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected
> > > exception
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> EndOfStreamException: Unable to read additional data
> from
> > > > > > >> client,
> > > > > > >>> it
> > > > > > >>>>>>>> probably closed the socket: address = /
> > 100.108.63.116:43366
> > > ,
> > > > > > >>>>> session =
> > > > > > >>>>>>>> 0x400189fee9a000b
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>
> > > > > > >>
> > > >
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> at java.base/java.lang.Thread.run(Thread.java:834)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 2021-02-15 22:05:14,428 [myid:4] - WARN
> > > > > > >>>>>>>> [SyncThread:4:SyncRequestProcessor@188] - Too busy to
> > snap,
> > > > > > >>>>> skipping
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 2021-02-15 22:01:51,427 [myid:4] - INFO
> > > > > > >>>>>>>> [SessionTracker:ZooKeeperServer@610] - Expiring session
> > > > > > >>>>>>>> 0x400189fee9a001e, timeout of 4000ms exceeded
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 5. Yes we upgrade both the client and the server to 3.6.
> > > > > > >> Actually,
> > > > > > >>>>> the
> > > > > > >>>>>>>> issue happened with the combinations of
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3.4 client and 3.6 server
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3.6 client and 3.6 server
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Please let me know if you need any additional info.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Thanks,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Li
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Mon, Feb 15, 2021 at 1:44 PM Li Wang <
> > li4wang@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Hi Enrico,
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Thanks for the reply.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> 1. We are using direct NIO based stack, not Netty based
> > > yet.
> > > > > > >>>>>>>>> 2. Yes, on the client side, here are the metrics
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> 3.6:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <
> > > > > > >>>>> eolivelli@gmail.com
> > > > > > >>>>>>>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> IIRC The main difference is about the switch to Netty
> 4
> > > and
> > > > > > >>> about
> > > > > > >>>>>> using
> > > > > > >>>>>>>>>> more DirectMemory. Are you using the Netty based
> stack?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Apart from that macro difference there have been many
> > many
> > > > > > >>> changes
> > > > > > >>>>>>> since
> > > > > > >>>>>>>>>> 3.4.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Do you have some metrics to share?
> > > > > > >>>>>>>>>> Are the  JVM configurations and zoo.cfg configuration
> > > equals
> > > > > > >> to
> > > > > > >>>>> each
> > > > > > >>>>>>>>>> other?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Do you see warnings on the server logs?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Did you upgrade both the client and the server or only
> > the
> > > > > > >>> server?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Enrico
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Il Lun 15 Feb 2021, 18:30 Li Wang <li...@gmail.com>
> > ha
> > > > > > >>> scritto:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Hi,
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> We want to upgrade from 3.4.14 to 3.6.2.  During the
> > > > > > >>>>> perform/load
> > > > > > >>>>>>>>>>> comparison test,  it was found that the performance
> of
> > > 3.6
> > > > > > >> has
> > > > > > >>>>> been
> > > > > > >>>>>>>>>>> significantly degraded compared to 3.4 for the write
> > > > > > >>> operation.
> > > > > > >>>>>> Under
> > > > > > >>>>>>>>>> the
> > > > > > >>>>>>>>>>> same load, there was a huge number of SessionExpired
> > and
> > > > > > >>>>>>> ConnectionLoss
> > > > > > >>>>>>>>>>> errors in 3.6 while no such errors in 3.4.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> The load testing is 500 concurrent users with a
> cluster
> > > of
> > > > 5
> > > > > > >>>>>>>>>> participants
> > > > > > >>>>>>>>>>> and 5 observers. The min session timeout on the
> server
> > > side
> > > > > > >> is
> > > > > > >>>>>>> 4000ms.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I wonder if anyone has seen the same issue and has
> any
> > > > > > >>> insights
> > > > > > >>>>> on
> > > > > > >>>>>>> what
> > > > > > >>>>>>>>>>> could be the cause of the performance degradation.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Thanks
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Li
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: write performance issue in 3.6.2

Posted by Li Wang <li...@gmail.com>.
Hi Michael,

Thanks for your additional inputs.

On Mon, May 3, 2021 at 3:13 PM Michael Han <ha...@apache.org> wrote:

> Hi Li,
>
> Thanks for following up.
>
> >> write_commitproc_time_ms were large
>
> This measures how long a local write op hears back from the leader. If it's
> big, then either the leader is very busy acking the request, or your
> network RTT is high.
>

Does it measure how long a local write op takes in the commit processor
phase?

ServerMetrics.getMetrics().WRITE_COMMITPROC_TIME.add(
    currentTime - request.commitProcQueueStartTime);
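// (i.e. it records currentTime - request.commitProcQueueStartTime, where
//  commitProcQueueStartTime is presumably set when the request enters the
//  CommitProcessor queue)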


> How does the local fsync time (fsynctime) look like between two tests?
>

The fsync time looks similar between the two tests.

>
> >> We've found that increasing the maxCommitBatchSize
>
> Are you able to successfully tune this value so your benchmark against 3.6
> is on par with 3.4 now? The original report mentioned lots of session
> timeout and con loss. I am wondering if we can fix this first by tuning
> this size.
>

We ran load tests with maxCommitBatchSize set to 500 vs. 1; 500 was used
because we have 500 concurrent users in the load test.
With 500, the connection loss error count was reduced by about 40% and the
session expired error count by about 45%.
Tuning maxCommitBatchSize significantly reduces the errors, but we cannot
yet say the benchmark is on par with 3.4 (i.e. no errors), because the tests
were run with Prometheus enabled, which is new in 3.6 and has a significant
negative perf impact.
We will run tests with Prometheus disabled and maxCommitBatchSize set to 500
when we get a chance.
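
A minimal sketch of how that knob can be set, assuming the 3.6 system
property name defined in CommitProcessor (please double-check against your
build); the equivalent -D flag can also go into SERVER_JVMFLAGS:

// assumption: the 3.6 CommitProcessor reads this system property; it must be
// set (or passed as -Dzookeeper.commitProcessor.maxCommitBatchSize=500)
// before the ZooKeeper server is started
System.setProperty("zookeeper.commitProcessor.maxCommitBatchSize", "500");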

>
> The major difference of CommitProcessor between 3.4.14 and 3.6.2 is the
> newly added per session commit queue such that reads and a write from
> different sessions can be processed concurrently.

Yes, I noticed that there is a queue per session in 3.6, but I don't think
writes can be processed concurrently: the CommitProcessor itself is
single-threaded, and the CommitProc worker threads are only used for reads.
Did I miss anything?

This works best for mixed
> read / write workloads, but for pure write workloads, the new
> CommitProcessor is not superior, as all writes still have to be serialized
> due to global ordering, plus the per session queue has the overhead for
> example in this test ZK has to manage 500 queues (and enqueue / dequeue and
> so on cost cycles). Though, I didn't expect this overhead can create such a
> big difference in your test..
>

Is it correct to say that the new CommitProcessor works best for the reads
in a mixed read/write workload, since only the reads can be processed
concurrently?
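
(To make that concrete for anyone following the thread, here is a toy
sketch, not ZooKeeper code and with made-up names, of the shape being
discussed: a single committer thread applies writes in global order while a
small worker pool serves reads concurrently.)

// Toy illustration only (not ZooKeeper code): one committer thread applies
// writes in global order while a worker pool serves reads concurrently,
// roughly the shape of the 3.6 CommitProcessor behaviour described above.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CommitShapeDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> committedWrites = new LinkedBlockingQueue<>();
        ExecutorService readWorkers = Executors.newFixedThreadPool(4);

        // Writes: drained by a single thread, so they are strictly serialized.
        Thread committer = new Thread(() -> {
            try {
                while (true) {
                    System.out.println("apply " + committedWrites.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        committer.setDaemon(true);
        committer.start();

        // Reads: handed to the pool, so reads from different sessions overlap.
        for (int i = 0; i < 8; i++) {
            int session = i;
            readWorkers.submit(() -> System.out.println("read for session " + session));
        }

        committedWrites.put("create /node-1");
        committedWrites.put("create /node-2");
        readWorkers.shutdown();
        readWorkers.awaitTermination(1, TimeUnit.SECONDS);
    }
}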


> Also this is obvious but just want to confirm if the benchmark for two
> versions of ZK was done on exact same test environment including same OS
> and networking configuration?
>

Yes, the benchmark for the two versions was done on the same test
environment and configuration.

>
> On Mon, Apr 26, 2021 at 7:35 PM Li Wang <li...@gmail.com> wrote:
>
> > Hi Michael,
> >
> > Thanks for your reply.
> >
> > 1. The workload is 500 concurrent users creating nodes with data size of
> 4
> > bytes.
> > 2. It's pure write
> > 3. The perf issue is that under the same load, there were many session
> > expired and connection loss errors when using ZK 3.6.2 but no such errors
> > in ZK 3.4.14.
> >
> > The following are some updates on the issue.
> >
> > 1. We've checked the fine grained metrics and found that the
> > CommitProcessor was the bottleneck. The commit_commit_proc_req_queued and
> > the write_commitproc_time_ms were large.
> > The errors were caused by too many commit requests queued up in the
> > CommitProcessor and waiting to be processed.
> > 2. We've found that increasing the maxCommitBatchSize can reduce both the
> > session expired and connection loss errors.
> > 3. We didn't observe any significant perf impact from the
> RequestThrottler.
> >
> >
> > Please let me know if you or anyone has any questions.
> >
> > Thanks,
> >
> > Li
> >
> >
> >
> > On Tue, Apr 20, 2021 at 8:03 PM Michael Han <ha...@apache.org> wrote:
> >
> > > What is the workload looking like? Is it pure write, or mixed read
> write?
> > >
> > > A couple of ideas to move this forward:
> > > * Publish the performance benchmark so the community can help.
> > > * Bisect git commit and find the bad commit that caused the regression.
> > > * Use the fine grained metrics introduced in 3.6 (e.g per processor
> stage
> > > metrics) to measure where time spends during writes. We might have to
> add
> > > these metrics on 3.4 to get a fair comparison.
> > >
> > > For the throttling - the RequestThrottler introduced in 3.6 does
> > introduce
> > > latency, but should not impact throughput this much.
> > >
> > > On Thu, Mar 11, 2021 at 11:46 AM Li Wang <li...@gmail.com> wrote:
> > >
> > > > The CPU usage of both server and client are normal (< 50%) during the
> > > test.
> > > >
> > > > Based on the investigation, the server is too busy with the load.
> > > >
> > > > The issue doesn't exist in 3.4.14. I wonder why there is a
> significant
> > > > write performance degradation from 3.4.14 to 3.6.2 and how we can
> > address
> > > > the issue.
> > > >
> > > > Best,
> > > >
> > > > Li
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Mar 11, 2021 at 11:25 AM Andor Molnar <an...@apache.org>
> > wrote:
> > > >
> > > > > What is the CPU usage of both server and client during the test?
> > > > >
> > > > > Looks like server is dropping the clients because either the server
> > or
> > > > > both are too busy to deal with the load.
> > > > > This log line is also concerning: "Too busy to snap, skipping”
> > > > >
> > > > > If that’s the case I believe you'll have to profile the server
> > process
> > > to
> > > > > figure out where the perf bottleneck is.
> > > > >
> > > > > Andor
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On 2021. Feb 22., at 5:31, Li Wang <li...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks, Patrick.
> > > > > >
> > > > > > Yes, we are using the same JVM version and GC configurations when
> > > > > > running the two tests. I have checked the GC metrics and also the
> > > heap
> > > > > dump
> > > > > > of the 3.6, the GC pause and the memory usage look okay.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Li
> > > > > >
> > > > > > On Sun, Feb 21, 2021 at 3:34 PM Patrick Hunt <ph...@apache.org>
> > > wrote:
> > > > > >
> > > > > >> On Sun, Feb 21, 2021 at 3:28 PM Li Wang <li...@gmail.com>
> > wrote:
> > > > > >>
> > > > > >>> Hi Enrico, Sushant,
> > > > > >>>
> > > > > >>> I re-run the perf test with the data consistency check feature
> > > > disabled
> > > > > >>> (i.e. -Dzookeeper.digest.enabled=false), the write performance
> > > issue
> > > > of
> > > > > >> 3.6
> > > > > >>> is still there.
> > > > > >>>
> > > > > >>> With everything exactly the same, the throughput of 3.6 was
> only
> > > 1/2
> > > > of
> > > > > >> 3.4
> > > > > >>> and the max latency was more than 8 times.
> > > > > >>>
> > > > > >>> Any other points or thoughts?
> > > > > >>>
> > > > > >>>
> > > > > >> In the past I've noticed a big impact of GC when doing certain
> > > > > performance
> > > > > >> measurements. I assume you are using the same JVM version and GC
> > > when
> > > > > >> running the two tests? Perhaps our memory footprint has expanded
> > > over
> > > > > time.
> > > > > >> You should rule out GC by running with gc logging turned on with
> > > both
> > > > > >> versions and compare the impact.
> > > > > >>
> > > > > >> Regards,
> > > > > >>
> > > > > >> Patrick
> > > > > >>
> > > > > >>
> > > > > >>> Cheers,
> > > > > >>>
> > > > > >>> Li
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Sat, Feb 20, 2021 at 9:04 PM Li Wang <li...@gmail.com>
> > wrote:
> > > > > >>>
> > > > > >>>> Thanks Sushant and Enrico!
> > > > > >>>>
> > > > > >>>> This is a really good point.  According to the 3.6
> > documentation,
> > > > the
> > > > > >>>> feature is disabled by default.
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration
> > > > > >>> .
> > > > > >>>> However, checking the code, the default is enabled.
> > > > > >>>>
> > > > > >>>> Let me set the zookeeper.digest.enabled to false and see how
> the
> > > > write
> > > > > >>>> operation performs.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>>
> > > > > >>>> Li
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Fri, Feb 19, 2021 at 1:32 PM Sushant Mane <
> > > > sushantmane7@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Hi Li,
> > > > > >>>>>
> > > > > >>>>> On 3.6.2 consistency checker (adhash based) is enabled by
> > > default:
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136
> > > > > >>>>> .
> > > > > >>>>> It is not present in ZK 3.4.14.
> > > > > >>>>>
> > > > > >>>>> This feature does have some impact on write performance.
> > > > > >>>>>
> > > > > >>>>> Thanks,
> > > > > >>>>> Sushant
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Fri, Feb 19, 2021 at 12:50 PM Enrico Olivelli <
> > > > > eolivelli@gmail.com
> > > > > >>>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> Li,
> > > > > >>>>>> I wonder of we have some new throttling/back pressure
> > mechanisms
> > > > > >> that
> > > > > >>> is
> > > > > >>>>>> enabled by default.
> > > > > >>>>>>
> > > > > >>>>>> Does anyone has some pointer to relevant implementations?
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> Enrico
> > > > > >>>>>>
> > > > > >>>>>> Il Ven 19 Feb 2021, 19:46 Li Wang <li...@gmail.com> ha
> > > scritto:
> > > > > >>>>>>
> > > > > >>>>>>> Hi,
> > > > > >>>>>>>
> > > > > >>>>>>> We switched to Netty on both client side and server side
> and
> > > the
> > > > > >>>>>>> performance issue is still there.  Anyone has any insights
> on
> > > > what
> > > > > >>>>> could
> > > > > >>>>>> be
> > > > > >>>>>>> the cause of higher latency?
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>>
> > > > > >>>>>>> Li
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Mon, Feb 15, 2021 at 2:17 PM Li Wang <li4wang@gmail.com
> >
> > > > > >> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi Enrico,
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks for the reply.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1. We are using NIO based stack, not Netty based yet.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2. Yes, here are some metrics on the client side.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3.6: throughput: 7K, failure: 81215228, Avg Latency: 57ms,
> > > Max
> > > > > >>>>> Latency
> > > > > >>>>>>> 31s
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3.4: throughput: 15k, failure: 0,  Avg Latency: 30ms,  Max
> > > > > >>> Latency:
> > > > > >>>>>> 1.6s
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3. Yes, the JVM and zoo.cfg config are the exact same
> > > > > >>>>>>>>
> > > > > >>>>>>>> 10G of Heap
> > > > > >>>>>>>>
> > > > > >>>>>>>> 13G of Memory
> > > > > >>>>>>>>
> > > > > >>>>>>>> 5 Participante
> > > > > >>>>>>>>
> > > > > >>>>>>>> 5 Observere
> > > > > >>>>>>>>
> > > > > >>>>>>>> Client session timeout: 3000ms
> > > > > >>>>>>>>
> > > > > >>>>>>>> Server min session time: 4000ms
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 4. Yes, there are two types of  WARN logs and many
> “Expiring
> > > > > >>>>> session”
> > > > > >>>>>>>> INFO log
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2021-02-15 22:04:36,506 [myid:4] - WARN
> > > > > >>>>>>>> [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected
> > exception
> > > > > >>>>>>>>
> > > > > >>>>>>>> EndOfStreamException: Unable to read additional data from
> > > > > >> client,
> > > > > >>> it
> > > > > >>>>>>>> probably closed the socket: address = /
> 100.108.63.116:43366
> > ,
> > > > > >>>>> session =
> > > > > >>>>>>>> 0x400189fee9a000b
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>
> > > > > >>
> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > > >>>>>>>>
> > > > > >>>>>>>> at java.base/java.lang.Thread.run(Thread.java:834)
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2021-02-15 22:05:14,428 [myid:4] - WARN
> > > > > >>>>>>>> [SyncThread:4:SyncRequestProcessor@188] - Too busy to
> snap,
> > > > > >>>>> skipping
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2021-02-15 22:01:51,427 [myid:4] - INFO
> > > > > >>>>>>>> [SessionTracker:ZooKeeperServer@610] - Expiring session
> > > > > >>>>>>>> 0x400189fee9a001e, timeout of 4000ms exceeded
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 5. Yes we upgrade both the client and the server to 3.6.
> > > > > >> Actually,
> > > > > >>>>> the
> > > > > >>>>>>>> issue happened with the combinations of
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3.4 client and 3.6 server
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3.6 client and 3.6 server
> > > > > >>>>>>>>
> > > > > >>>>>>>> Please let me know if you need any additional info.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>>
> > > > > >>>>>>>> Li
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Mon, Feb 15, 2021 at 1:44 PM Li Wang <
> li4wang@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Enrico,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks for the reply.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> 1. We are using direct NIO based stack, not Netty based
> > yet.
> > > > > >>>>>>>>> 2. Yes, on the client side, here are the metrics
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> 3.6:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <
> > > > > >>>>> eolivelli@gmail.com
> > > > > >>>>>>>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> IIRC The main difference is about the switch to Netty 4
> > and
> > > > > >>> about
> > > > > >>>>>> using
> > > > > >>>>>>>>>> more DirectMemory. Are you using the Netty based stack?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Apart from that macro difference there have been many
> many
> > > > > >>> changes
> > > > > >>>>>>> since
> > > > > >>>>>>>>>> 3.4.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Do you have some metrics to share?
> > > > > >>>>>>>>>> Are the  JVM configurations and zoo.cfg configuration
> > equals
> > > > > >> to
> > > > > >>>>> each
> > > > > >>>>>>>>>> other?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Do you see warnings on the server logs?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Did you upgrade both the client and the server or only
> the
> > > > > >>> server?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Enrico
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Il Lun 15 Feb 2021, 18:30 Li Wang <li...@gmail.com>
> ha
> > > > > >>> scritto:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> We want to upgrade from 3.4.14 to 3.6.2.  During the
> > > > > >>>>> perform/load
> > > > > >>>>>>>>>>> comparison test,  it was found that the performance of
> > 3.6
> > > > > >> has
> > > > > >>>>> been
> > > > > >>>>>>>>>>> significantly degraded compared to 3.4 for the write
> > > > > >>> operation.
> > > > > >>>>>> Under
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>> same load, there was a huge number of SessionExpired
> and
> > > > > >>>>>>> ConnectionLoss
> > > > > >>>>>>>>>>> errors in 3.6 while no such errors in 3.4.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> The load testing is 500 concurrent users with a cluster
> > of
> > > 5
> > > > > >>>>>>>>>> participants
> > > > > >>>>>>>>>>> and 5 observers. The min session timeout on the server
> > side
> > > > > >> is
> > > > > >>>>>>> 4000ms.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I wonder if anyone has seen the same issue and has any
> > > > > >>> insights
> > > > > >>>>> on
> > > > > >>>>>>> what
> > > > > >>>>>>>>>>> could be the cause of the performance degradation.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Li
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
