Posted to dev@zookeeper.apache.org by "Kfir Lev-Ari (JIRA)" <ji...@apache.org> on 2016/04/13 10:48:25 UTC

[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238870#comment-15238870 ] 

Kfir Lev-Ari commented on ZOOKEEPER-2024:
-----------------------------------------

[~fpj], thanks for pushing for the latency numbers; they turned out to be quite good, and I already had them in the logs. I've updated the throughput graphs in the evaluation document, and they now contain latency data for each scenario. It's not the most elegant way to present it, but here is the summary.

As expected, the latency in read-only workloads is equivalent in both algorithms, and the same holds for write-only workloads.
The new algorithm significantly reduces read and write latency in mixed workloads in which the write percentage is below 30% (for reads, up to 96% improvement; for writes, up to 89%).
In mixed workloads in which the R/W clients are write-intensive (70%-100% writes), the latency of the RO clients is higher with the new algorithm (a "reduction" of down to -184%). This is a consequence of the starvation problem in the original algorithm: its readers have lower latency, but they read older values and postpone the update of the local DB by remote writes. See the median latency in the 3-server scenario, 50% R/W clients, 50% RO clients, 0% reads column.
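
To make the sign convention concrete, here is a minimal sketch of how such percentages can be read. It assumes that "reduction" means (old - new) / old, which is my reading of the numbers and is not stated in the evaluation document; the class name, method name, and sample latencies are hypothetical.

```java
// Hypothetical helper, not part of the patch or the evaluation scripts.
final class LatencyChange {
    /**
     * Percentage by which newLatency is lower than oldLatency.
     * Assumed definition: (old - new) / old * 100.
     * A negative result means the latency increased.
     */
    static double reductionPercent(double oldLatency, double newLatency) {
        return (oldLatency - newLatency) / oldLatency * 100.0;
    }
}
```

Under this assumption, going from 100 to 4 time units is a 96% reduction, while a -184% "reduction" corresponds to going from 100 to 284 time units, i.e., the new latency is 2.84 times the old one. These sample values are illustrative only and are not taken from the measurements.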


> Major throughput improvement with mixed workloads
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2024
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>            Reporter: Kfir Lev-Ari
>            Assignee: Kfir Lev-Ari
>             Fix For: 3.5.3
>
>         Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it stalls local processing of all sessions until it receives a commit of that request from the leader. 
> In mixed workloads, this severely hampers performance, as it does not allow read-only sessions to proceed at a faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote committed write requests are starved. 
> This occurs due to a bug fix (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing of local read requests before handling any committed write. The problem is only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed workloads (in our tests, by up to 8x), and reduces latency, especially at higher percentiles (i.e., the slowest requests).
> The main idea is to separate sessions that inherently need to stall in order to enforce order semantics, from ones that do not need to stall. To this end, we add data structures for buffering and managing pending requests of stalled sessions; these requests are moved out of the critical path to these data structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit processor algorithm.
> 2) The attached patch implements our solution, and a collection of related unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609
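
The main idea in the description above (buffer the requests of a stalled session off the critical path, and keep processing unaffected sessions) can be illustrated with a small sketch. This is not the code from the attached patch: the class, its methods, and the string-based request names are all hypothetical, and the sketch ignores remote commits, threading, and everything else the real commit processor has to handle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names do not match the ZOOKEEPER-2024 patch.
final class PerSessionStallSketch {
    private static final class Request {
        final String name;
        final boolean isWrite;
        Request(String name, boolean isWrite) {
            this.name = name;
            this.isWrite = isWrite;
        }
    }

    // Buffered requests of stalled sessions, in arrival order.
    // The head of each queue is the write that is waiting for its commit.
    private final Map<Long, Deque<Request>> pending = new HashMap<>();
    // Names of requests in the order they were processed locally.
    private final List<String> processed = new ArrayList<>();

    /** Accepts a local request; only the owning session may stall. */
    void submit(long sessionId, String name, boolean isWrite) {
        Request r = new Request(name, isWrite);
        Deque<Request> q = pending.get(sessionId);
        if (q != null) {
            q.addLast(r);              // session is already stalled: buffer in order
        } else if (isWrite) {
            q = new ArrayDeque<>();
            q.addLast(r);              // a write stalls its own session until committed
            pending.put(sessionId, q);
        } else {
            processed.add(name);       // read from a session that is not stalled
        }
    }

    /** Models the commit of the session's oldest pending write arriving. */
    void commit(long sessionId) {
        Deque<Request> q = pending.get(sessionId);
        if (q == null) {
            return;                    // nothing pending for this session in this sketch
        }
        processed.add(q.pollFirst().name);           // apply the committed write
        while (!q.isEmpty() && !q.peekFirst().isWrite) {
            processed.add(q.pollFirst().name);       // release the reads queued behind it
        }
        if (q.isEmpty()) {
            pending.remove(sessionId);               // session is no longer stalled
        }
    }

    List<String> processed() {
        return processed;
    }
}
```

For example, if session 1 submits a write followed by a read, and session 2 then submits a read, the sketch processes session 2's read immediately, while session 1's two requests are released in order only once commit(1) is called. If the next buffered request of a session is another write, the session stays stalled until that write is committed as well.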


