You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@zookeeper.apache.org by "Mate Szalay-Beko (Jira)" <ji...@apache.org> on 2022/04/05 06:29:00 UTC

[jira] [Commented] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517245#comment-17517245 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3652:
---------------------------------------------

Great catch, thanks for the fix!

Is there a reason why not to backport this to branch-3.5? (it is an important bugfix and c.5 is not EoL yet)

> Improper synchronization in ClientCnxn
> --------------------------------------
>
>                 Key: ZOOKEEPER-3652
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.5.6
>            Reporter: Sylvain Wallez
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always synchronize on the same object.
>  - the {{state}} field is an enum value, which are global objects. So in an application with several ZooKeeper clients connected to different servers, this causes some contention between clients.
> An easy fix is change those {{synchronized(state)}} statements to {{synchronized(outgoingQueue)}} since it is local to each client and is what we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be the correct way to fix it.
>  
> Another issue that makes contention worse is {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)