You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@zookeeper.apache.org by "Mohammad Arshad (Jira)" <ji...@apache.org> on 2021/06/09 12:33:00 UTC
[jira] [Assigned] (ZOOKEEPER-4275) Slowness in sasl login or
subject.doAs() causes zk client to falsely assume that the server did not
respond, closes connection and goes to unnecessary retries
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohammad Arshad reassigned ZOOKEEPER-4275:
------------------------------------------
Assignee: Ravi Kishore Valeti
> Slowness in sasl login or subject.doAs() causes zk client to falsely assume that the server did not respond, closes connection and goes to unnecessary retries
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4275
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4275
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.5.9
> Reporter: Ravi Kishore Valeti
> Assignee: Ravi Kishore Valeti
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.10, 3.8.0, 3.7.1, 3.6.4
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Zookeeper client does sasl auth (login and subject.doAs())as a preset before attempting a connection to server.
> If there is a delay in sasl auth (possibly due to slow Kerberos communication), ZK client falsely assumes that the zk server did not respond and runs in to unnecessary multiple retries.
> Client configuration:
> "zookeeper.session.timeout" = "3000"
> "zookeeper.recovery.retry" = "1"
> "zookeeper.recovery.retry.intervalmill" = "500"
> This configuration translates to
> connect timeout as 1000ms
> Read Timeout as 2000ms
> Example: There was a 3 second delay in logging in the user as seen from the logs below. The connection attempt was made later. However, zk client did not wait for server response but logged a timeout (3 seconds > 1 sec connect timeout), closed the connection and went to retries. Since there was a consistent delay at Kerberos master, we had seen this retries go as long as 10 mins causing requests to timeout/fail.
> Logs:
> 3/23/21 4:15:*32.389* AM jute.maxbuffer value is xxxxx Bytes
> 3/23/21 4:15:*35.395* AM Client successfully logged in.
> 3/23/21 4:15:35.396 AM TGT refresh sleeping until: Wed Mar 24 00:34:31 GMT 2021
> 3/23/21 4:15:35.396 AM TGT refresh thread started.
> 3/23/21 4:15:35.396 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:35.396 AM TGT expires: xxx Mar xx 04:15:35 GMT 2021
> 3/23/21 4:15:35.396 AM TGT valid starting at: xxx Mar xx 04:15:35 GMT 2021
> 3/23/21 4:15:*35.397* AM *Opening socket connection* to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:*35.397* AM *Client session timed out, have not heard from server in* *3008ms* for sessionid 0x0
> 3/23/21 4:15:35.397 AM Client session timed out, have not heard from server in 3008ms for sessionid 0x0, closing socket connection and attempting reconnect
> 3/23/21 4:15:35.498 AM TGT renewal thread has been interrupted and will exit.
> 3/23/21 4:15:38.503 AM Client successfully logged in.
> 3/23/21 4:15:38.503 AM TGT expires: xxx Mar xx 04:15:38 GMT 2021
> 3/23/21 4:15:38.503 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:38.503 AM TGT valid starting at: xxx Mar xx 04:15:38 GMT 2021
> 3/23/21 4:15:38.503 AM TGT refresh thread started.
> 3/23/21 4:15:38.503 AM TGT refresh sleeping until: Wed Mar 24 00:10:10 GMT 2021
> 3/23/21 4:15:38.506 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0, closing socket connection and attempting reconnect
> 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server in 3009ms for sessionid 0x0
> 3/23/21 4:15:38.606 AM TGT renewal thread has been interrupted and will exit.
> 3/23/21 4:15:41.610 AM Client successfully logged in.
> 3/23/21 4:15:41.611 AM TGT refresh sleeping until: xxx Mar xx 23:42:03 GMT 2021
> 3/23/21 4:15:41.611 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:41.611 AM TGT valid starting at: xxx Mar xx 04:15:41 GMT 2021
> 3/23/21 4:15:41.611 AM TGT expires: xxx Mar xx 04:15:41 GMT 2021
> 3/23/21 4:15:41.611 AM TGT refresh thread started.
> 3/23/21 4:15:41.612 AM Opening socket connection to server xxxxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0
> 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server in 3006ms for sessionid 0x0, closing socket connection and attempting reconnect
--
This message was sent by Atlassian Jira
(v8.3.4#803005)