You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/08/01 09:11:02 UTC

[GitHub] [incubator-pegasus] foreverneverer opened a new issue, #1092: zk c client 3.7 does not run well with zk server 3.4

foreverneverer opened a new issue, #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092

   ## Bug Report
   
   https://github.com/XiaoMi/rdsn/pull/979 upgrade zk c client to 3.7(from 3.4.10) for kerberos, but it sometime can't access zk server 3.4.5
   
   In my test, when I upgrade the meta to latest pegasus server, the meta error log:
   ```
   E2022-07-27 11:41:16.67 (1658893276067532317 139122)   meta.default0.00001f3100010001: meta_service.cpp:463:start(): initialize server state from remote storage failed, err = ERR_INCONSISTENT_STATE, retry ...
   F2022-07-27 11:41:16.69 (1658893276069053850 139122)   meta.default0.00001f3100010001: server_state.cpp:709:sync_apps_from_remote_storage(): assertion expression: ERR_OK == err
   F2022-07-27 11:41:16.69 (1658893276069061403 139122)   meta.default0.00001f3100010001: server_state.cpp:709:sync_apps_from_remote_storage(): can't handle this error (ERR_TIMEOUT)
   ```
   
   `ERR_INCONSISTENT_STATE` is from zk error `ZRUNTIMEINCONSISTENCY`, see the zk error log:
   ```
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1250: Client environment:zookeeper.version=zookeeper C client 3.7.0
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1254: Client environment:host.name=host
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1261: Client environment:os.name=Linux
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1262: Client environment:os.arch=3.18.6-2.el7.centos.x86_64
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1263: Client environment:os.version=#1 SMP Mon Oct 24 13:01:33 CST 2016
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1271: Client environment:user.name=(null)
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1279: Client environment:user.home=/home/work
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@log_env@1291: Client environment:user.dir=/home/work/app/pegasus/tjwqtst-msg/meta
   2022-07-28 16:25:15,412:63056(0x7f2f8d699700):ZOO_INFO@zookeeper_init_internal@1344: Initiating client connection, host=host:21000 sessionTimeout=10000 watcher=0x7f2fa97f1fc0 sessionId=0 sessionPasswd=<null> context=0x2082790 flags=0
   2022-07-28 16:25:15,449:63056(0x7f2f7c652700):ZOO_INFO@check_events@2988: initiated connection to server IP:21000
   2022-07-28 16:25:15,467:63056(0x7f2f7c652700):ZOO_INFO@finalize_session_establishment@2868: session establishment complete on server IP:21000, sessionId=0xff82392b354205db, negotiated timeout=10000
   2022-07-28 16:25:47,814:63056(0x7f2f7c652700):ZOO_WARN@zookeeper_interest@2532: Exceeded deadline by 1021ms
   2022-07-28 16:26:13,404:63056(0x7f2f7c652700):ZOO_WARN@zookeeper_interest@2532: Exceeded deadline by 1021ms
   2022-07-28 16:26:29,014:63056(0x7f2f7c652700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known 
   ```
   
   The above bug does not always occur, I only find the bug in one cluster of our company,  The following resolution will be ok:
   - revert zk client version to 3.4.10
   - change zk config from host to ip
   
   ## Related jira:
   https://issues.apache.org/jira/browse/ZOOKEEPER-4603


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1209067351

   > > > This issue exists in all zk client versions, this patch just add a new API (i.e. `zoo_set_servers_resolution_delay `, see https://github.com/apache/zookeeper/pull/1068/files#diff-d7c3594993d19853c9d2b312b783be085bd9f02b3f5fe8a573e7c94e0ec22734R725) to set the delay time, I'm not sure if it can resolve the problem you faced, but you can have a try.
   > > 
   > > 
   > > OK, how add .patch file? I can try it
   > 
   > zk c client 3.7.0 already have this patch . I add config on https://github.com/apache/incubator-pegasus/pull/1100/files, try it
   
   Great!
   
   I have test it and it doesn't work well. the log:
   
   Pegasus:
   ```
   E2022-08-09 16:16:14.503 (1660032974503742256 148273)   meta.THREAD_POOL_META_SERVER0.020200020000034e: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   D2022-08-09 16:16:14.503 (1660032974503750762 148294)   meta.io-thrd.148294: zookeeper_session.cpp:323:global_watcher(): global watcher, type(session event), state(expired_session_state)
   E2022-08-09 16:16:14.503 (1660032974503777787 148294)   meta.io-thrd.148294: distributed_lock_service_zookeeper.cpp:282:on_zoo_session_evt(): get zoo state: expired_session_state, which means the session is expired
   E2022-08-09 16:16:14.503 (1660032974503805295 148274)   meta.THREAD_POOL_META_SERVER1.0202000000000315: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   E2022-08-09 16:16:14.503 (1660032974503820448 148274)   meta.THREAD_POOL_META_SERVER1.0202000200000350: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   E2022-08-09 16:16:14.503 (1660032974503828983 148274)   meta.THREAD_POOL_META_SERVER1.0202000300000337: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   E2022-08-09 16:16:14.503 (1660032974503836417 148274)   meta.THREAD_POOL_META_SERVER1.0202000300000338: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   E2022-08-09 16:16:14.503 (1660032974503849575 148289)   meta.     fd0.0201000000000005: meta_server_failure_detector.cpp:162:operator()(): leader lock expired callback: err(ERR_EXPIRED), owner(ip:port), version(174)
   E2022-08-09 16:16:14.503 (1660032974503877813 148274)   meta.THREAD_POOL_META_SERVER1.0202000300000339: server_state.cpp:644:operator()(): get partition node failed, reason(ERR_TIMEOUT)
   
   ```
   
   Zookeeper:
   ```
   2022-08-09 16:15:05,844:148195(0x7faadc489700):ZOO_INFO@log_env@1250: Client environment:zookeeper.version=zookeeper C client 3.7.0
   2022-08-09 16:15:05,844:148195(0x7faadc489700):ZOO_INFO@log_env@1254: Client environment:host.name=tj1-hadoop-pegasus-tst-meta03.kscn
   2022-08-09 16:15:05,844:148195(0x7faadc489700):ZOO_INFO@log_env@1261: Client environment:os.name=Linux
   2022-08-09 16:15:05,844:148195(0x7faadc489700):ZOO_INFO@log_env@1262: Client environment:os.arch=3.18.6-2.el7.centos.x86_64
   2022-08-09 16:15:05,844:148195(0x7faadc489700):ZOO_INFO@log_env@1263: Client environment:os.version=#1 SMP Mon Oct 24 13:01:33 CST 2016
   2022-08-09 16:15:05,845:148195(0x7faadc489700):ZOO_INFO@log_env@1271: Client environment:user.name=(null)
   2022-08-09 16:15:05,845:148195(0x7faadc489700):ZOO_INFO@log_env@1279: Client environment:user.home=/home/work
   2022-08-09 16:15:05,845:148195(0x7faadc489700):ZOO_INFO@log_env@1291: Client environment:user.dir=/home/work/app/pegasus/tjwqtst-msg/meta
   2022-08-09 16:15:05,845:148195(0x7faadc489700):ZOO_INFO@zookeeper_init_internal@1344: Initiating client connection, host=host:port sessionTimeout=10000 watcher=0x7faaf85e4460 sessionId=0 sessionPasswd=<null> context=0x3ef8790 flags=0
   2022-08-09 16:15:05,852:148195(0x7faacb442700):ZOO_INFO@check_events@2988: initiated connection to server ip:port
   2022-08-09 16:15:05,858:148195(0x7faacb442700):ZOO_INFO@finalize_session_establishment@2868: session establishment complete on server 10.38.161.55:21000, sessionId=0xff82392b30fa55e5, negotiated timeout=10000
   2022-08-09 16:15:44,869:148195(0x7faacb442700):ZOO_WARN@zookeeper_interest@2532: Exceeded deadline by 749ms
   2022-08-09 16:15:48,962:148195(0x7faacb442700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2022-08-09 16:15:48,962:148195(0x7faacb442700):ZOO_WARN@zookeeper_interest@2532: Exceeded deadline by 684ms
   2022-08-09 16:16:14,488:148195(0x7faacb442700):ZOO_ERROR@handle_socket_error_msg@3007: Socket 10.38.161.55:21000 zk retcode=-4, errno=112(Host is down): failed while receiving a server response
   2022-08-09 16:16:14,496:148195(0x7faacb442700):ZOO_INFO@check_events@2988: initiated connection to server ip:port
   2022-08-09 16:16:14,501:148195(0x7faacb442700):ZOO_ERROR@handle_socket_error_msg@3043: Socket 10.38.161.60:21000 zk retcode=-112, errno=116(Stale file handle): sessionId=0xff82392b30fa55e5 has expired.
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1250: Client environment:zookeeper.version=zookeeper C client 3.7.0
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1254: Client environment:host.name=tj1-hadoop-pegasus-tst-meta03.kscn
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1261: Client environment:os.name=Linux
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1262: Client environment:os.arch=3.18.6-2.el7.centos.x86_64
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1263: Client environment:os.version=#1 SMP Mon Oct 24 13:01:33 CST 2016
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1271: Client environment:user.name=(null)
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1279: Client environment:user.home=/home/work
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@log_env@1291: Client environment:user.dir=/home/work/app/pegasus/tjwqtst-msg/meta
   2022-08-09 16:16:15,594:149127(0x7ff5a74d4700):ZOO_INFO@zookeeper_init_internal@1344: Initiating client connection, host=tjwqtst.zk.hadoop.srv:21000 sessionTimeout=10000 watcher=0x7ff5c362f460 sessionId=0 sessionPasswd=<null> context=0x215a790 flags=0
   2022-08-09 16:16:15,603:149127(0x7ff59648d700):ZOO_INFO@check_events@2988: initiated connection to server ip:port
   2022-08-09 16:16:15,609:149127(0x7ff59648d700):ZOO_INFO@finalize_session_establishment@2868: session establishment complete on server ip:port, sessionId=0xff82392b3542553b, negotiated timeout=10000
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] GehaFearless commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
GehaFearless commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1204879050

   > > This issue exists in all zk client versions, this patch just add a new API (i.e. `zoo_set_servers_resolution_delay `, see https://github.com/apache/zookeeper/pull/1068/files#diff-d7c3594993d19853c9d2b312b783be085bd9f02b3f5fe8a573e7c94e0ec22734R725) to set the delay time, I'm not sure if it can resolve the problem you faced, but you can have a try.
   > 
   > OK, how add .patch file? I can try it
   
   zk c client 3.7.0 already have this patch . I add config on https://github.com/apache/incubator-pegasus/pull/1100/files, try it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201061693

   @foreverneverer have you the zookeeper server side log?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201960203

   > This issuse seems related to this problem, https://issues.apache.org/jira/browse/ZOOKEEPER-1998.
   > We can try to set a delay_ms by API `zoo_set_servers_resolution_delay`, or add a .patch file for zk thirdparty lib to revert this commit.
   
   The issue seem say `getaddrinfo` is unconditionally in old version, and fixed in 3.7, that's also mean, it has existed in old version but not added in new version?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201954814

   > meta server log and zk client log are not at the same time, I'm not sure if they are related. Could you paste the 3 parties logs and make sure they are at the same time?
   
   The log was copied from different time, but the log detail is same, I don't re-produce the log


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201954979

   > @foreverneverer have you checked the zookeeper server side log?
   
   added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] foreverneverer commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
foreverneverer commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1204725105

   > This issue exists in all zk client versions, this patch just add a new API (i.e. `zoo_set_servers_resolution_delay `, see https://github.com/apache/zookeeper/pull/1068/files#diff-d7c3594993d19853c9d2b312b783be085bd9f02b3f5fe8a573e7c94e0ec22734R725) to set the delay time, I'm not sure if it can resolve the problem you faced, but you can have a try.
   
   OK, how add .patch file? I can try it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] GehaFearless commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
GehaFearless commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1204876204

   > 
   https://github.com/apache/incubator-pegasus/pull/1100/files
   try it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201133283

   This issuse seems related to this problem, https://issues.apache.org/jira/browse/ZOOKEEPER-1998.
   We can try to set a delay_ms by API `zoo_set_servers_resolution_delay`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1201064547

   meta server log and zk client log are not at the same time, I'm not sure if they are related. Could you paste the 3 parties logs and make sure they are at the same time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on issue #1092: zk c client 3.7 does not run well with zk server 3.4

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1092:
URL: https://github.com/apache/incubator-pegasus/issues/1092#issuecomment-1202082593

   > > This issuse seems related to this problem, https://issues.apache.org/jira/browse/ZOOKEEPER-1998.
   > > We can try to set a delay_ms by API `zoo_set_servers_resolution_delay`, or add a .patch file for zk thirdparty lib to revert this commit.
   > 
   > The issue seem say `getaddrinfo` is unconditionally in old version, and fixed in 3.7, that's also mean, it has existed in old version but not added in new version?
   
   This issue exists in all zk client versions, this patch just add a new API (i.e. `zoo_set_servers_resolution_delay `, see https://github.com/apache/zookeeper/pull/1068/files#diff-d7c3594993d19853c9d2b312b783be085bd9f02b3f5fe8a573e7c94e0ec22734R725) to set the delay time, I'm not sure if it can resolve the problem you faced, but you can have a try.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org