Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/19 20:25:45 UTC

[GitHub] [pulsar-helm-chart] michaeljmarshall opened a new issue, #311: Flaky TLS Tests

michaeljmarshall opened a new issue, #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311

   **Describe the bug**
   `Pulsar Helm Chart (ZK & BK TLS Only) / lint-test (pull_request)` and `Precommit - Pulsar Helm Chart (ZK TLS Only) / lint-test (pull_request)` fail frequently, and the error often differs from run to run.
   
   **Examples**
   https://github.com/apache/pulsar-helm-chart/actions/runs/3280376295/attempts/3
   
   In that example, I can see this error in the GitHub workflow logs:
   ```
   2022-10-19T16:35:47.4105412Z pulsar               12m         Warning   Unhealthy                 pod/pulsar-ci-zookeeper-0                                              Liveness probe failed: 139980898686272:error:0200206F:system library:connect:Connection refused:../crypto/bio/b_sock2.c:110:
   2022-10-19T16:35:47.4106017Z 139980898686272:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
   2022-10-19T16:35:47.4106458Z 139980898686272:error:0200206F:system library:connect:Connection refused:../crypto/bio/b_sock2.c:110:
   2022-10-19T16:35:47.4106891Z 139980898686272:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
   2022-10-19T16:35:47.4107202Z connect:errno=111
   ```
   
   `errno=111` is `ECONNREFUSED` on Linux, which indicates that nothing was listening on the port the probe connected to. However, the exported logs show ZooKeeper binding to port 2281:
   
   ```
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T16:23:16,817+0000 [main] INFO  org.apache.zookeeper.server.NettyServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2281
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T16:23:16,823+0000 [main] INFO  org.apache.zookeeper.server.NettyServerCnxnFactory - bound to port 2281
   ```
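
   To check whether the probe failure could predate the bind, the event's age can be converted back into a clock time. The helpers below (`to_seconds`, `age_to_seconds`, `approx_last_seen`) are a hypothetical sketch, not part of the chart or of kubectl. They assume that the `2022-10-19T16:35:47` prefix is when the GitHub Actions step printed the event listing, and that `12m` is kubectl's LAST SEEN age truncated to whole minutes.

   ```shell
   # Hypothetical helpers (not part of the chart or kubectl): estimate when a
   # kubectl event was last seen from the time the listing was printed and its
   # LAST SEEN age column.

   # Convert HH:MM:SS (optionally with ".fraction" or ",millis") to seconds since midnight.
   to_seconds() {
       h=${1%%:*}
       rest=${1#*:}
       m=${rest%%:*}
       s=${rest#*:}
       s=${s%%[.,]*}                  # drop fractional seconds
       # Drop one leading zero so values like "08" are not parsed as octal.
       h=${h#0}; m=${m#0}; s=${s#0}
       echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
   }

   # Convert a kubectl-style age such as "45s", "12m", "2m30s" or "3h" to seconds.
   age_to_seconds() {
       rem=$1
       total=0
       while [ -n "$rem" ]; do
           num=${rem%%[!0-9]*}        # leading digits
           [ -n "$num" ] || return 1
           rem=${rem#"$num"}
           unit=${rem%"${rem#?}"}     # next character is the unit
           rem=${rem#?}
           case $unit in
               s) total=$((total + num)) ;;
               m) total=$((total + num * 60)) ;;
               h) total=$((total + num * 3600)) ;;
               d) total=$((total + num * 86400)) ;;
               *) return 1 ;;
           esac
       done
       echo "$total"
   }

   # approx_last_seen LISTING_TIME AGE: print the estimated last-seen time as HH:MM:SS.
   approx_last_seen() {
       now=$(to_seconds "$1")
       age_s=$(age_to_seconds "$2") || return 1
       t=$(( (now - age_s + 86400) % 86400 ))   # wrap around midnight
       printf '%02d:%02d:%02d\n' $((t / 3600)) $((t % 3600 / 60)) $((t % 60))
   }
   ```

   Under those assumptions, `approx_last_seen 16:35:47 12m` prints `16:23:47` and `approx_last_seen 16:35:47 12m59s` prints `16:22:48`, so the last recorded probe failure falls within about a minute of the `bound to port 2281` line at 16:23:16 and may have happened before it.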
   
   Something might be wrong with the CI environment itself, because other pods sometimes fail for other reasons.
   
   **Additional context**
   I'll continue to add examples to this issue as I find them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-helm-chart] michaeljmarshall commented on issue #311: Flaky TLS Tests

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1284568134

   Here is another failure:
   https://github.com/apache/pulsar-helm-chart/actions/runs/3280376295/jobs/5410958061
   
   ```
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.apply(InstanceIdCommand.java:61)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.bookie.BookieShell$WhatIsInstanceId.runCmd(BookieShell.java:1495)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: java.util.concurrent.ExecutionException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:355)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:375)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.apply(InstanceIdCommand.java:49)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	... 4 more
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.discover.ZKRegistrationManager.getClusterInstanceId(ZKRegistrationManager.java:429)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.lambda$apply$0(InstanceIdCommand.java:52)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.meta.MetadataDrivers.lambda$runFunctionWithRegistrationManager$1(MetadataDrivers.java:375)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:350)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	... 6 more
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for BookKeeper metadata
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	at org.apache.bookkeeper.discover.ZKRegistrationManager.getClusterInstanceId(ZKRegistrationManager.java:419)
   [pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] 	... 9 more
   ```
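
   The innermost `Caused by:` line is usually the most informative part of a trace like this one; here it is a ZooKeeper `NoNodeException` rather than a connection error. A small hypothetical helper (`root_cause`, not an existing tool) that pulls that line out of exported logs:

   ```shell
   # Hypothetical helper: print the innermost "Caused by:" of the last Java stack
   # trace in the given file (or stdin), dropping any "[pod/...]" log prefix.
   root_cause() {
       grep 'Caused by:' "${1:--}" | tail -n 1 | sed 's/.*Caused by: //'
   }
   ```

   For example, `kubectl logs pulsar-ci-bookie-1 -c pulsar-bookkeeper-verify-clusterid | root_cause`, with the pod and container names taken from the log prefix above, would print the `KeeperException$NoNodeException` line.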
   
   This failure is surprising given that the ZK pod is listed as `pod/pulsar-ci-zookeeper-0         1/1     Running    0          26m`. The ZK logs show a number of expired sessions. Here is an excerpt:
   
   ```
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:08,204+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701bb, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:09,976+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:14,202+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701bc, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:18,205+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701be, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:19,273+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:20,851+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:22,467+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:23,782+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:26,203+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c1, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:26,204+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c2, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:28,953+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:28,954+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:44528
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:32,326+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:32,342+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:41404
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:34,427+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:37,350+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:37,502+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,202+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c7, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,202+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c6, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,203+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:46,204+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c9, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:49,930+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:50,871+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:53,015+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:53,699+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:54,202+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701cd, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:54,202+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701cc, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:59,022+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:59,027+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:57552
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:01,805+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:02,216+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:02,237+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:52924
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,027+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,216+0000 [epollEventLoopGroup-7-2] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,568+0000 [epollEventLoopGroup-7-1] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0
   ```
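
   To see how the expired sessions and failed handshakes line up with the clients that connect, the exported ZooKeeper log can be tallied. A hypothetical sketch (`summarize_zk_log` is not an existing tool) that relies only on the message formats shown above:

   ```shell
   # Hypothetical helper: summarize an exported ZooKeeper log file by counting
   # expired sessions, failed handshakes, and x509 authentications per client CN.
   summarize_zk_log() {
       log=$1
       printf 'expired_sessions %s\n' "$(grep -c 'Expiring session' "$log")"
       printf 'failed_handshakes %s\n' "$(grep -c 'Unsuccessful handshake' "$log")"
       # One "auth <CN> <count>" line per client certificate, sorted by CN.
       grep -o "Authenticated Id 'CN=[^,]*" "$log" |
           sed 's/.*CN=//' |
           sort | uniq -c |
           while read -r count cn; do
               printf 'auth %s %s\n' "$cn" "$count"
           done
   }
   ```

   Run it as `summarize_zk_log zookeeper.log`, where the file name is a placeholder for the exported pod log. In the excerpt above, the `pulsar-ci-zookeeper` authentications appear to come from the local `ruok` probe, since each one is followed by a `Processing ruok command from /127.0.0.1` line.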


Re: [I] Flaky TLS Tests [pulsar-helm-chart]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari closed issue #311: Flaky TLS Tests
URL: https://github.com/apache/pulsar-helm-chart/issues/311


[GitHub] [pulsar-helm-chart] michaeljmarshall commented on issue #311: Flaky TLS Tests

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1284619524

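   I just discovered that one of the errors in the bookie-init job is expected. The job runs `whatisinstanceid` first and only runs `initnewcluster` when that command fails, so that failure is normal on a new cluster: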
   
   ```shell
   bin/apply-config-from-env.py conf/bookkeeper.conf
   /pulsar/keytool/keytool.sh toolset ${HOSTNAME}.pulsar-ci-toolset.pulsar.svc.cluster.local true
   # whatisinstanceid fails until the cluster metadata exists,
   # so its error output is expected on the first run.
   if bin/bookkeeper shell whatisinstanceid; then
       echo "bookkeeper cluster already initialized"
   else
       bin/bookkeeper shell initnewcluster
   fi
   ```
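
   This is a check-then-initialize step: on a fresh cluster `whatisinstanceid` fails, and that failure is what triggers `initnewcluster`. The sketch below replaces both `bin/bookkeeper shell` commands with hypothetical stubs backed by a marker file (and wraps the logic in a hypothetical `init_once` function), so the control flow can be exercised without ZooKeeper:

   ```shell
   # Hypothetical stand-ins for `bin/bookkeeper shell whatisinstanceid` and
   # `bin/bookkeeper shell initnewcluster`; a marker file plays the role of the
   # cluster metadata stored in ZooKeeper.
   STATE_FILE=$(mktemp -d)/instanceid

   whatisinstanceid() {
       if [ -f "$STATE_FILE" ]; then
           echo "instance id: $(cat "$STATE_FILE")"
       else
           # Stands in for the NoNode error on an uninitialized cluster.
           echo "NoNode for BookKeeper metadata" >&2
           return 1
       fi
   }

   initnewcluster() {
       echo "hypothetical-instance-id" > "$STATE_FILE"
   }

   # Same control flow as the bookie-init script: initialize only when the
   # instance id lookup fails, so re-running the step is harmless.
   init_once() {
       if whatisinstanceid; then
           echo "bookkeeper cluster already initialized"
       else
           initnewcluster
       fi
   }
   ```

   Calling `init_once` twice prints only the NoNode message (on stderr) the first time; the second call prints the stored id followed by `bookkeeper cluster already initialized`.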


Re: [I] Flaky TLS Tests [pulsar-helm-chart]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1912397064

   Resolved.

