Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/19 20:25:45 UTC
[GitHub] [pulsar-helm-chart] michaeljmarshall opened a new issue, #311: Flaky TLS Tests
michaeljmarshall opened a new issue, #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311
**Describe the bug**
`Pulsar Helm Chart (ZK & BK TLS Only) / lint-test (pull_request)` and `Precommit - Pulsar Helm Chart (ZK TLS Only) / lint-test (pull_request)` fail frequently, and the errors often differ from run to run.
**Examples**
https://github.com/apache/pulsar-helm-chart/actions/runs/3280376295/attempts/3
In that example, I can see this error in the GitHub workflow logs:
```
2022-10-19T16:35:47.4105412Z pulsar 12m Warning Unhealthy pod/pulsar-ci-zookeeper-0 Liveness probe failed: 139980898686272:error:0200206F:system library:connect:Connection refused:../crypto/bio/b_sock2.c:110:
2022-10-19T16:35:47.4106017Z 139980898686272:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
2022-10-19T16:35:47.4106458Z 139980898686272:error:0200206F:system library:connect:Connection refused:../crypto/bio/b_sock2.c:110:
2022-10-19T16:35:47.4106891Z 139980898686272:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
2022-10-19T16:35:47.4107202Z connect:errno=111
```
`Connection refused` (errno 111) indicates that nothing was listening on the port when the probe ran. However, the exported ZooKeeper logs show the server binding to port 2281:
```
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T16:23:16,817+0000 [main] INFO org.apache.zookeeper.server.NettyServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2281
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T16:23:16,823+0000 [main] INFO org.apache.zookeeper.server.NettyServerCnxnFactory - bound to port 2281
```
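One thing worth checking is timing. Assuming the `12m` in the event line is the "last seen" age printed by `kubectl get events`, and that it is truncated to whole minutes (both are assumptions; the exact command the workflow runs is not shown here), the probe failure was last seen about 12 minutes before the 16:35:47 log line. A rough sketch of that arithmetic, where `to_seconds` and `to_hms` are helper names made up for illustration:

```shell
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  # Split on ":"; strip one leading zero so "08" is not parsed as octal.
  old_ifs=$IFS; IFS=:
  set -- $1
  IFS=$old_ifs
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Format seconds since midnight back to HH:MM:SS.
to_hms() {
  printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

event_listed=$(to_seconds "16:35:47")   # timestamp of the workflow log line
bound=$(to_seconds "16:23:16")          # "binding to port" line in the ZK log

# Assumption: "12m" means an age somewhere in [12m, 13m).
latest=$(( event_listed - 12 * 60 ))
earliest=$(( event_listed - 13 * 60 ))

echo "probe failure last seen between $(to_hms "$earliest") and $(to_hms "$latest")"
echo "zookeeper started binding at $(to_hms "$bound")"
```

Under those assumptions the window is 16:22:47 to 16:23:47, which straddles the 16:23:16 bind time, so the probe failure and the "bound to port" log line do not necessarily contradict each other.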
Because other pods sometimes fail for unrelated reasons, the problem may lie in the CI environment rather than in any single component.
**Additional context**
I'll continue to add examples to this issue as I find them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar-helm-chart] michaeljmarshall commented on issue #311: Flaky TLS Tests
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1284568134
Here is another failure:
https://github.com/apache/pulsar-helm-chart/actions/runs/3280376295/jobs/5410958061
```
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.apply(InstanceIdCommand.java:61)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.bookie.BookieShell$WhatIsInstanceId.runCmd(BookieShell.java:1495)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: java.util.concurrent.ExecutionException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:355)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:375)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.apply(InstanceIdCommand.java:49)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] ... 4 more
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: Failed to get cluster instance id
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.discover.ZKRegistrationManager.getClusterInstanceId(ZKRegistrationManager.java:429)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand.lambda$apply$0(InstanceIdCommand.java:52)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.meta.MetadataDrivers.lambda$runFunctionWithRegistrationManager$1(MetadataDrivers.java:375)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:350)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] ... 6 more
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for BookKeeper metadata
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] at org.apache.bookkeeper.discover.ZKRegistrationManager.getClusterInstanceId(ZKRegistrationManager.java:419)
[pod/pulsar-ci-bookie-1/pulsar-bookkeeper-verify-clusterid] ... 9 more
```
This failure is surprising given that the zk pod is described as `pod/pulsar-ci-zookeeper-0 1/1 Running 0 26m`. The ZK logs indicate a number of expired sessions. Here is a copy of some of the logs:
```
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:08,204+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701bb, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:09,976+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:14,202+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701bc, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:18,205+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701be, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:19,273+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:20,851+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:22,467+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:23,782+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:26,203+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c1, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:26,204+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c2, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:28,953+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:28,954+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:44528
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:32,326+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:32,342+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:41404
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:34,427+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:37,350+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:37,502+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,202+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c7, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,202+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c6, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:38,203+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:46,204+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701c9, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:49,930+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:50,871+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:53,015+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:53,699+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:54,202+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701cd, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:54,202+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x1000005841701cc, timeout of 30000ms exceeded
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:59,022+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:56:59,027+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:57552
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:01,805+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:02,216+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:02,237+0000 [epollEventLoopGroup-7-2] INFO org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:52924
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,027+0000 [epollEventLoopGroup-7-1] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,216+0000 [epollEventLoopGroup-7-2] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0
[pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-10-19T09:57:04,568+0000 [epollEventLoopGroup-7-1] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0
```
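To put a number on the expired sessions, the exported log can be tallied with `grep -c`. A small sketch, assuming the pod logs were saved to a file; the function name `summarize_zk_log` and the path `zk.log` are placeholders, not names from the workflow:

```shell
# Print how many sessions expired and how many handshakes failed in a ZK log file.
summarize_zk_log() {
  log_file=$1
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function usable under "set -e".
  expired=$(grep -c 'Expiring session' "$log_file" || true)
  handshakes=$(grep -c 'Unsuccessful handshake' "$log_file" || true)
  echo "expired_sessions=$expired failed_handshakes=$handshakes"
}
```

Usage would be `summarize_zk_log zk.log`. Comparing the counts between a passing and a failing run might show whether the session expirations are specific to the failures.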
Re: [I] Flaky TLS Tests [pulsar-helm-chart]
Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari closed issue #311: Flaky TLS Tests
URL: https://github.com/apache/pulsar-helm-chart/issues/311
[GitHub] [pulsar-helm-chart] michaeljmarshall commented on issue #311: Flaky TLS Tests
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1284619524
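Just discovered that one of the errors in the bookie-init job is expected. The script runs `whatisinstanceid` first and only calls `initnewcluster` if that check fails, so the check has to fail on a cluster that has not been initialized yet: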
```shell
bin/apply-config-from-env.py conf/bookkeeper.conf;
/pulsar/keytool/keytool.sh toolset ${HOSTNAME}.pulsar-ci-toolset.pulsar.svc.cluster.local true;
if bin/bookkeeper shell whatisinstanceid; then
  echo "bookkeeper cluster already initialized";
else
  bin/bookkeeper shell initnewcluster;
fi
```
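The check-then-initialize pattern can be reproduced without BookKeeper. In this sketch `check_instance_id` and `init_cluster` are made-up stand-ins for the two `bin/bookkeeper shell` commands, and a temporary file plays the role of the cluster metadata, so it only illustrates the control flow, not the real tooling:

```shell
# Hypothetical marker file standing in for the cluster metadata.
state_file="${TMPDIR:-/tmp}/bk-instance-id.$$"

# Stand-in for "bin/bookkeeper shell whatisinstanceid": fails until the marker exists.
check_instance_id() {
  if [ -f "$state_file" ]; then
    cat "$state_file"
  else
    echo "ERROR: no instance id found" >&2
    return 1
  fi
}

# Stand-in for "bin/bookkeeper shell initnewcluster".
init_cluster() {
  echo "instance-1" > "$state_file"
}

init_if_needed() {
  if check_instance_id >/dev/null; then
    echo "bookkeeper cluster already initialized"
  else
    init_cluster
    echo "initialized new cluster"
  fi
}

init_if_needed   # first run: the check fails and logs an error, then initializes
init_if_needed   # second run: the check succeeds
rm -f "$state_file"
```

The error printed on the first run is the expected one: it is the signal that triggers initialization, not a sign that the job is broken.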
Re: [I] Flaky TLS Tests [pulsar-helm-chart]
Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on issue #311:
URL: https://github.com/apache/pulsar-helm-chart/issues/311#issuecomment-1912397064
Resolved.