Posted to dev@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/14 04:44:21 UTC

[GitHub] [pulsar-helm-chart] michaeljmarshall opened a new pull request, #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall opened a new pull request, #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266

   Master Issue: https://github.com/apache/pulsar/issues/11269
   
   ### Motivation
   
   Apache Pulsar's Docker images for 2.10.0 and above run as a non-root user by default. To ensure a safe upgrade path, we need to expose the `securityContext` for the BookKeeper and ZooKeeper StatefulSets. The relevant Kubernetes documentation for this feature is here: https://kubernetes.io/docs/tasks/configure-pod-container/security-context.
   
   Once released, all deployments using the default `values.yaml` configuration for the `securityContext` will pay a one-time penalty on upgrade, when the kubelet recursively chowns the volume's files so that they are writable by the root group. It's possible to temporarily avoid this penalty by setting `securityContext: {}`.
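   
   The recursive ownership change described above can be approximated on a local directory. This is a minimal sketch, assuming the kubelet's commonly documented `fsGroup` behavior (group ownership set to the `fsGroup`, owner/group read and write added to files, owner/group rwx plus the setgid bit added to directories); it is not the kubelet's code, and `apply_fsgroup` and `needs_fsgroup_change` are hypothetical names. The second function approximates the `OnRootMismatch` shortcut, which inspects only the volume root, which is why the full walk is paid once rather than on every restart. It assumes GNU `stat`.
   
   ```bash
   set -euo pipefail
   
   # Approximation of a recursive fsGroup change (assumption, not kubelet code).
   apply_fsgroup() {
     local dir="$1" gid="$2"
     # Make the fsGroup the group owner of everything under the volume root.
     chgrp -R "$gid" "$dir"
     # Files: ensure owner and group can read and write (mode |= 0660).
     find "$dir" -type f -exec chmod ug+rw {} +
     # Directories: ensure owner and group have rwx (mode |= 0770) and set setgid
     # so that files created later inherit the directory's group.
     find "$dir" -type d -exec chmod ug+rwx,g+s {} +
   }
   
   # Approximation of OnRootMismatch: returns 0 (change needed) unless the volume
   # root already has the expected group, owner/group rwx, and the setgid bit.
   needs_fsgroup_change() {
     local dir="$1" gid="$2" mode
     [ "$(stat -c %g "$dir")" = "$gid" ] || return 0
     mode="$(stat -c %a "$dir")"   # octal mode without leading zero, e.g. 2775
     if (( (8#$mode & 8#0770) == 8#0770 && (8#$mode & 8#2000) != 0 )); then
       return 1
     fi
     return 0
   }
   ```
   
   Usage: `if needs_fsgroup_change "$dir" "$gid"; then apply_fsgroup "$dir" "$gid"; fi`. With `fsGroupChangePolicy: "Always"`, the root check is skipped and the full walk runs on every mount.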
   
   ### Modifications
   
   * Add config blocks for the `bookkeeper.securityContext` and `zookeeper.securityContext`.
   * Default to `fsGroup: 0`. This is already the default group id in the docker image, and the docker image assumes the user has root group permission.
   * Default to `fsGroupChangePolicy: "OnRootMismatch"`. This configuration will work for all deployments where the user id is stable. If the user id switches between restarts, like it does in OpenShift, please set to `Always`.
   * Remove the GC logging configuration, which writes to a directory the user lacks permission to write to. (Perhaps we want to write to `/pulsar/log/bookie-gc.log` instead?)
   * Add documentation to the README.
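   
   The defaults listed above can be written as a Helm values override. This is a minimal sketch: the key layout (`bookkeeper.securityContext` and `zookeeper.securityContext` with `fsGroup` and `fsGroupChangePolicy`) follows this PR's description, while the function name `write_securitycontext_values` and the output file name are hypothetical. The policy defaults to `OnRootMismatch`; pass `Always` for platforms such as OpenShift where the user id can change between restarts.
   
   ```bash
   set -euo pipefail
   
   # Write a values override setting fsGroup 0 for BookKeeper and ZooKeeper.
   # Arguments: output path, optional policy (OnRootMismatch or Always).
   write_securitycontext_values() {
     local out="$1" policy="${2:-OnRootMismatch}"
     case "$policy" in
       OnRootMismatch|Always) ;;
       *) echo "unsupported fsGroupChangePolicy: $policy" >&2; return 1 ;;
     esac
     # Emit one YAML line per argument.
     printf '%s\n' \
       'bookkeeper:' \
       '  securityContext:' \
       '    fsGroup: 0' \
       "    fsGroupChangePolicy: \"$policy\"" \
       'zookeeper:' \
       '  securityContext:' \
       '    fsGroup: 0' \
       "    fsGroupChangePolicy: \"$policy\"" \
       > "$out"
   }
   ```
   
   For example, `write_securitycontext_values securitycontext-values.yaml Always` followed by `helm upgrade test -f charts/pulsar/values.yaml -f securitycontext-values.yaml charts/pulsar/`; Helm gives later `-f` files precedence, so the override wins.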
   
   ### Verifying this change
   
   I first attempted verification of this change with minikube. It did not work because minikube uses hostPath volumes by default. I then tested on EKS v1.21.9-eks-0d102a7. I tested by deploying the current, latest version of the helm chart (2.9.3) and then upgrading to this PR's version of the helm chart along with using the 2.10.0 docker image. I also tested upgrading from a default version 
   
   Test 1 is a plain upgrade using the default 2.9.3 version of the chart, then upgrading to this PR's version of the chart with the modification to use the 2.10.0 docker images. It worked as expected.
   
   ```bash
   $ helm install test apache/pulsar
   $ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.10.0:
   $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
   ```
   
   Test 2 is a plain upgrade using the default 2.9.3 version of the chart, then an upgrade to this PR's version of the chart, then an upgrade to this PR's version of the chart using 2.10.0 docker images. There is a minor error described in the `README.md`. The solution is to chown the bookie's data directory.
   
   ```bash
   $ helm install test apache/pulsar
   $ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.9.2:
   $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
   $ # Upgrade using Pulsar version 2.10.0
   $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
   ```
   
   ### GC Logging
   
   In my testing, I ran into the following errors when using `-Xlog:gc:/var/log/bookie-gc.log`:
   
   ```
   pulsar-bookkeeper-verify-clusterid [0.008s] Error opening log file '/var/log/bookie-gc.log': Permission denied
   pulsar-bookkeeper-verify-clusterid [0.008s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
   pulsar-bookkeeper-verify-clusterid [0.005s] Error opening log file '/var/log/bookie-gc.log': Permission denied
   pulsar-bookkeeper-verify-clusterid [0.006s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
   pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
   pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
   pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
   pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
   pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
   pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
   ```
   
   I resolved the error by removing the setting.
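   
   Because the JVM refuses to start when it cannot open the `-Xlog` target, one defensive option is to emit the GC logging flag only when the target directory is writable. This is a hypothetical helper, not something in the chart; a path such as `/pulsar/log/bookie-gc.log` is only an example, and the resulting flag would need to be appended to whichever JVM options variable the deployment uses.
   
   ```bash
   set -euo pipefail
   
   # Print "-Xlog:gc:<file>" if the file's directory exists and is writable;
   # otherwise print nothing (plus a note on stderr) so the JVM can still start.
   gc_log_opts() {
     local log_file="$1" dir
     dir="$(dirname "$log_file")"
     if [ -d "$dir" ] && [ -w "$dir" ]; then
       printf '%s\n' "-Xlog:gc:${log_file}"
     else
       echo "skipping GC log: ${dir} is missing or not writable" >&2
     fi
   }
   ```
   
   Usage: `GC_OPTS="$(gc_log_opts /pulsar/log/bookie-gc.log)"`. The `[ -w ]` test reflects the current user's permissions, so the check only means something when run as the same user as the JVM.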
   
   ### OpenShift Observations
   
   I wanted to seamlessly support OpenShift, so I investigated configuring the BookKeeper and ZooKeeper processes with `umask 002` so that they would create group-writable files and directories (OpenShift gives the process a stable group id but a random user id). That worked for most tools when the user id changed, but not for RocksDB, which creates a lock file at `/pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK` with permission `0644`, ignoring the umask. Here is the relevant error:
   
   ```
   2022-05-14T03:45:06,903+0000  ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
   java.io.IOException: Error open RocksDB database
       at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:88) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:169) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:818) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.server.Main.doMain(Main.java:226) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   Caused by: org.rocksdb.RocksDBException: while open a file for lock: /pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK: Permission denied
       at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
       at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
       at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
       ... 13 more
   ```
   
   As such, in order to support OpenShift, I exposed the `fsGroupChangePolicy`, which allows for OpenShift support, but not necessarily _seamless_ support.
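   
   One way to see why `umask 002` could not help with the `LOCK` file: the umask only clears bits from the mode a program requests; it never adds any. Assuming RocksDB requests `0644` explicitly (consistent with the observed permissions, but not verified here), there is no group write bit for the umask to preserve. The hypothetical `effective_mode` helper below simply computes `requested & ~umask`.
   
   ```bash
   set -euo pipefail
   
   # Effective permissions of a newly created file: requested mode & ~umask.
   # Inputs are octal strings; 8# forces base-8 interpretation.
   effective_mode() {
     local requested="$1" mask="$2"
     printf '%04o\n' $(( 8#$requested & ~8#$mask ))
   }
   
   effective_mode 0666 0002   # typical file creation: prints 0664 (group writable)
   effective_mode 0777 0002   # typical mkdir: prints 0775 (group writable)
   effective_mode 0644 0002   # explicit 0644 request: prints 0644 (not group writable)
   ```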
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1143149449

   > Thanks @michaeljmarshall shouldn't the helm chart version change?
   
   @frankjkelly good question. I know our recent release procedure has tied individual PRs to version bumps, but it seems more appropriate to me to keep releases and features in separate PRs.
   
   Regarding releases, I think the Pulsar Community needs to revisit Helm Chart release procedures. We're operating in a gray area by performing releases that are not first voted upon on the dev mailing list. By my understanding, all Apache project releases are supposed to have a vote.




[GitHub] [pulsar-helm-chart] michaeljmarshall merged pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall merged PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266




[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1128433988

   After talking with @lhotari, it looks like the issues are the known TLS/ZK 3.6.3 issues. I'll retry the tests a few times to see if they pass.




[GitHub] [pulsar-helm-chart] maxsxu commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

maxsxu commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1133627153

   @michaeljmarshall Unfortunately, it still does not work when setting `securityContext: {}`.
   
   The broker and proxy remain stuck initializing due to the error below:
   
   ```
   WATCHER::
   WatchedEvent state:SyncConnected type:None path:null
   Node does not exist: /admin/clusters/pulsar
   2022-05-21T07:45:10,734+0000 [main] ERROR org.apache.zookeeper.util.ServiceUtils - Exiting JVM with code 1
   pulsar cluster pulsar isn't initialized yet ... check in 3 seconds ...
   ```
   
   Logs from the ZK Pod:
   
   ```
   org.apache.zookeeper.server.ServerCnxn$EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.129.5.225:38320, session = 0x2002e4f5ac42ab8
   at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   at java.lang.Thread.run(Thread.java:829) [?:?]
   2022-05-21T10:05:52,110+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x2002e4f5ac42aa7, timeout of 30000ms exceeded
   2022-05-21T10:05:52,110+0000 [SessionTracker] INFO org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x3002e4f34832aa7, timeout of 30000ms exceeded
   2022-05-21T10:05:52,110+0000 [RequestThrottler] INFO org.apache.zookeeper.server.ZooKeeperServer - Submitting global closeSession request for session 0x2002e4f5ac42aa7
   2022-05-21T10:05:52,110+0000 [RequestThrottler] INFO org.apache.zookeeper.server.ZooKeeperServer - Submitting global closeSession request for session 0x3002e4f34832aa7
   2022-05-21T10:05:52,614+0000 [CommitProcessor:2] INFO org.apache.zookeeper.server.quorum.LeaderSessionTracker - Committing global session 0x2002e4f5ac42ab9
   2022-05-21T10:05:52,956+0000 [NIOWorkerThread-1] WARN org.apache.zookeeper.server.NIOServerCnxn - Unexpected exception
   ```
   
   From my observation, OpenShift generates a random non-zero `fsGroup` for the Pod when none is specified. So the group of the PV (the `/pulsar/data` directory) will be that random non-zero `fsGroup`, rather than the root group.
   
   We can observe the following inside the ZK Pod:
   
   ```
   $ id
   uid=1001060000(1001060000) gid=0(root) groups=0(root),1001060000
   $ ls -al
   total 84
   drwxrwxr-x. 1 root       root          42 May 21 12:10 .
   dr-xr-xr-x. 1 root       root          53 May 21 12:10 ..
   -rw-r--r--. 1 root       root       32333 Jan 22  2020 LICENSE
   -rw-r--r--. 1 root       root        6612 Jan 22  2020 NOTICE
   -rw-r--r--. 1 root       root        1269 Jan 22  2020 README
   drwxr-xr-x. 3 root       root        4096 Mar 26 04:02 bin
   drwxrwxr-x. 1 root       root          28 Jan 22  2020 conf
   drwxr-xr-x. 2 root       root        4096 Mar 26 04:05 connectors
   drwxrwsr-x. 4 root       1001060000  4096 May 21 03:12 data
   drwxr-xr-x. 3 root       root         132 Mar 26 04:02 examples
   drwxr-xr-x. 4 root       root          66 Mar 26 04:02 instances
   drwxr-xr-x. 3 root       root       20480 Mar 26 04:02 lib
   drwxr-xr-x. 2 root       root        4096 Jan 22  2020 licenses
   drwxr-xr-x. 2 1001060000 root          50 May 21 12:10 logs
   drwxr-xr-x. 2 root       root          91 Mar 26 04:05 offloaders
   drwxr-xr-x. 2 root       root          66 Mar 26 04:02 pulsar-client
   ```
   
   So, regarding _"the container user is always a member of the root group..."_: yes indeed, but the PV is not owned by the root group.
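   
   A quick way to check this situation from inside a pod is to compare the directory's owning group with the process's groups. This is a hypothetical diagnostic, not part of the chart; it assumes GNU `stat` and `id` are available in the image.
   
   ```bash
   set -euo pipefail
   
   # Report the directory's group id, whether the current process belongs to that
   # group, and whether the group has write permission on the directory.
   check_group_access() {
     local dir="$1" dir_gid mode g member="no" group_write="no"
     dir_gid="$(stat -c %g "$dir")"   # numeric group owner
     mode="$(stat -c %A "$dir")"      # symbolic mode, e.g. drwxrwsr-x
     for g in $(id -G); do            # every group id of the current process
       if [ "$g" = "$dir_gid" ]; then member="yes"; fi
     done
     # Offset 5 of the symbolic mode string is the group write bit.
     if [ "${mode:5:1}" = "w" ]; then group_write="yes"; fi
     echo "dir_gid=$dir_gid member=$member group_write=$group_write mode=$mode"
   }
   ```
   
   Usage: `check_group_access /pulsar/data`. A result of `member=yes group_write=yes` normally means the process can write through group permissions even when that group is a random `fsGroup` rather than root (if the process also owns the directory, the owner bits apply instead).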




[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1144018376

   @frankjkelly - are you saying that a release is overwritten when we merge a PR without incrementing the version? I assumed it only published the version when the version number changed. If it is overwriting existing helm binaries, that definitely needs to be fixed.




[GitHub] [pulsar-helm-chart] frankjkelly commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

frankjkelly commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1144035157

   @michaeljmarshall I'm not sure whether a new helm chart release is made when the version number has not changed. But either way, even if no release is done, it's a source of confusion.




[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1128425243

   The error comes from the `bookie-init` job in the ZooKeeper TLS tests. We're seeing similar errors in https://github.com/apache/pulsar-helm-chart/pull/260. It fails while running this code:
   
   ```bash
   bin/apply-config-from-env.py conf/bookkeeper.conf;
   /pulsar/keytool/keytool.sh toolset ${HOSTNAME}.pulsar-ci-toolset.pulsar.svc.cluster.local true;
   if bin/bookkeeper shell whatisinstanceid; then
     echo "bookkeeper cluster already initialized";
   else
     bin/bookkeeper shell initnewcluster;
   fi
   ```
   
   Specifically, the logs show that `bin/bookkeeper shell whatisinstanceid` connects to ZooKeeper and fails to find the instance id, and then `bin/bookkeeper shell initnewcluster` runs and times out while opening a connection to ZooKeeper. I looked at this a bit tonight, but couldn't find anything relevant. The one oddity in the bookie init logs is the last line, which logs that it is handling a new connection. That time is somewhat close to the time at which ZooKeeper expires an old session. It could just be a coincidence.
   
   Bookie init logs:
   
   ```
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:40,679+0000 [main] INFO  org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 1048575 Bytes
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:40,919+0000 [main] INFO  org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=false
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:41,161+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server pulsar-ci-zookeeper/10.244.1.17:2281.
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:49:41,161+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:21,359+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] WARN  org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 40358ms for session id 0x0
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:22,663+0000 [main-SendThread(pulsar-ci-zookeeper:2281)] WARN  org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x0.
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 40358ms for session id 0x0
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1258) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,458+0000 [main] INFO  org.apache.zookeeper.ClientCnxnSocketNetty - channel is told closing
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,677+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x0 closed
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,776+0000 [main] ERROR org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Failed to create zookeeper client to pulsar-ci-zookeeper:2281
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$Builder.build(ZooKeeperClient.java:260) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:207) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.zk.ZKMetadataBookieDriver.initialize(ZKMetadataBookieDriver.java:60) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:345) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:372) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.client.BookKeeperAdmin.initNewCluster(BookKeeperAdmin.java:1278) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:56) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell$InitNewCluster.runCmd(BookieShell.java:334) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,432+0000 [globalEventExecutor-1-1] WARN  org.apache.zookeeper.ClientCnxnSocketNetty - future isn't success.
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] io.netty.util.concurrent.DefaultPromise$LeanCancellationException: null
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at io.netty.util.concurrent.DefaultPromise.cancel(...)(Unknown Source) ~[io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:27,722+0000 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x0
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: Failed to create zookeeper client to pulsar-ci-zookeeper:2281
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:58)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell$InitNewCluster.runCmd(BookieShell.java:334)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:238)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2278)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2369)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Caused by: org.apache.bookkeeper.meta.exceptions.MetadataException: Failed to create zookeeper client to pulsar-ci-zookeeper:2281
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:227)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.zk.ZKMetadataBookieDriver.initialize(ZKMetadataBookieDriver.java:60)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:345)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:372)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.client.BookKeeperAdmin.initNewCluster(BookKeeperAdmin.java:1278)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:56)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	... 4 more
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$Builder.build(ZooKeeperClient.java:260)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	at org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase.initialize(ZKMetadataDriverBase.java:207)
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 	... 9 more
   [pod/pulsar-ci-bookie-init-ffxw6/pulsar-ci-bookie-init] 2022-05-14T04:50:28,965+0000 [epollEventLoopGroup-2-1] INFO  org.apache.zookeeper.ClientCnxnSocketNetty - SSL handler added for channel: [id: 0xbcd9c9b9]
   ```
   
   Zookeeper logs during the above timeout:
   
   ```
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:40,253+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e001f, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:44,255+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0020, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:49,717+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:49,900+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,254+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0022, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,575+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:52,643+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,252+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0025, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,582+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:54,586+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43088
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:58,254+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0026, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:49:59,714+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:00,398+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:00,413+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43090
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:03,414+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,145+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,252+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0028, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:08,331+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:10,202+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:10,254+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002b, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:15,393+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:19,342+0000 [epollEventLoopGroup-7-2] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Unsuccessful handshake with session 0x0
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:20,254+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002e, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:20,255+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e002c, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:22,179+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-broker,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:23,699+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,531+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,713+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-zookeeper,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:24,721+0000 [epollEventLoopGroup-7-2] INFO  org.apache.zookeeper.server.NettyServerCnxn - Processing ruok command from /127.0.0.1:43092
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:28,256+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0031, timeout of 30000ms exceeded
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:28,630+0000 [epollEventLoopGroup-7-1] INFO  org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=pulsar-ci-bookie,OU=IT Department,O=StreamNative,ST=San Francisco,C=US' for Scheme 'x509'
   [pod/pulsar-ci-zookeeper-0/pulsar-ci-zookeeper] 2022-05-14T04:50:30,253+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x10000059c2e0032, timeout of 30000ms exceeded
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1133142563

   @maxsxu - thank you for testing. Do you know whether the deployment works on OpenShift when you set `securityContext: {}` for ZooKeeper and BookKeeper? I had assumed (perhaps incorrectly) that the PR's current security context would work, because the OpenShift documentation on how to [create a docker image](https://docs.openshift.com/container-platform/4.10/openshift_images/create-images.html) to run on OpenShift explicitly says:
   
   > Because the container user is always a member of the root group, the container user can read and write these files.
   
   With `securityContext: {}`, the container user should still be a member of the root group, but I'm not sure that OpenShift will recursively make the persistent volumes root-group writable.
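   For reference, a minimal values override for the test described above might look like the following sketch. It assumes the `zookeeper.securityContext` and `bookkeeper.securityContext` keys this PR adds, and that the chart templates pass the value through unchanged; with an empty map no `fsGroup` is set, so the kubelet should not change volume ownership.

   ```yaml
   # Sketch of a values.yaml override: clear the pod-level securityContext
   # for both StatefulSets so no fsGroup ownership change is requested.
   zookeeper:
     securityContext: {}
   bookkeeper:
     securityContext: {}
   ```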


[GitHub] [pulsar-helm-chart] frankjkelly commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1143456998

   @michaeljmarshall Thanks for the information - all good points, especially the Apache requirement. The challenge, I think, is that two Pulsar deployments using the same Helm chart version could behave very differently because of different configurations, despite the "immutability" of the images.


[GitHub] [pulsar-helm-chart] michaeljmarshall commented on pull request #266: Add bk, zk securityContext to support upgrade to non-root docker image

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on PR #266:
URL: https://github.com/apache/pulsar-helm-chart/pull/266#issuecomment-1136704025

   @maxsxu - thanks for testing out my theory. I realize now that the `fsGroup` does not technically need to be `0` (`root`) in this configuration. A user deploying to OpenShift can choose any group id that is acceptable there. The docker image will still work correctly, because the `.conf` files are writable by the `root` group and the PVCs will be group-owned by the configured `fsGroup`. Essentially, OpenShift users just need to select an `fsGroup` that passes the security context requirements.
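   As a sketch of what that could look like for an OpenShift user, assuming the same `zookeeper.securityContext` and `bookkeeper.securityContext` keys: the group id below is a hypothetical placeholder and must be replaced with one your project's security context constraints allow. `fsGroupChangePolicy: "Always"` follows the PR description's advice for environments where the user id changes between restarts.

   ```yaml
   # Hypothetical OpenShift override. 1000650000 is a placeholder gid,
   # not a recommended value; use a group id permitted by your SCC.
   zookeeper:
     securityContext:
       fsGroup: 1000650000
       # Re-apply group ownership on every mount, since the uid may change
       fsGroupChangePolicy: "Always"
   bookkeeper:
     securityContext:
       fsGroup: 1000650000
       fsGroupChangePolicy: "Always"
   ```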
