Posted to commits@pulsar.apache.org by mm...@apache.org on 2022/06/14 03:11:19 UTC

[pulsar-helm-chart] branch master updated: Add bk, zk securityContext to support upgrade to non-root docker image (#266)

This is an automated email from the ASF dual-hosted git repository.

mmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar-helm-chart.git


The following commit(s) were added to refs/heads/master by this push:
     new 428736c  Add bk, zk securityContext to support upgrade to non-root docker image (#266)
428736c is described below

commit 428736c788edce253f164ee8d6a5beb9804323e1
Author: Michael Marshall <mi...@gmail.com>
AuthorDate: Mon Jun 13 22:11:13 2022 -0500

    Add bk, zk securityContext to support upgrade to non-root docker image (#266)
    
    Master Issue: https://github.com/apache/pulsar/issues/11269
    
    ### Motivation
    
    Apache Pulsar's docker images for 2.10.0 and above run as a non-root user by default. To ensure there is a safe upgrade path, we need to expose the `securityContext` for the Bookkeeper and Zookeeper StatefulSets. The relevant Kubernetes documentation for this feature is here: https://kubernetes.io/docs/tasks/configure-pod-container/security-context.
    
    Once released, every deployment that uses the default `values.yaml` configuration for the `securityContext` will pay a one-time penalty on upgrade: the kubelet recursively changes the group ownership and permissions of the volumes' files so that they are writable by the root group. It's possible to temporarily avoid this penalty by setting `securityContext: {}`.
    
    ### Modifications
    
    * Add config blocks for the `bookkeeper.securityContext` and `zookeeper.securityContext`.
    * Default to `fsGroup: 0`. This is already the default group id in the docker image, and the docker image assumes the process runs as a member of the root group.
    * Default to `fsGroupChangePolicy: "OnRootMismatch"`. This configuration works for all deployments where the user id is stable across restarts. If the user id changes between restarts, as it does in OpenShift, set it to `Always`.
    * Remove the GC logging option that writes to a directory the non-root user lacks permission to write to. (Perhaps we want to write to `/pulsar/log/bookie-gc.log`?)
    * Add documentation to the README.
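    
    For clusters where the process UID is not stable across restarts, such as OpenShift, the new defaults can be overridden per component. The sketch below writes a small values override file to pass to `helm upgrade` with an extra `-f`. The helper name, the output file name, and the release name `test` are illustrative assumptions; only the `bookkeeper.securityContext` and `zookeeper.securityContext` keys come from this change.
    
    ```bash
    # Hypothetical helper: write a Helm values override that sets the pod
    # securityContext for both the bookkeeper and zookeeper StatefulSets.
    # Usage: write_security_context_values <output-file> <OnRootMismatch|Always>
    write_security_context_values() {
      local out="$1" policy="$2"
      # Kubernetes only accepts these two fsGroupChangePolicy values.
      case "$policy" in
        OnRootMismatch|Always) ;;
        *) echo "unsupported fsGroupChangePolicy: $policy" >&2; return 1 ;;
      esac
      local component
      : > "$out"  # start from an empty file
      for component in bookkeeper zookeeper; do
        printf '%s\n' \
          "${component}:" \
          "  securityContext:" \
          "    fsGroup: 0" \
          "    fsGroupChangePolicy: \"${policy}\"" >> "$out"
      done
    }
    
    # Example (requires helm and an existing release named "test"):
    # write_security_context_values openshift-values.yaml Always
    # helm upgrade test -f charts/pulsar/values.yaml -f openshift-values.yaml charts/pulsar/
    ```
    
    Helm gives precedence to the right-most `-f` file, so the override wins over the chart's defaults without editing `values.yaml` itself.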
    
    ### Verifying this change
    
    I first attempted to verify this change with minikube. It did not work because minikube uses hostPath volumes by default, and the kubelet does not apply `fsGroup` to hostPath volumes. I then tested on EKS v1.21.9-eks-0d102a7 by deploying the current, latest version of the helm chart (2.9.3) and then upgrading to this PR's version of the helm chart along with the 2.10.0 docker image. I also tested upgrading from a default version of the chart to this PR's chart before switching to the 2.10.0 docker image (Test 2).
    
    Test 1 is a plain upgrade: install the default 2.9.3 version of the chart, then upgrade to this PR's version of the chart, modified to use the 2.10.0 docker images. It worked as expected.
    
    ```bash
    $ helm install test apache/pulsar
    $ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.10.0:
    $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
    ```
    
    Test 2 starts from the default 2.9.3 version of the chart, upgrades to this PR's version of the chart, and then upgrades again to this PR's version of the chart using the 2.10.0 docker images. This produces a minor error, described in the `README.md`; the solution is to make the bookie's data directory writable by the root group.
    
    ```bash
    $ helm install test apache/pulsar
    $ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.9.2:
    $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
    $ # Upgrade using Pulsar version 2.10.0
    $ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
    ```
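    
    The Test 2 error comes from files on the bookie volume that the root group cannot write, such as the RocksDB `LOCK` file. If you have shell access to the mounted volume (for example, from a debug pod), a minimal sketch for finding and fixing such files might look like the following. The function names are hypothetical, and it assumes the files already belong to the root group, as they do when the old image ran as root.
    
    ```bash
    # Hypothetical helpers for a bookie data directory such as /pulsar/data.
    
    # Print every file or directory under $1 that lacks the group write bit.
    list_not_group_writable() {
      find "$1" ! -perm -g+w -print
    }
    
    # Add the group write bit recursively. Group ownership is assumed to be
    # root (GID 0) already; this changes permissions, not ownership.
    make_group_writable() {
      chmod -R g+w "$1"
    }
    
    # Example, run against the mounted volume before upgrading:
    # list_not_group_writable /pulsar/data
    # make_group_writable /pulsar/data
    ```
    
    If `list_not_group_writable` prints nothing, the one-time fix is not needed.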
    
    ### GC Logging
    
    In my testing, I ran into the following errors when using `-Xlog:gc:/var/log/bookie-gc.log`:
    
    ```
    pulsar-bookkeeper-verify-clusterid [0.008s] Error opening log file '/var/log/bookie-gc.log': Permission denied
    pulsar-bookkeeper-verify-clusterid [0.008s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
    pulsar-bookkeeper-verify-clusterid [0.005s] Error opening log file '/var/log/bookie-gc.log': Permission denied
    pulsar-bookkeeper-verify-clusterid [0.006s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
    pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
    pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
    pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
    pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
    pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
    pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
    ```
    
    I resolved the error by removing the setting.
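    
    If writing GC logs to a file is still wanted, one option is to add the `-Xlog:gc` file output only when its directory is writable by the container user, so that a bad path cannot stop the JVM from starting. This is a hypothetical sketch, not part of this change: the function name is made up, and `/pulsar/log/bookie-gc.log` is only the alternative path floated in the modifications list.
    
    ```bash
    # Hypothetical helper: print the -Xlog option for a GC log file only if
    # its directory exists and is writable by the current (possibly non-root)
    # user; otherwise print nothing and explain why on stderr.
    gc_log_opt() {
      local log_file="$1"
      local dir
      dir="$(dirname "$log_file")"
      if [ -d "$dir" ] && [ -w "$dir" ]; then
        # "--" stops printf from parsing the leading "-" as an option.
        printf -- '-Xlog:gc:%s\n' "$log_file"
      else
        echo "skipping GC file logging: $dir is not writable" >&2
      fi
    }
    
    # Example:
    # gc_opt="$(gc_log_opt /pulsar/log/bookie-gc.log)"
    ```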
    
    ### OpenShift Observations
    
    I wanted to support OpenShift seamlessly, so I investigated configuring the bookkeeper and zookeeper processes with `umask 002` so that they would create group-writable files and directories (OpenShift keeps the group id stable but gives the process a random user id). That worked for most tools when switching the user id, but not for RocksDB, which creates a lock file at `/pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK` with the permission `0644` ignoring the umask. Her [...]
    
    ```
    2022-05-14T03:45:06,903+0000  ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
    java.io.IOException: Error open RocksDB database
        at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:88) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:169) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:818) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.server.Main.doMain(Main.java:226) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
    Caused by: org.rocksdb.RocksDBException: while open a file for lock: /pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK: Permission denied
        at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
        at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
        at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
        ... 13 more
    ```
    
    As such, in order to support OpenShift, I exposed the `fsGroupChangePolicy`, which allows for OpenShift support, but not necessarily _seamless_ support.
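    
    The reason `umask 002` cannot fix the RocksDB lock file follows from how a umask works: it only clears bits from the mode a program requests when creating a file, and never adds any. A file requested with mode `0644` therefore stays `0644` under `umask 002`, while tools that request `0666` end up with `0664`. A small sketch of that arithmetic (the function name is hypothetical):
    
    ```bash
    # Hypothetical helper: the mode a newly created file ends up with, given
    # the mode the program requests from open(2) and the process umask.
    # Arguments are octal strings such as 0644 and 0002.
    effective_mode() {
      local requested="$1" mask="$2"
      # Bash arithmetic treats a leading 0 as octal; the umask can only
      # clear bits from the requested mode, never add them.
      printf '%o\n' $(( requested & ~mask ))
    }
    
    # effective_mode 0666 0002   # prints 664: group write survives the umask
    # effective_mode 0644 0002   # prints 644: group write was never requested
    ```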
---
 README.md                                          | 51 ++++++++++++++++++++++
 .../pulsar/templates/bookkeeper-statefulset.yaml   |  4 ++
 charts/pulsar/templates/zookeeper-statefulset.yaml |  4 ++
 charts/pulsar/values.yaml                          |  9 +++-
 4 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bdc48ce..7e51a9c 100644
--- a/README.md
+++ b/README.md
@@ -56,6 +56,7 @@ It includes support for:
         - [ ] Mutal TLS
         - [ ] Kerberos
     - [x] Authorization
+    - [x] Non-root broker, bookkeeper, proxy, and zookeeper containers (version 2.10.0 and above)
 - [x] Storage
     - [x] Non-persistence storage
     - [x] Persistence Volume
@@ -178,6 +179,56 @@ helm upgrade -f pulsar.yaml \
 
 For more detailed information, see our [Upgrading](http://pulsar.apache.org/docs/en/helm-upgrade/) guide.
 
+## Upgrading to 2.10.0 and above
+
+The 2.10.0+ Apache Pulsar docker image runs as a non-root user by default. That complicates an upgrade to 2.10.0
+because the existing files are owned by the root user but are not writable by the root group. In order to leverage this
+new security feature, the Bookkeeper and Zookeeper StatefulSet [securityContexts](https://kubernetes.io/docs/tasks/configure-pod-container/security-context)
+are configurable in the `values.yaml`. They default to:
+
+```yaml
+  securityContext:
+    fsGroup: 0
+    fsGroupChangePolicy: "OnRootMismatch"
+```
+
+This configuration is ideal for regular Kubernetes clusters where the UID is stable across restarts. If the process
+UID is subject to change (like it is in OpenShift), you'll need to set `fsGroupChangePolicy: "Always"`.
+
+The official docker image assumes that it is run as a member of the root group.
+
+If you upgrade to the latest version of the helm chart before upgrading to Pulsar 2.10.0, then on your first upgrade
+to a version >= 2.10.0 you will need to set `fsGroupChangePolicy: "Always"`, and then set it back to
+`fsGroupChangePolicy: "OnRootMismatch"` for subsequent upgrades. This is because the volume's root directory won't
+mismatch, so `OnRootMismatch` skips the recursive permission change, but the RocksDB lock file will still not be group
+writable. If you have direct access to the persistent volumes, you can instead run `chmod -R g+w /pulsar/data` before upgrading.
+
+Here is a sample error you can expect if the RocksDB lock file is not writable by the root group:
+
+```text
+2022-05-14T03:45:06,903+0000  ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
+java.io.IOException: Error open RocksDB database
+    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:88) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:169) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:818) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.server.Main.doMain(Main.java:226) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+Caused by: org.rocksdb.RocksDBException: while open a file for lock: /pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK: Permission denied
+    at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
+    at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
+    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
+    ... 13 more
+```
+
 ## Uninstall
 
 To uninstall the Pulsar Chart, run the following command:
diff --git a/charts/pulsar/templates/bookkeeper-statefulset.yaml b/charts/pulsar/templates/bookkeeper-statefulset.yaml
index db63c82..43c4ba0 100644
--- a/charts/pulsar/templates/bookkeeper-statefulset.yaml
+++ b/charts/pulsar/templates/bookkeeper-statefulset.yaml
@@ -104,6 +104,10 @@ spec:
     {{- if and .Values.rbac.enabled .Values.rbac.psp }}
       serviceAccountName: "{{ template "pulsar.fullname" . }}-{{ .Values.bookkeeper.component }}"
     {{- end}}
+      {{- if .Values.bookkeeper.securityContext }}
+      securityContext:
+{{ toYaml .Values.bookkeeper.securityContext | indent 8 }}
+      {{- end }}
       initContainers:
       # This initContainer will wait for bookkeeper initnewcluster to complete
       # before deploying the bookies
diff --git a/charts/pulsar/templates/zookeeper-statefulset.yaml b/charts/pulsar/templates/zookeeper-statefulset.yaml
index 4313f7f..12640df 100644
--- a/charts/pulsar/templates/zookeeper-statefulset.yaml
+++ b/charts/pulsar/templates/zookeeper-statefulset.yaml
@@ -101,6 +101,10 @@ spec:
     {{- if and .Values.rbac.enabled .Values.rbac.psp }}
       serviceAccountName: "{{ template "pulsar.fullname" . }}-{{ .Values.zookeeper.component }}"
     {{- end }}
+      {{- if .Values.zookeeper.securityContext }}
+      securityContext:
+{{ toYaml .Values.zookeeper.securityContext | indent 8 }}
+      {{- end }}
       containers:
       - name: "{{ template "pulsar.fullname" . }}-{{ .Values.zookeeper.component }}"
         image: "{{ .Values.images.zookeeper.repository }}:{{ .Values.images.zookeeper.tag }}"
diff --git a/charts/pulsar/values.yaml b/charts/pulsar/values.yaml
index 2193169..4c6b5c4 100644
--- a/charts/pulsar/values.yaml
+++ b/charts/pulsar/values.yaml
@@ -361,6 +361,10 @@ zookeeper:
   #     readOnly: true
   extraVolumes: []
   extraVolumeMounts: []
+  # Ensures 2.10.0 non-root docker image works correctly.
+  securityContext:
+    fsGroup: 0
+    fsGroupChangePolicy: "OnRootMismatch"
   volumes:
     # use a persistent volume or emptyDir
     persistence: true
@@ -489,6 +493,10 @@ bookkeeper:
   #     readOnly: true
   extraVolumes: []
   extraVolumeMounts: []
+  # Ensures 2.10.0 non-root docker image works correctly.
+  securityContext:
+    fsGroup: 0
+    fsGroupChangePolicy: "OnRootMismatch"
   volumes:
     # use a persistent volume or emptyDir
     persistence: true
@@ -572,7 +580,6 @@ bookkeeper:
       -Xlog:safepoint
       -Xlog:gc+heap=trace
       -verbosegc
-      -Xlog:gc:/var/log/bookie-gc.log
     # configure the memory settings based on jvm memory settings
     dbStorage_writeCacheMaxSizeMb: "32"
     dbStorage_readAheadCacheMaxSizeMb: "32"