Posted to commits@bigtop.apache.org by yw...@apache.org on 2019/10/14 06:36:10 UTC

[bigtop] branch cnb updated: BIGTOP-3239: Provisioning the storage on kubernetes via Rook

This is an automated email from the ASF dual-hosted git repository.

ywkim pushed a commit to branch cnb
in repository https://gitbox.apache.org/repos/asf/bigtop.git


The following commit(s) were added to refs/heads/cnb by this push:
     new 898259d  BIGTOP-3239: Provisioning the storage on kubernetes via Rook
898259d is described below

commit 898259d1d41663e1d25ccd36fee094bf102d8629
Author: Youngwoo Kim <yw...@apache.org>
AuthorDate: Mon Oct 14 15:35:23 2019 +0900

    BIGTOP-3239: Provisioning the storage on kubernetes via Rook
---
 README.md                            |  82 +++++++++++++++--
 bigtop.bom                           |  12 ++-
 storage/rook/ceph/cluster-test.yaml  |  57 ++++++++++++
 storage/rook/ceph/cluster.yaml       | 170 +++++++++++++++++++++++++++++++++++
 storage/rook/minio/object-store.yaml |  72 +++++++++++++++
 5 files changed, 383 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 5952ea0..a254467 100755
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ TBD
 
 Prerequisites:
 - Vagrant
-- Java 
+- Java
 
 ## Set up 3-Node Kubernetes cluster via Kubespray on local machine
 ```
@@ -51,6 +51,72 @@ kubernetes-dashboard is running at https://172.17.8.101:6443/api/v1/namespaces/k
 To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
 ```
 
+## Storage
+You need to install the ```lvm2``` package for Rook-Ceph:
+```
+# CentOS
+sudo yum install -y lvm2
+
+# Ubuntu
+sudo apt-get install -y lvm2
+```
+Refer to https://rook.io/docs/rook/v1.1/k8s-pre-reqs.html for the Rook prerequisites.
+
+Run the ```download``` task to get the Rook tarball:
+```
+$ ./gradlew rook-clean rook-download && cd dl/ && tar xvfz rook-1.1.2.tar.gz
+```
+
+Create Rook operator:
+```
+$ kubectl create -f dl/rook-1.1.2/cluster/examples/kubernetes/ceph/common.yaml
+$ kubectl create -f dl/rook-1.1.2/cluster/examples/kubernetes/ceph/operator.yaml
+$ kubectl -n rook-ceph get pod
+```
+
+Create Ceph cluster:
+```
+# test
+$ kubectl create -f storage/rook/ceph/cluster-test.yaml
+
+# production
+$ kubectl create -f storage/rook/ceph/cluster.yaml
+
+$ kubectl get pod -n rook-ceph
+```
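+
+Once the pods are up, you can also check the CephCluster resource created above (named `rook-ceph` in the manifests); a minimal verification sketch:
+```
+$ kubectl -n rook-ceph get cephcluster rook-ceph
+```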
+
+Deploy Ceph Toolbox:
+```
+$ kubectl create -f dl/rook-1.1.2/cluster/examples/kubernetes/ceph/toolbox.yaml
+$ kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
+$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
+
+# inside the toolbox pod, verify the cluster:
+ceph status
+ceph osd status
+ceph df
+rados df
+```
+Refer to https://rook.io/docs/rook/v1.1/ceph-toolbox.html for more details.
+
+Create a StorageClass for Ceph RBD:
+```
+$ kubectl create -f dl/rook-1.1.2/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml
+$ kubectl get storageclass
+# the output should list the new "rook-ceph-block" storage class
+```
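+
+As a quick check, a PersistentVolumeClaim against this class should bind once the cluster is healthy. A minimal sketch (the claim name is arbitrary):
+```
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: rbd-test-pvc   # hypothetical name, used only for this check
+spec:
+  accessModes: [ "ReadWriteOnce" ]
+  storageClassName: rook-ceph-block
+  resources:
+    requests:
+      storage: 1Gi
+```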
+
+Create Minio operator:
+```
+$ kubectl create -f dl/rook-1.1.2/cluster/examples/kubernetes/minio/operator.yaml
+
+# verify the Minio operator is running
+$ kubectl -n rook-minio-system get pod
+```
+
+Create the Minio object store:
+```
+$ kubectl create -f storage/rook/minio/object-store.yaml
+$ kubectl -n rook-minio get pod -l app=minio,objectstore=my-store
+```
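+
+The object store manifest exposes Minio as a NodePort service named `minio-my-store` on port 9000, using the demo credentials from the `minio-my-store-access-keys` secret. A quick smoke-test sketch:
+```
+$ kubectl -n rook-minio get svc minio-my-store
+$ kubectl -n rook-minio port-forward svc/minio-my-store 9000:9000
+# then open http://localhost:9000 and log in with TEMP_DEMO_ACCESS_KEY / TEMP_DEMO_SECRET_KEY
+```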
+
 # Cloud Native Bigtop
 This is the content for the talk given by Jay Vyas and Sid Mani at ApacheCon 2019 in Las Vegas; you can watch it here: https://www.youtube.com/watch?v=LUCE63q
 
@@ -61,7 +127,7 @@ helm install stable/nfs-server-provisioner ; kubectl patch storageclass nfs -p '
 Minio:  kubectl -n minio create secret generic my-minio-secret --from-literal=accesskey=minio --from-literal=secretkey=minio123
 helm install --set existingSecret=my-minio-secret stable/minio --namespace=minio --name=minio
 Nifi: helm repo add cetic https://cetic.github.io/helm-charts ; helm install nifi --namespace=minio
-Kafka:  helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator $ helm install --name my-kafka incubator/kafka , kubectl edit statefulset kafka 
+Kafka: helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator ; helm install --name my-kafka incubator/kafka ; kubectl edit statefulset kafka
  envFrom:
         - configMapRef:
             name: kafka-cm
@@ -177,25 +243,23 @@ In particular, this repo modifies stock helm charts in a variety of ways to make
 1. We don't use stable/spark because it's *old*.  Instead we use Microsoft's Spark, which comes properly
 integrated with Zeppelin.
 2. We use ConfigMaps for configuration of *Spark*.  For Spark, this allows us to inject
-different types of configuration stuff from the kuberentes level, rather then baking them into the image (note that 
+different types of configuration from the Kubernetes level, rather than baking it into the image (note that
 you can't just inject a single file from a ConfigMap, because it overwrites the whole directory).  This allows us
 to inject Minio access properties into Spark itself, while also injecting other config.
-3. For Kafka, we config map the environment variables so that we can use the same zookeeper instance as 
+3. For Kafka, we use a ConfigMap for the environment variables so that we can use the same ZooKeeper instance as
 NiFi (a minimal sketch of such a ConfigMap follows this list).
 4. For Presto, the configuration parameters for workers/masters are also all injected via ConfigMap.  We use
 a fork of https://github.com/dharmeshkakadia/presto-kubernetes for this change (PRs are submitted to make this upstream).
 5. For Minio there aren't any major changes needed out of the box, except using emptyDir for storage if you don't have a volume provisioner.
 6. For HBase, we also reuse the same ZooKeeper instance that is used by NiFi and Kafka.  For now we use the NiFi ZooKeeper deployment, but at some point we will make ZooKeeper a first-class citizen.
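 
 A minimal sketch of the `kafka-cm` ConfigMap referenced above, assuming the env-var convention of the image used by incubator/kafka and a hypothetical service name for the ZooKeeper shared with NiFi:
 ```
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: kafka-cm
 data:
   # hypothetical service name; point this at the ZooKeeper instance shared with NiFi
   KAFKA_ZOOKEEPER_CONNECT: "nifi-zookeeper:2181"
 ```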
 
-============================================ 
+============================================
 
 Notes and Ideas
- 
-# Inspiration 
+
+# Inspiration
 
 Recently saw https://github.com/dacort/damons-data-lake.
 - A problem set that is increasingly relevant: lots of sources, real time, unstructured warehouse/lake.
 - No upstream plug-and-play alternative to cloud native services stack.
 - Infrastructure, storage, networking is the hardest part.
-
-
diff --git a/bigtop.bom b/bigtop.bom
index 99ab5c8..af45d12 100644
--- a/bigtop.bom
+++ b/bigtop.bom
@@ -133,7 +133,17 @@ bigtop {
       url     { site = "https://github.com/kubernetes-sigs/kubespray/archive/"
                 archive = site }
     }
-
+    'rook' {
+      name    = "rook"
+      pkg     = "rook"
+      relNotes = "Rook is an open source cloud-native storage orchestrator for Kubernetes"
+      website = "https://github.com/rook/rook"
+      version { base = '1.1.2'; pkg = base; release = 1 }
+      tarball { destination = "$name-${version.base}.tar.gz"
+                source      = "v${version.base}.tar.gz" }
+      url     { site = "https://github.com/rook/rook/archive"
+                archive = site }
+    }
     'zookeeper' {
       name    = 'zookeeper'
       pkg     = name
diff --git a/storage/rook/ceph/cluster-test.yaml b/storage/rook/ceph/cluster-test.yaml
new file mode 100644
index 0000000..4920fbf
--- /dev/null
+++ b/storage/rook/ceph/cluster-test.yaml
@@ -0,0 +1,57 @@
+#################################################################################################################
+# Define the settings for the rook-ceph cluster with settings that should only be used in a test environment.
+# A single filestore OSD will be created in the dataDirHostPath.
+# For example, to create the cluster:
+#   kubectl create -f common.yaml
+#   kubectl create -f operator.yaml
+#   kubectl create -f cluster-test.yaml
+#################################################################################################################
+
+apiVersion: ceph.rook.io/v1
+kind: CephCluster
+metadata:
+  name: rook-ceph
+  namespace: rook-ceph
+spec:
+  cephVersion:
+    image: ceph/ceph:v14.2.4-20190917
+    allowUnsupported: true
+  dataDirHostPath: /var/lib/rook
+  skipUpgradeChecks: false
+  mon:
+    count: 1
+    allowMultiplePerNode: true
+  dashboard:
+    enabled: true
+    ssl: true
+  monitoring:
+    enabled: false  # requires Prometheus to be pre-installed
+    rulesNamespace: rook-ceph
+  network:
+    hostNetwork: false
+  rbdMirroring:
+    workers: 0
+  mgr:
+    modules:
+    # the pg_autoscaler is only available on nautilus or newer. remove this if testing mimic.
+    - name: pg_autoscaler
+      enabled: true
+  storage:
+    useAllNodes: true
+    useAllDevices: false
+    deviceFilter:
+    config:
+      databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
+      journalSizeMB: "1024"  # this value can be removed for environments with normal sized disks (20 GB or larger)
+      osdsPerDevice: "1" # this value can be overridden at the node or device level
+    directories:
+    - path: /var/lib/rook
+#    nodes:
+#    - name: "minikube"
+#      directories:
+#      - path: "/data/rook-dir"
+#      devices:
+#      - name: "sdb"
+#      - name: "nvme01" # multiple osds can be created on high performance devices
+#        config:
+#          osdsPerDevice: "5"
diff --git a/storage/rook/ceph/cluster.yaml b/storage/rook/ceph/cluster.yaml
new file mode 100644
index 0000000..67cacf9
--- /dev/null
+++ b/storage/rook/ceph/cluster.yaml
@@ -0,0 +1,170 @@
+#################################################################################################################
+# Define the settings for the rook-ceph cluster with common settings for a production cluster.
+# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
+# in this example. See the documentation for more details on storage settings available.
+
+# For example, to create the cluster:
+#   kubectl create -f common.yaml
+#   kubectl create -f operator.yaml
+#   kubectl create -f cluster.yaml
+#################################################################################################################
+
+apiVersion: ceph.rook.io/v1
+kind: CephCluster
+metadata:
+  name: rook-ceph
+  namespace: rook-ceph
+spec:
+  cephVersion:
+    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
+    # v13 is mimic, v14 is nautilus, and v15 is octopus.
+    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
+    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
+    image: ceph/ceph:v14.2.4-20190917
+    # Whether to allow unsupported versions of Ceph. Currently mimic and nautilus are supported, with the recommendation to upgrade to nautilus.
+    # Octopus is the version allowed when this is set to true.
+    # Do not set to true in production.
+    allowUnsupported: false
+  # The path on the host where configuration files will be persisted. Must be specified.
+  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
+  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
+  dataDirHostPath: /var/lib/rook
+  # Whether or not upgrade should continue even if a check fails
+  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
+  # Use at your OWN risk
+  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
+  skipUpgradeChecks: false
+  # set the number of mons to be started
+  mon:
+    count: 3
+    allowMultiplePerNode: false
+  mgr:
+    modules:
+    # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
+    # are already enabled by other settings in the cluster CR and the "rook" module is always enabled.
+    # - name: pg_autoscaler
+    #   enabled: true
+  # enable the ceph dashboard for viewing cluster status
+  dashboard:
+    enabled: true
+    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
+    # urlPrefix: /ceph-dashboard
+    # serve the dashboard at the given port.
+    # port: 8443
+    # serve the dashboard using SSL
+    ssl: true
+  # enable prometheus alerting for cluster
+  monitoring:
+    # requires Prometheus to be pre-installed
+    enabled: false
+    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
+    # Recommended:
+    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
+    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
+    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
+    rulesNamespace: rook-ceph
+  network:
+    # toggle to use hostNetwork
+    hostNetwork: false
+  rbdMirroring:
+    # The number of daemons that will perform the rbd mirroring.
+    # rbd mirroring must be configured with "rbd mirror" from the rook toolbox.
+    workers: 0
+  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
+  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
+  # tolerate taints with a key of 'storage-node'.
+#  placement:
+#    all:
+#      nodeAffinity:
+#        requiredDuringSchedulingIgnoredDuringExecution:
+#          nodeSelectorTerms:
+#          - matchExpressions:
+#            - key: role
+#              operator: In
+#              values:
+#              - storage-node
+#      podAffinity:
+#      podAntiAffinity:
+#      tolerations:
+#      - key: storage-node
+#        operator: Exists
+# The above placement information can also be specified for mon, osd, and mgr components
+#    mon:
+# Monitor deployments may contain an anti-affinity rule for avoiding monitor
+# collocation on the same node. This is a required rule when host network is used
+# or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
+# preferred rule with weight: 50.
+#    osd:
+#    mgr:
+  annotations:
+#    all:
+#    mon:
+#    osd:
+# If no mgr annotations are set, prometheus scrape annotations will be set by default.
+#   mgr:
+  resources:
+# The requests and limits set here allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
+#    mgr:
+#      limits:
+#        cpu: "500m"
+#        memory: "1024Mi"
+#      requests:
+#        cpu: "500m"
+#        memory: "1024Mi"
+# The above example requests/limits can also be added to the mon and osd components
+#    mon:
+#    osd:
+  storage: # cluster level storage configuration and selection
+    useAllNodes: true
+    useAllDevices: true
+    deviceFilter:
+    location:
+    config:
+      # The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
+      # Set the storeType explicitly only if it is required not to use the default.
+      # storeType: bluestore
+      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
+      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
+      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
+      # osdsPerDevice: "1" # this value can be overridden at the node or device level
+      # encryptedDevice: "true" # the default value for this option is "false"
+# Cluster level list of directories to use for filestore-based OSD storage. If uncommented, this example would create an OSD under the dataDirHostPath.
+    #directories:
+    #- path: /var/lib/rook
+# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
+# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
+#    nodes:
+#    - name: "172.17.4.101"
+#      directories: # specific directories to use for storage can be specified for each node
+#      - path: "/rook/storage-dir"
+#      resources:
+#        limits:
+#          cpu: "500m"
+#          memory: "1024Mi"
+#        requests:
+#          cpu: "500m"
+#          memory: "1024Mi"
+#    - name: "172.17.4.201"
+#      devices: # specific devices to use for storage can be specified for each node
+#      - name: "sdb"
+#      - name: "nvme01" # multiple osds can be created on high performance devices
+#        config:
+#          osdsPerDevice: "5"
+#      config: # configuration can be specified at the node level which overrides the cluster level config
+#        storeType: filestore
+#    - name: "172.17.4.301"
+#      deviceFilter: "^sd."
+  # The section for configuring management of daemon disruptions during upgrade or fencing.
+  disruptionManagement:
+    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
+    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph-managed-disruptionbudgets.md). The operator will
+    # block eviction of OSDs by default and unblock them safely when drains are detected.
+    managePodBudgets: false
+    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
+    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
+    osdMaintenanceTimeout: 30
+    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
+    # Only available on OpenShift.
+    manageMachineDisruptionBudgets: false
+    # Namespace in which to watch for the MachineDisruptionBudgets.
+    machineDisruptionBudgetNamespace: openshift-machine-api
diff --git a/storage/rook/minio/object-store.yaml b/storage/rook/minio/object-store.yaml
new file mode 100644
index 0000000..200aea3
--- /dev/null
+++ b/storage/rook/minio/object-store.yaml
@@ -0,0 +1,72 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: rook-minio
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: minio-my-store-access-keys
+  namespace: rook-minio
+type: Opaque
+data:
+  # Base64 encoded string: "TEMP_DEMO_ACCESS_KEY"
+  username: VEVNUF9ERU1PX0FDQ0VTU19LRVk=
+  # Base64 encoded string: "TEMP_DEMO_SECRET_KEY"
+  password: VEVNUF9ERU1PX1NFQ1JFVF9LRVk=
+---
+apiVersion: minio.rook.io/v1alpha1
+kind: ObjectStore
+metadata:
+  name: my-store
+  namespace: rook-minio
+spec:
+  scope:
+    nodeCount: 4
+    # You can have multiple PersistentVolumeClaims in the volumeClaimTemplates list.
+    # Be aware though that all PersistentVolumeClaim templates will be used for each instance (see nodeCount).
+    volumeClaimTemplates:
+    - metadata:
+        name: rook-minio-data1
+      spec:
+        accessModes: [ "ReadWriteOnce" ]
+        # Set the storage class that will be used, otherwise Kubernetes' default storage class will be used.
+        #storageClassName: "my-storage-class"
+        storageClassName: "rook-ceph-block"
+        resources:
+          requests:
+            storage: "8Gi"
+    #- metadata:
+    #    name: rook-minio-data2
+    #  spec:
+    #    accessModes: [ "ReadWriteOnce" ]
+    #    # Uncomment and specify your StorageClass, otherwise
+    #    # the cluster admin defined default StorageClass will be used.
+    #    #storageClassName: "your-cluster-storageclass"
+    #    resources:
+    #      requests:
+    #        storage: "8Gi"
+  # A key value list of annotations
+  annotations:
+  #  key: value
+  placement:
+    tolerations:
+    nodeAffinity:
+    podAffinity:
+    podAntiAffinity:
+  credentials:
+    name: minio-my-store-access-keys
+    namespace: rook-minio
+  clusterDomain:
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: minio-my-store
+  namespace: rook-minio
+spec:
+  type: NodePort
+  ports:
+    - port: 9000
+  selector:
+    app: minio