Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/08/07 17:54:54 UTC

[GitHub] [couchdb] sergey-safarov opened a new issue #2102: kubernetes: cluster cannot find peer nodes after statefulset recreation

URL: https://github.com/apache/couchdb/issues/2102
 
 
   ## Description
   In a Kubernetes environment the DNS names of StatefulSet pods are created dynamically. When one CouchDB daemon starts, the other pods may not be available yet, so the DNS lookups for their names fail. Some time later the DNS records for all started CouchDB daemons exist, but the cluster is still unable to join all nodes.
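
   For context, one way to observe this race (a hypothetical check, not part of the original report; it assumes the headless service `db` and the pod names from the manifests below, and that per-pod DNS records are only published once a pod passes its readiness probe):
   ```sh
   # Watch the peer record appear: it typically only resolves once db-1 is up
   # and passes its readiness probe, i.e. after CouchDB on db-1 has started.
   kubectl exec db-0 -- sh -c 'until getent hosts db-1.db; do echo "db-1.db not resolvable yet"; sleep 2; done'
   ```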
   
   ## Steps to Reproduce
   
   I have configured a CouchDB cluster in a Kubernetes environment using the following Service and StatefulSet YAML files.
   
   *Service*
   ```yaml
   # file contains database headless service
   # creates kubernetes dns records for database daemons
   # required for database nodes discovery
   apiVersion: v1
   kind: Service
   metadata:
     name: db
   spec:
     type: ClusterIP
     clusterIP: None
     selector:
       app: db
   ```
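
   Once the pods pass their readiness probes, this headless service gives every pod an A record such as `db-0.db`, and the bare service name resolves to the set of ready pod IPs. A hypothetical check from inside a pod (default namespace search path assumed):
   ```sh
   kubectl exec -it db-0 -- getent hosts db        # IPs of all ready pods
   kubectl exec -it db-0 -- getent hosts db-0.db   # the per-pod record
   ```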
   
   *StatefulSet*
   ```yaml
   # file contains database daemons
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     name: db
     labels:
       app: db
   spec:
     podManagementPolicy: Parallel
     serviceName: db
     replicas: 5
     selector:
       matchLabels:
         app: db
     template:
       metadata:
         labels:
           app: db
       spec:
         restartPolicy: Always
         containers:
         - name: node
           image: couchdb:2.3.1
           imagePullPolicy: IfNotPresent
           env:
           - name: NODE_NETBIOS_NAME
             valueFrom:
               fieldRef:
                 fieldPath: metadata.name
           - name: NODENAME
             value: $(NODE_NETBIOS_NAME).db
           - name: COUCHDB_SECRET
             value: monster
           # Erlang distribution flags: node name and cookie
           - name: ERL_FLAGS
             value: "-name couchdb -setcookie monster"
           volumeMounts:
             - name: pvc
               mountPath: /opt/couchdb/data
           livenessProbe:
             failureThreshold: 3
             httpGet:
               path: /
               port: 5984
               scheme: HTTP
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 1
           readinessProbe:
             failureThreshold: 3
             httpGet:
               path: /_up
               port: 5984
               scheme: HTTP
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 1
     volumeClaimTemplates:
     - metadata:
         name: pvc
       spec:
         accessModes: ["ReadWriteOnce"]
         resources:
           requests:
             storage: 128Gi
         volumeName: db
   ```
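   As a side note (not part of the original report), the Erlang node name and cookie that each pod actually starts with can be inspected directly; the path below assumes the stock `couchdb:2.3.1` image layout:
   ```sh
   # Distribution settings read from vm.args; ERL_FLAGS from the pod spec are
   # passed separately via the environment.
   kubectl exec -it db-0 -- grep -E '^-(name|setcookie)' /opt/couchdb/etc/vm.args
   kubectl exec db-0 -- env | grep -E 'ERL_FLAGS|NODENAME'
   # The resulting node name should match the couchdb@db-0.db entries that
   # /_membership reports below.
   ```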
   I checked the cluster membership and found all nodes online:
   ```sh
   [safarov@safarov-dell EKS]$ kubectl exec -it db-0  -- curl http://db-0.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   ``` 
   Then I deleted the StatefulSet and checked that all pods were gone:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl delete -f 02-db.yaml 
   statefulset.apps "db" deleted
   [safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
   No resources found.
   ```
   Then I created the StatefulSet again and checked that all pods were ready:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl create -f 02-db.yaml 
   statefulset.apps/db created
   [safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
   NAME   READY   STATUS    RESTARTS   AGE
   db-0   1/1     Running   0          40s
   db-1   1/1     Running   0          40s
   db-2   1/1     Running   0          40s
   db-3   1/1     Running   0          40s
   db-4   1/1     Running   0          40s
   ```
   And then checked the cluster membership again:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl exec -it db-0  -- curl http://127.0.0.1:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-2.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   ```
   As you can see, the `db-0` pod does not see pods `db-1`, `db-3` and `db-4`. But the `db-0` pod can still query the other nodes' membership via their DNS names in the URL:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl exec -it db-0  -- /bin/bash
   root@db-0:/# curl http://db-1.db:5984/_membership
   {"all_nodes":["couchdb@db-1.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-2.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-2.db","couchdb@db-3.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-3.db:5984/_membership
   {"all_nodes":["couchdb@db-2.db","couchdb@db-3.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-4.db:5984/_membership
   {"all_nodes":["couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   ```
   As you can see, the CouchDB cluster is broken.
   If I delete the pods one by one, the StatefulSet recreates them, and each new pod is able to resolve the DNS names of the other nodes:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl delete pod db-0
   pod "db-0" deleted
   [safarov@safarov-dell yaml]$ kubectl delete pod db-1
   pod "db-1" deleted
   [safarov@safarov-dell yaml]$ kubectl delete pod db-2
   pod "db-2" deleted
   [safarov@safarov-dell yaml]$ kubectl delete pod db-3
   pod "db-3" deleted
   [safarov@safarov-dell yaml]$ kubectl delete pod db-4
   pod "db-4" deleted
   [safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
   NAME   READY   STATUS              RESTARTS   AGE
   db-0   1/1     Running             0          54s
   db-1   1/1     Running             0          48s
   db-2   1/1     Running             0          33s
   db-3   0/1     Running             0          14s
   db-4   0/1     ContainerCreating   0          7s
   ```
   And now all nodes are joined properly:
   ```sh
   [safarov@safarov-dell yaml]$ kubectl exec -it db-0  -- /bin/bash
   root@db-0:/# curl http://db-0.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-1.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-2.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-3.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   root@db-0:/# curl http://db-4.db:5984/_membership
   {"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
   ```
   
   ## Expected Behaviour
   
   1. If a peer's DNS record is created after the CouchDB daemon has started, CouchDB should retry connecting to that peer node (see the check after this list).
   2. All cluster nodes are able to connect to each other after the StatefulSet is recreated.
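
   The retry in point 1 should be enough: as shown above, the missing peers' DNS names already resolve from inside `db-0` while its `/_membership` is still incomplete. A quick hypothetical way to confirm that (names from the manifests above):
   ```sh
   # All three "missing" peers resolve via the headless service even though
   # db-0 has not (re)established Erlang distribution connections to them.
   kubectl exec -it db-0 -- getent hosts db-1.db db-3.db db-4.db
   ```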
   
   ## Your Environment
   Kubernetes 1.13 on Amazon (EKS).
   Docker Hub couchdb:2.3.1 image.
   
   ```sh
   root@db-0:/# cat /etc/os-release 
   PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
   NAME="Debian GNU/Linux"
   VERSION_ID="9"
   VERSION="9 (stretch)"
   ID=debian
   HOME_URL="https://www.debian.org/"
   SUPPORT_URL="https://www.debian.org/support"
   BUG_REPORT_URL="https://bugs.debian.org/"
   ```
   
   ## Additional context
   
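   A Kubernetes-side knob that might be relevant (my own suggestion, untested against this setup, and not a substitute for the retry requested above): publish per-pod DNS records before the pods become Ready, so peers are already resolvable during a parallel cold start.
   ```sh
   # Hypothetical mitigation: with publishNotReadyAddresses the headless
   # service publishes DNS records for db-0..db-4 as soon as the pods exist,
   # not only once they pass their readiness probes.
   kubectl patch service db -p '{"spec":{"publishNotReadyAddresses":true}}'
   ```
   The same field can also be set directly in the Service manifest above.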
