Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/08/07 17:54:54 UTC
[GitHub] [couchdb] sergey-safarov opened a new issue #2102: kubernetes:
cluster cannot find peer nodes after statefulset recreation
URL: https://github.com/apache/couchdb/issues/2102
## Description
In a Kubernetes environment, the DNS names of StatefulSet pods are created dynamically. When one CouchDB daemon starts, the others may not be available yet, so the DNS lookup for them fails.
Some time later, DNS records exist for all started CouchDB daemons, but the cluster is still unable to join all nodes.
## Steps to Reproduce
I configured a CouchDB cluster in a Kubernetes environment using the following Service and StatefulSet YAML files.
*Service*
```yaml
# file contains database headless service
# creates kubernetes dns records for database daemons
# required for database nodes discovery
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: db
```
*StatefulSet*
```yaml
# file contains database daemons
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  labels:
    app: db
spec:
  podManagementPolicy: Parallel
  serviceName: db
  replicas: 5
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      restartPolicy: Always
      containers:
      - name: node
        image: couchdb:2.3.1
        imagePullPolicy: IfNotPresent
        env:
        - name: NODE_NETBIOS_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NODENAME
          value: $(NODE_NETBIOS_NAME).db
        - name: COUCHDB_SECRET
          value: monster
        - name: ERL_FLAGS
          value: "-name couchdb"
        - name: ERL_FLAGS
          value: "-setcookie monster"
        volumeMounts:
        - name: pvc
          mountPath: /opt/couchdb/data
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 5984
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /_up
            port: 5984
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 128Gi
      volumeName: db
```
I checked the cluster membership and found all nodes online:
```sh
[safarov@safarov-dell EKS]$ kubectl exec -it db-0 -- curl http://db-0.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
```
Then I deleted the StatefulSet and checked that all pods were deleted:
```sh
[safarov@safarov-dell yaml]$ kubectl delete -f 02-db.yaml
statefulset.apps "db" deleted
[safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
No resources found.
```
Then I created the StatefulSet again and checked that all pods were ready:
```sh
[safarov@safarov-dell yaml]$ kubectl create -f 02-db.yaml
statefulset.apps/db created
[safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
NAME   READY   STATUS    RESTARTS   AGE
db-0   1/1     Running   0          40s
db-1   1/1     Running   0          40s
db-2   1/1     Running   0          40s
db-3   1/1     Running   0          40s
db-4   1/1     Running   0          40s
```
Then I checked the cluster membership again:
```sh
[safarov@safarov-dell yaml]$ kubectl exec -it db-0 -- curl http://127.0.0.1:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-2.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
```
As you can see, the `db-0` pod does not see pods `db-1`, `db-3` and `db-4`, even though `db-0` can query the other nodes' membership using their DNS names in the URL:
```sh
[safarov@safarov-dell yaml]$ kubectl exec -it db-0 -- /bin/bash
root@db-0:/# curl http://db-1.db:5984/_membership
{"all_nodes":["couchdb@db-1.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-2.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-2.db","couchdb@db-3.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-3.db:5984/_membership
{"all_nodes":["couchdb@db-2.db","couchdb@db-3.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-4.db:5984/_membership
{"all_nodes":["couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
```
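As you can see, the CouchDB cluster is broken: every node lists all five peers in `cluster_nodes`, but none of them is connected to all of them in `all_nodes`.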
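The comparison between `all_nodes` and `cluster_nodes` can be scripted. The following is a minimal sketch, assuming the `/_membership` response is compact single-line JSON exactly as in the outputs above. The function names `extract_nodes` and `missing_nodes` are hypothetical helpers written for this report, not part of CouchDB or kubectl.

```sh
#!/bin/sh
# extract_nodes KEY JSON: print the node names stored in the JSON array KEY,
# one per line. Assumes compact single-line JSON (an assumption based on the
# outputs shown in this report).
extract_nodes() {
    printf '%s\n' "$2" |
        sed -n "s/.*\"$1\":\[\([^]]*\)].*/\1/p" |
        tr ',' '\n' | tr -d '"' | sort
}

# missing_nodes: read a /_membership document on stdin and print every node
# that is listed in "cluster_nodes" but absent from "all_nodes".
missing_nodes() {
    json=$(cat)
    all=$(extract_nodes all_nodes "$json")
    cluster=$(extract_nodes cluster_nodes "$json")
    for n in $cluster; do
        # -x: whole-line match, -F: fixed string (node names contain dots)
        printf '%s\n' "$all" | grep -qxF "$n" || printf '%s\n' "$n"
    done
}
```

For example, `kubectl exec db-0 -- curl -s http://127.0.0.1:5984/_membership | missing_nodes` should print nothing on a healthy node and the disconnected peers otherwise.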
If I delete the pods one by one, the StatefulSet recreates each pod, and the new pod is able to resolve the DNS names of the other nodes.
```sh
[safarov@safarov-dell yaml]$ kubectl delete pod db-0
pod "db-0" deleted
[safarov@safarov-dell yaml]$ kubectl delete pod db-1
pod "db-1" deleted
[safarov@safarov-dell yaml]$ kubectl delete pod db-2
pod "db-2" deleted
[safarov@safarov-dell yaml]$ kubectl delete pod db-3
pod "db-3" deleted
[safarov@safarov-dell yaml]$ kubectl delete pod db-4
pod "db-4" deleted
[safarov@safarov-dell yaml]$ kubectl get pods -l "app=db"
NAME   READY   STATUS              RESTARTS   AGE
db-0   1/1     Running             0          54s
db-1   1/1     Running             0          48s
db-2   1/1     Running             0          33s
db-3   0/1     Running             0          14s
db-4   0/1     ContainerCreating   0          7s
```
And now all nodes are joined properly:
```sh
[safarov@safarov-dell yaml]$ kubectl exec -it db-0 -- /bin/bash
root@db-0:/# curl http://db-0.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-1.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-2.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-3.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
root@db-0:/# curl http://db-4.db:5984/_membership
{"all_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"],"cluster_nodes":["couchdb@db-0.db","couchdb@db-1.db","couchdb@db-2.db","couchdb@db-3.db","couchdb@db-4.db"]}
```
## Expected Behaviour
1. If a peer's DNS record is created after the CouchDB daemon has started, CouchDB should retry connecting to the peer nodes.
2. All cluster nodes should be able to connect to each other after StatefulSet recreation.
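The retry behaviour requested in item 1 can be illustrated from the outside with a small shell helper. This is only a sketch of the retry-until-resolvable semantics: `retry` is a hypothetical function, and the attempt count and delay are arbitrary assumptions. It does not describe how CouchDB connects to peers internally.

```sh
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, trying at most MAX times
# and sleeping DELAY seconds between attempts. Returns 0 on success and 1 if
# every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

Inside a pod, `retry 30 2 getent hosts db-1.db` would wait up to about a minute for the peer's DNS record to appear, which is the window in which the first lookup currently fails.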
## Your Environment
Kubernetes 1.13, Amazon
DockerHub couchdb:2.3.1 image.
```sh
root@db-0:/# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```
## Additional context