Posted to notifications@apisix.apache.org by GitBox <gi...@apache.org> on 2021/02/25 11:23:42 UTC

[GitHub] [apisix] GBXing opened a new issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

GBXing opened a new issue #3673:
URL: https://github.com/apache/apisix/issues/3673


   ### Issue description
   Apisix and ETCD are deployed on K8S, with 3 ETCD nodes configured by domain name in the Apisix configuration file. When one of the ETCD nodes died, Apisix started returning errors, and the log indicates that the domain name of that ETCD node cannot be resolved. Is resty.etcd unable to determine whether the configured nodes are healthy? What is the solution?
   
   ### Environment
   * apisix version (cmd: `apisix version`): 2.2
   * OS (cmd: `uname -a`): CentOS
   * OpenResty / Nginx version (cmd: `nginx -V` or `openresty -V`): 1.19.3.1
   * etcd version, if have (cmd: run `curl http://127.0.0.1:9090/v1/server_info` to get the info from server-info API): v3.4.0
   * apisix-dashboard version, if have:
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [apisix] tokers commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787561607


   > @Yiyiyimu I may be using it in a wrong way, do I need to add any additional configuration in config.yaml for the health check of etcd?
   
   Then you just need `etcd.apisix.svc.cluster.local` to access the etcd cluster. `etcd-1.etcd.apisix.svc.cluster.local` is invalid: it is neither a Pod domain name nor an SRV record.





[GitHub] [apisix] membphis commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
membphis commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-805002157


   any news? @Yiyiyimu 





[GitHub] [apisix] spacewander commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
spacewander commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-785879497


   Maybe we can enable this for APISIX: https://github.com/api7/lua-resty-etcd/pull/109





[GitHub] [apisix] tokers commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787573795


   > @tokers service name is etcd,namespace is apisix, other nodes can be accessed normally
   
   As per https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields, the FQDN has an `A/AAAA` record only if the pod has the `hostname` and `subdomain` fields set and the `subdomain` value matches the name of the headless service. I'm not sure whether you set these in the template of etcd's StatefulSet.
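
To make the DNS rule above concrete, here is a minimal Python sketch that builds the per-pod FQDN described in the linked Kubernetes page and checks whether it resolves from the machine running the script. The helper names (`pod_fqdn`, `resolves`) and the `cluster.local` cluster domain are assumptions for illustration; they are not part of APISIX or lua-resty-etcd.

```python
import socket


def pod_fqdn(hostname, subdomain, namespace, cluster_domain="cluster.local"):
    """Build the FQDN a pod gets when `hostname` and `subdomain` are set and
    `subdomain` matches a headless service in the same namespace."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


def resolves(name, port=2379):
    """Return True if `name` has at least one address record visible from here."""
    try:
        return len(socket.getaddrinfo(name, port)) > 0
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Values taken from this thread: service "etcd", namespace "apisix".
    name = pod_fqdn("etcd-1", "etcd", "apisix")
    print(name, "resolves" if resolves(name) else "does not resolve")
```

Running this from inside a pod in the same cluster shows whether the name in the error log is resolvable at all, independent of APISIX.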





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-785885676


   @GBXing 
   If it's urgent, you could try the master branch (`apisix:dev` for the docker tag) after #3676 is merged.
   If it's even more urgent, you could [build a docker image from local code](https://github.com/apache/apisix-docker#build-an-image-from-source) and test with that.





[GitHub] [apisix] nanamikon commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
nanamikon commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-833162754


   Any news? We found a similar problem: some nodes (not all) log this error message. My etcd version is 3.4.13.
   ```
   stack traceback:
           ...p/huya-nginx-proxy//deps/share/lua/5.1/resty/etcd/v3.lua:652: in function 'res_func'
           /data/app/huya-nginx-proxy/apisix/core/config_etcd.lua:131: in function 'waitdir'
           /data/app/huya-nginx-proxy/apisix/core/config_etcd.lua:318: in function 'sync_data'
           /data/app/huya-nginx-proxy/apisix/core/config_etcd.lua:546: in function </data/app/huya-nginx-proxy/apisix/core/config_etcd.lua:536>
           [C]: in function 'xpcall'
   ```
   
   The Admin API served by these nodes still works, but the nodes never recover.





[GitHub] [apisix] tokers edited a comment on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers edited a comment on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-786541651


   @GBXing The FQDN `etcd-1.etcd.apisix.svc.cluster.local` does not seem valid. What are the service name and the namespace?





[GitHub] [apisix] moonming commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
moonming commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-785859978


   @Yiyiyimu will chaos mesh cover this?





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-785863392


   > will chaos mesh cover this?
   
   I'll test it tomorrow.





[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787109703


   @tokers The service name is etcd and the namespace is apisix. The other nodes can be accessed normally.





[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787925770


   @Yiyiyimu I use the default-config.yaml from version 2.2. This is my config.yaml:
   
       apisix:
         node_listen: 9080                # APISIX listening port
         enable_admin: true
         enable_admin_cors: true          # Admin API support CORS response headers.
         enable_debug: false
         enable_dev_mode: true            # Sets nginx worker_processes to 1 if set to true
         enable_reuseport: true           # Enable nginx SO_REUSEPORT switch if set to true.
         enable_ipv6: true
         config_center: etcd              # etcd: use etcd to store the config value
         allow_admin:
         admin_key:
           -
             name: "admin"
             key: edd1c9f034335f136f87ad84b625c8f1
             role: admin                  # admin: manage all configuration data
                                          # viewer: only can view configuration data
           -
             name: "viewer"
             key: 4054f7cf07e344346cd3f287985e76a2
             role: viewer

       nginx_config:
         error_log: "logs/error.log"
         error_log_level: "warn"
         http:
           lua_shared_dicts:
             shared-datamap: 50m

       etcd:
         host:
           - "http://etcd-node1:2379"
           - "http://etcd-node2:2379"
           - "http://etcd-node3:2379"
           # - "http://127.0.0.1:2379"
           # - "http://172.17.0.1:2379"
         prefix: "/apisix"
         timeout: 30
         tls:
           verify: true
         # resync_delay: 5
         # user: root
         # password: 5tHkHhYkjr6cQY

       plugins:                          # plugin list (sorted in alphabetical order)
         - api-breaker
         - authz-keycloak
         - basic-auth
         - batch-requests
         - consumer-restriction
         - cors
         - echo
         # - error-log-logger
         # - example-plugin
         - fault-injection
         - grpc-transcode
         - hmac-auth
         - http-logger
         - ip-restriction
         - jwt-auth
         - kafka-logger
         - key-auth
         - limit-conn
         - limit-count
         - limit-req
         # - log-rotate
         # - node-status
         - openid-connect
         - prometheus
         - proxy-cache
         - proxy-mirror
         - proxy-rewrite
         - redirect
         - referer-restriction
         - request-id
         - request-validation
         - response-rewrite
         - serverless-post-function
         - serverless-pre-function
         # - skywalking
         - sls-logger
         - syslog
         - tcp-logger
         - udp-logger
         - uri-blocker
         - wolf-rbac
         - zipkin
         - server-info
         - traffic-split

       plugin_attr:
         log-rotate:
           interval: 3600
           max_kept: 168
         skywalking:
           service_name: APISIX
           service_instance_name: "APISIX Instance Name"
           endpoint_addr: http://127.0.0.1:12800
         prometheus:
           export_uri: /apisix/prometheus/metrics
         server-info:
           report_interval: 60
           report_ttl: 3600
   
   





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787710873


   > v3.lua:593: attempt to index field 'result' (a nil value)
   > is this a bug of apisix?
   
   Yes, I think so. We are missing test coverage for this case, so no proper error message is prepared for it.
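
The failure mode quoted above (indexing a `result` field that is nil) can be illustrated with a short Python sketch of a defensive response check. This is not the lua-resty-etcd code and not the fix that was merged; the function name `extract_events` and the assumed response shape (a decoded JSON object with an optional `result` or `error` key) are hypothetical and only show the idea of returning an error value instead of crashing.

```python
def extract_events(body):
    """Return (events, error) from a decoded watch response body.

    `body` is assumed to be a dict decoded from JSON, or None when the
    connection dropped before a full response arrived.
    """
    if not isinstance(body, dict):
        return None, "empty or malformed response from etcd"
    if "error" in body:
        return None, f"etcd returned an error: {body['error']}"
    result = body.get("result")
    if not isinstance(result, dict):
        # The unguarded equivalent of this case is what produces
        # "attempt to index field 'result' (a nil value)" in Lua.
        return None, "response has no 'result' field"
    return result.get("events", []), None
```

With a check like this, the caller can log a readable message and retry instead of surfacing a stack traceback.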





[GitHub] [apisix] membphis commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
membphis commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-798930161


   ping @Yiyiyimu 





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787712261


   > I'm not sure whether you set these in etcd's Statefulset templates fields.
   
   Hi @tokers, it seems @GBXing deployed etcd with docker rather than k8s, so that might not be the solution.





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787094518


   Hi @GBXing, I tried to reproduce the problem, and the result seems a bit different from what you got. My reproduction steps are:
   
   1. Configure etcd host with domain name:
   
       ```
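       # Look up the cluster DNS service IP so APISIX can resolve in-cluster names
       DNS_IP=$(kubectl get svc -n kube-system -l k8s-app=kube-dns -o 'jsonpath={..spec.clusterIP}')
       # Write a minimal config.yaml that points APISIX at the etcd client service
       echo "dns_resolver:
         - ${DNS_IP}
       etcd:
         host:
           - \"http://etcd-cluster-client.default.svc.cluster.local:2379\" " > ./conf/config.yaml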
       ```
   
   2. Set up APISIX; everything works as expected.
   3. Kill the leader or a follower pod of etcd (both give me the same result). The error log then shows:
   
       ```
       # Multiple of
       2021/02/27 15:30:19 [error] 49#49: *114289 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:593: attempt to index field 'result' (a nil value)
       stack traceback:
       /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:593: in function 'res_func'
       /usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
       /usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
       /usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
       [C]: in function 'xpcall'
       /usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/global_rules, context: ngx.timer
   
       # Multiple of
       2021/02/27 15:30:38 [error] 53#53: *113602 [lua] config_etcd.lua:544: failed to fetch data from etcd: connection refused, etcd key: /apisix/ssl, context: ngx.timer
       ```
   
       With etcd-operator, etcd became unreachable for a few seconds and then returned to normal.
   
   Is there anything I missed?





[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787711777


   @GBXing Got it, the error log is the same. Could you also show your `config.yaml`?





[GitHub] [apisix] GBXing edited a comment on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing edited a comment on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787435279


   @Yiyiyimu I used docker locally to start the etcd cluster and Apisix, then paused one of the etcd nodes to mimic the k8s environment.
   
   Here is my exception log:
   
   	2021/02/28 10:21:59 [error] 54#54: *13 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 54#54: *6 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/ssl, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *36 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *27 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/global_rules, context: ngx.timer
   	2021/02/28 10:22:28 [error] 53#53: *28 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/services, context: ngx.timer
   	2021/02/28 10:22:29 [error] 54#54: *12 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:03 [error] 54#54: *4 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/routes, context: ngx.timer
   	2021/02/28 10:23:04 [error] 53#53: *35 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:18 [error] 54#54: *445 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/ssl, context: ngx.timer
   
   and calling the admin api returns:
   
   	{
   		"error_msg": "etcd-node2 could not be resolved (3: Host not found)"
   	}
   
   By adding log output, I found that no health check for etcd was started and that the `health_check.init()` method was never called.
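
The behaviour being asked for here (skip an etcd node that cannot be reached and use the remaining ones) can be sketched in Python as follows. This is only an illustration of the general failover idea under stated assumptions: the class name `EndpointPool`, the `fail_timeout` parameter, and the `fetch` callback are hypothetical, and this is not how lua-resty-etcd implements its health check.

```python
import time


class EndpointPool:
    """Track which endpoints recently failed and skip them for a while."""

    def __init__(self, endpoints, fail_timeout=10.0, clock=time.monotonic):
        self.endpoints = list(endpoints)
        self.fail_timeout = fail_timeout
        self.clock = clock
        self.failed_at = {}  # endpoint -> time of last failure

    def healthy(self):
        """Endpoints that never failed or whose failure has expired."""
        now = self.clock()
        return [e for e in self.endpoints
                if e not in self.failed_at
                or now - self.failed_at[e] >= self.fail_timeout]

    def call(self, fetch):
        """Try `fetch(endpoint)` on each healthy endpoint until one succeeds."""
        last_error = None
        # Fall back to the full list if every endpoint is marked unhealthy.
        for endpoint in self.healthy() or self.endpoints:
            try:
                result = fetch(endpoint)
            except OSError as exc:  # connection refused, DNS failure, timeout
                self.failed_at[endpoint] = self.clock()
                last_error = exc
                continue
            self.failed_at.pop(endpoint, None)
            return result
        raise ConnectionError(f"all etcd endpoints failed: {last_error}")
```

With three hosts configured as in this thread, a DNS failure for `etcd-node2` would mark only that endpoint as failed, and requests would continue against the other two until `fail_timeout` expires.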





[GitHub] [apisix] moonming commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
moonming commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787205587


   v3.lua:593: attempt to index field 'result' (a nil value)
   
   
   is this a bug of apisix?
   
   
   GBXing <no...@github.com> wrote on Sunday, February 28, 2021 at 1:53 AM:
   
   > @Yiyiyimu <https://github.com/Yiyiyimu> I may be using it in a wrong way,
   > do I need to add any additional configuration in config.yaml for the health
   > check of etcd?
   -- 
   Thanks,
   Ming Wen
   Twitter: _WenMing
   





[GitHub] [apisix] tokers commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-786541651


   @GBXing The FQDN `etcd-1.etcd.apisix.svc.cluster.local` does not seem valid. What are the service name and the namespace?





[GitHub] [apisix] spacewander closed issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
spacewander closed issue #3673:
URL: https://github.com/apache/apisix/issues/3673


   





[GitHub] [apisix] Yiyiyimu edited a comment on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu edited a comment on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787094518


   Hi @GBXing, I tried to reproduce the problem. My reproduction steps are:
   
   1. Configure etcd host with domain name:
   
       ```
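       # Look up the cluster DNS service IP so APISIX can resolve in-cluster names
       DNS_IP=$(kubectl get svc -n kube-system -l k8s-app=kube-dns -o 'jsonpath={..spec.clusterIP}')
       # Write a minimal config.yaml that points APISIX at the etcd client service
       echo "dns_resolver:
         - ${DNS_IP}
       etcd:
         host:
           - \"http://etcd-cluster-client.default.svc.cluster.local:2379\" " > ./conf/config.yaml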
       ```
   
   2. Set up APISIX; everything works as expected.
   3. Kill the leader or a follower pod of etcd (both give me the same result). The error log then shows:
   
       ```
       # Multiple of
       2021/02/27 15:30:19 [error] 49#49: *114289 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:593: attempt to index field 'result' (a nil value)
       stack traceback:
       /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:593: in function 'res_func'
       /usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
       /usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
       /usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
       [C]: in function 'xpcall'
       /usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/global_rules, context: ngx.timer
   
       # Multiple of
       2021/02/27 15:30:38 [error] 53#53: *113602 [lua] config_etcd.lua:544: failed to fetch data from etcd: connection refused, etcd key: /apisix/ssl, context: ngx.timer
       ```
   
       With etcd-operator, etcd became unreachable for a few seconds and then returned to normal.
   
   Is the error log the same as the one you saw?





[GitHub] [apisix] tokers commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787717328


   > > I'm not sure whether you set these in etcd's Statefulset templates fields.
   > 
   > Hi @tokers it seems @GBXing deploy etcd with docker but not k8s, so that might not be the solution
   
   I see, but that is the case I'm confused about 😂





[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787433605


   @moonming Yes, this exception log is output after the etcd nodes are paused:
   `2021/02/28 10:21:59 [error] 54#54: *13 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   stack traceback:
   	/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   	/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   	/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   	/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   	[C]: in function 'xpcall'
   	/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   2021/02/28 10:21:59 [error] 54#54: *6 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   stack traceback:
   	/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   	/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   	/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   	/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   	[C]: in function 'xpcall'
   	/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/ssl, context: ngx.timer`





[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787109936


   @Yiyiyimu I may be using it the wrong way. Do I need to add any additional configuration in config.yaml for the etcd health check?



[GitHub] [apisix] tokers removed a comment on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
tokers removed a comment on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787561607


   > @Yiyiyimu I may be using it in a wrong way, do I need to add any additional configuration in config.yaml for the health check of etcd?
   
   Then you just need `etcd.apisix.svc.cluster.local` to access the etcd cluster. `etcd-1.etcd.apisix.svc.cluster.local` is invalid: it is neither a Pod domain name nor an SRV record.



[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-786399263


   @Yiyiyimu 
   I incorporated resty.etcd 1.4.4 into the code and tested it. Running natively, it works with IP configuration, but the domain name configuration in the K8S environment still fails with the error: `etcd-1.etcd.apisix.svc.cluster.local could not be resolved (3: Host not found)`
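
To narrow down which configured names fail to resolve from inside the APISIX pod, a quick check like the one below may help. It is only a diagnostic sketch, not part of APISIX or resty.etcd: `partition_hosts` is a hypothetical helper, and the `etcd-0` and `etcd-2` names are assumed by analogy with the `etcd-1.etcd.apisix.svc.cluster.local` name in the error above.

```python
import socket


def partition_hosts(hosts, resolve=socket.getaddrinfo):
    """Split host names into (resolvable, unresolvable) lists.

    `resolve` is injectable so the check can be exercised without a network;
    by default it performs a real DNS lookup via socket.getaddrinfo.
    """
    ok, failed = [], []
    for host in hosts:
        try:
            resolve(host, None)
            ok.append(host)
        except socket.gaierror:
            # Raised when the resolver cannot find the name (e.g. "Host not found").
            failed.append(host)
    return ok, failed


if __name__ == "__main__":
    # Assumed host names; replace with the ones from your own config.yaml.
    configured = [
        "etcd-0.etcd.apisix.svc.cluster.local",
        "etcd-1.etcd.apisix.svc.cluster.local",
        "etcd-2.etcd.apisix.svc.cluster.local",
    ]
    resolvable, unresolvable = partition_hosts(configured)
    print("resolvable:", resolvable)
    print("unresolvable:", unresolvable)
```

Run from inside the APISIX container, this shows whether the dead node's name has simply disappeared from DNS while the other two names still resolve.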



[GitHub] [apisix] GBXing commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787435279


   I used Docker locally to start the etcd cluster and APISIX, then paused one of the etcd nodes to mimic the K8S environment.
   
   Here is my exception log:
   
   	2021/02/28 10:21:59 [error] 54#54: *13 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 54#54: *6 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/ssl, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *36 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *27 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/global_rules, context: ngx.timer
   	2021/02/28 10:22:28 [error] 53#53: *28 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/services, context: ngx.timer
   	2021/02/28 10:22:29 [error] 54#54: *12 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:03 [error] 54#54: *4 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/routes, context: ngx.timer
   	2021/02/28 10:23:04 [error] 53#53: *35 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:18 [error] 54#54: *445 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/ssl, context: ngx.timer
   
   Calling the Admin API returns:
   
   	{
   		"error_msg": "etcd-node2 could not be resolved (3: Host not found)"
   	}
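
The log shows two distinct failure modes: a response without a `result` field, and a host name that cannot be resolved. The behaviour being asked for, skipping a bad node and trying the next one, can be sketched as below. This is an illustration of the idea only, not how resty.etcd is implemented: `fetch_with_failover` and the `fetch` callable are hypothetical, and the endpoint URLs in the usage note are assumptions apart from the `etcd-node2` name taken from the log.

```python
def fetch_with_failover(endpoints, fetch):
    """Return (endpoint, result) from the first endpoint that answers properly.

    `fetch` is a hypothetical callable: fetch(endpoint) -> decoded response body.
    An endpoint is skipped when it raises OSError (which covers DNS failures
    such as socket.gaierror and connection errors) or when the body has no
    'result' field.
    """
    errors = {}
    for endpoint in endpoints:
        try:
            body = fetch(endpoint)
        except OSError as exc:
            errors[endpoint] = str(exc)
            continue
        # Guard against a missing field instead of indexing it blindly.
        result = body.get("result") if isinstance(body, dict) else None
        if result is None:
            errors[endpoint] = "response has no 'result' field"
            continue
        return endpoint, result
    raise RuntimeError(f"all etcd endpoints failed: {errors}")
```

With endpoints such as `http://etcd-node1:2379`, `http://etcd-node2:2379` and `http://etcd-node3:2379` (assumed names and port), a paused or unresolvable `etcd-node2` would then be skipped rather than surfacing as an error to the caller, as long as at least one other node answers.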



[GitHub] [apisix] GBXing edited a comment on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
GBXing edited a comment on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-787435279


   @Yiyiyimu I used Docker locally to start the etcd cluster and APISIX, then paused one of the etcd nodes to mimic the K8S environment.
   
   Here is my exception log:
   
   	2021/02/28 10:21:59 [error] 54#54: *13 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 54#54: *6 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/ssl, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *36 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/upstreams, context: ngx.timer
   	2021/02/28 10:21:59 [error] 53#53: *27 [lua] config_etcd.lua:566: failed to fetch data from etcd: /usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: attempt to index field 'result' (a nil value)
   	stack traceback:
   		/usr/local/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:649: in function 'res_func'
   		/usr/local/apisix/apisix/core/config_etcd.lua:125: in function 'waitdir'
   		/usr/local/apisix/apisix/core/config_etcd.lua:305: in function 'sync_data'
   		/usr/local/apisix/apisix/core/config_etcd.lua:540: in function </usr/local/apisix/apisix/core/config_etcd.lua:530>
   		[C]: in function 'xpcall'
   		/usr/local/apisix/apisix/core/config_etcd.lua:530: in function </usr/local/apisix/apisix/core/config_etcd.lua:521>,  etcd key: /apisix/global_rules, context: ngx.timer
   	2021/02/28 10:22:28 [error] 53#53: *28 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/services, context: ngx.timer
   	2021/02/28 10:22:29 [error] 54#54: *12 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:03 [error] 54#54: *4 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/routes, context: ngx.timer
   	2021/02/28 10:23:04 [error] 53#53: *35 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/consumers, context: ngx.timer
   	2021/02/28 10:23:18 [error] 54#54: *445 [lua] config_etcd.lua:544: failed to fetch data from etcd: etcd-node2 could not be resolved (3: Host not found),  etcd key: /apisix/ssl, context: ngx.timer
   
   Calling the Admin API returns:
   
   	{
   		"error_msg": "etcd-node2 could not be resolved (3: Host not found)"
   	}



[GitHub] [apisix] Yiyiyimu commented on issue #3673: request help: A node in the K8S environment ETCD cluster died, causing Apisix to fail

Posted by GitBox <gi...@apache.org>.
Yiyiyimu commented on issue #3673:
URL: https://github.com/apache/apisix/issues/3673#issuecomment-833164013


   > Any news? We found a similar problem: some of the nodes (not all) show this error message. My etcd version is 3.4.13.
   
   @nanamikon A PR to solve it will be added this week.
   
   

