Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/10 07:26:20 UTC

[GitHub] [airflow] gakhrejah opened a new issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

gakhrejah opened a new issue #12229:
URL: https://github.com/apache/airflow/issues/12229


   Hi Team,
   
   We are getting the error logs below while running Apache Airflow on AWS EKS.
   All the pods (tasks) are in the Completed state but are not removed by Airflow. After a manual restart of the scheduler, everything works for 2-3 days; then all the tasks get stuck again.
   
   ERROR LOGS
   [2020-11-10 07:00:07,752] {{kubernetes_executor.py:447}} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
   [2020-11-10 07:00:07,765] {{kubernetes_executor.py:351}} INFO - Event: and now my watch begins starting at resource_version: 107544455
   [2020-11-10 07:00:07,782] {{kubernetes_executor.py:342}} ERROR - Unknown error in KubernetesJobWatcher. Failing
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 340, in run
       self.worker_uuid, self.kube_config)
     File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 364, in _run
       **kwargs):
     File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream
       status=obj['code'], reason=reason)
   kubernetes.client.exceptions.ApiException: (410)
   Reason: Gone: too old resource version: 107544455 (108550177)
   
   Process KubernetesJobWatcher-135237:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
       self.run()
     File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 340, in run
       self.worker_uuid, self.kube_config)
     File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 364, in _run
       **kwargs):
     File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream
       status=obj['code'], reason=reason)
   kubernetes.client.exceptions.ApiException: (410)
   Reason: Gone: too old resource version: 107544455 (108550177)
   
   AIRFLOW_VERSION=1.10.9
   ENVIRONMENT: QA| PROD
   Docker Image : python:3.7-slim-buster
   
   Please let us know if you need any more information and how we can resolve this issue. We have also tried upgrading Airflow to 1.10.10, but no luck.
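
For readers following the traceback above: the watcher process dies because the 410 ("too old resource version") error propagates out of the watch loop. A common general pattern, sketched here under stated assumptions and not taken from Airflow's code, is to catch the 410 and restart the watch from a fresh resource version. `GoneError`, `watch_with_reset`, and `fake_stream` are hypothetical names used only for illustration; a real watcher would catch the Kubernetes client's `ApiException` instead, and restarting from `"0"` is a simplification of re-listing the resources before watching again.

```python
class GoneError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status."""

    def __init__(self, status, reason=""):
        super().__init__(f"({status}) {reason}")
        self.status = status


def watch_with_reset(open_stream, handle_event, resource_version="0", max_restarts=3):
    """Consume watch events, restarting from version "0" after a 410 error.

    open_stream(resource_version) must return an iterable of event dicts,
    each with a "resource_version" key. Any error other than a 410, or a
    410 after max_restarts restarts, is re-raised to the caller.
    """
    restarts = 0
    while True:
        try:
            for event in open_stream(resource_version):
                # Remember the newest version so a later restart can resume from it.
                resource_version = event["resource_version"]
                handle_event(event)
            return resource_version
        except GoneError as err:
            if err.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1
            # Assumption: "0" asks for whatever state the server currently has.
            resource_version = "0"


def fake_stream(resource_version):
    """Hypothetical stream: rejects the stale version seen in the logs above."""
    if resource_version == "107544455":
        raise GoneError(410, "Gone: too old resource version")
    yield {"resource_version": "108550178", "name": "pod-a"}


if __name__ == "__main__":
    seen = []
    last = watch_with_reset(fake_stream, seen.append, resource_version="107544455")
    print(last, [event["name"] for event in seen])
```

The restart counter is a design choice that keeps a persistently failing watch from looping forever; the caller still sees the original exception once the limit is reached.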
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-728989521


   Yup, this will be fixed in 1.10.13. Already fixed in Master by https://github.com/apache/airflow/pull/11974



[GitHub] [airflow] potiuk commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-1013838979


   > @kaxil and all other friends - this is something that is still happening in v2.2.3.
   
   This issue has long been closed. If you see a similar issue (I assume with "resource too old") and have some logs, please open a new issue with all the details, because it is very likely a completely unrelated issue.
   
   By stating "this is something that is still happening in v2.2.3" you do not tell us what happens, what the logs show, how often it occurs, whether it is intermittent, etc. There is no way we can even attempt to answer your question without knowing all the details.
   
   So if you have a similar issue, please open a new issue and provide all the details. Or better, if you are not sure whether this is an Airflow issue at all, open a [GitHub Discussion](https://github.com/apache/airflow/discussions) instead (still provide all the details there); maybe this is a K8S deployment issue that someone can help you solve there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] edikmkoyan commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
edikmkoyan commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-750942782


   it isn't fixed in 1.10.13



[GitHub] [airflow] gdtroszak commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
gdtroszak commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-726804090


   We're observing the same thing.
   
   Airflow version 1.10.9
   k8s API server version v1.15.12



[GitHub] [airflow] kaxil commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-750944320


   > it isn't fixed in 1.10.13
   
   https://github.com/apache/airflow/blob/1.10.13/setup.py#L313
   
   It is fixed, check the link I posted



[GitHub] [airflow] itayB commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
itayB commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-1013835043


   @kaxil and all other friends - this is something that is still happening in v2.2.3.
   The installed version is:
   ```
   $ pip freeze | grep kubernetes
   apache-airflow-providers-cncf-kubernetes==3.0.1
   kubernetes==21.7.0
   ```
   
   Do I still need to downgrade that much?
   I see that this limitation [was removed](https://github.com/apache/airflow/pull/18797) recently - will that solve the issue in the upcoming Airflow version?



[GitHub] [airflow] kaxil commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-780200413


   > > > it isn't fixed in 1.10.13
   > > 
   > > 
   > > https://github.com/apache/airflow/blob/1.10.13/setup.py#L313
   > > It is fixed, check the link I posted
   > 
   > I used the Helm chart to install Airflow and didn't use setup.py at all. I guess the Docker image has the wrong version of the k8s client. I have the issue with Airflow 2.0.1.
   
   setup.py is used when you, or the tool you use, run `pip install apache-airflow`. The Docker image seems to have the correct version, check below:
   
   ```
   ❯ docker run  -it apache/airflow:2.0.1-python3.6 bash
   airflow@646279d8d88a:/opt/airflow$ pip freeze | grep kubernetes
   apache-airflow-providers-cncf-kubernetes==1.0.0
   kubernetes==11.0.0
   airflow@646279d8d88a:/opt/airflow$
   ```




[GitHub] [airflow] potiuk edited a comment on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-728984039


   @kaxil @ashb  -> looks like we should limit the k8s client to <12.0.0 IMHO. WDYT ?



[GitHub] [airflow] bhavaniravi commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
bhavaniravi commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-728795933


   Same issue with airflow 1.10.10




[GitHub] [airflow] gdtroszak commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
gdtroszak commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-728921321


   [This issue](https://github.com/apache/airflow/issues/11841) seems to outline a workaround. It essentially amounts to downgrading the k8s client to `v11.0.0`.
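
To see whether an environment falls under the `<12.0.0` limit discussed in this thread, the installed client version can be inspected directly. The following is a minimal sketch, not an official Airflow check: `major_version`, `exceeds_proposed_limit`, and `installed_client_version` are hypothetical helper names, and the threshold of 12 simply mirrors the limit proposed above.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '11.0.0'."""
    return int(version.split(".")[0])


def exceeds_proposed_limit(version):
    """True when the version is outside the `<12.0.0` limit proposed in this thread."""
    return major_version(version) >= 12


def installed_client_version():
    """Return the installed `kubernetes` distribution version, or None if absent."""
    try:
        return metadata.version("kubernetes")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_client_version()
    if version is None:
        print("kubernetes client is not installed")
    elif exceeds_proposed_limit(version):
        print(f"kubernetes=={version} is outside the proposed <12.0.0 limit")
    else:
        print(f"kubernetes=={version} is within the proposed <12.0.0 limit")
```

Running this inside the scheduler's container gives the same information as `pip freeze | grep kubernetes`, with the comparison against the proposed limit made explicit.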



[GitHub] [airflow] edikmkoyan commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
edikmkoyan commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-779059848


   > > it isn't fixed in 1.10.13
   > 
   > https://github.com/apache/airflow/blob/1.10.13/setup.py#L313
   > 
   > It is fixed, check the link I posted
   
   I used the Helm chart to install Airflow and didn't use setup.py at all. I guess the Docker image has the wrong version of the k8s client. I have the issue with Airflow 2.0.1.



[GitHub] [airflow] boring-cyborg[bot] commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-724516981


   Thanks for opening your first issue here! Be sure to follow the issue template!
   



[GitHub] [airflow] kaxil closed issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #12229:
URL: https://github.com/apache/airflow/issues/12229


   



[GitHub] [airflow] garacio edited a comment on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
garacio edited a comment on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-725969213


   Same in bare metal k8s installation
   
   ```
   [airflow-6fb4b8f58c-2jszc airflow] [2020-11-05 08:13:34,833] {kubernetes_executor.py:293} ERROR - Unknown error in KubernetesJobWatcher. Failing 
   [airflow-6fb4b8f58c-2jszc airflow] Traceback (most recent call last): 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 287, in run 
   [airflow-6fb4b8f58c-2jszc airflow]     self.worker_uuid, self.kube_config) 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 323, in _run 
   [airflow-6fb4b8f58c-2jszc airflow]     for event in list_worker_pods(): 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream 
   [airflow-6fb4b8f58c-2jszc airflow]     status=obj['code'], reason=reason) 
   [airflow-6fb4b8f58c-2jszc airflow] kubernetes.client.exceptions.ApiException: (410) 
   [airflow-6fb4b8f58c-2jszc airflow] Reason: Expired: too old resource version: 42945421 (43412510) 
   [airflow-6fb4b8f58c-2jszc airflow]  
   [airflow-6fb4b8f58c-2jszc airflow] Process KubernetesJobWatcher-66040: 
   [airflow-6fb4b8f58c-2jszc airflow] Traceback (most recent call last): 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap 
   [airflow-6fb4b8f58c-2jszc airflow]     self.run() 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 287, in run 
   [airflow-6fb4b8f58c-2jszc airflow]     self.worker_uuid, self.kube_config) 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 323, in _run 
   [airflow-6fb4b8f58c-2jszc airflow]     for event in list_worker_pods(): 
   [airflow-6fb4b8f58c-2jszc airflow]   File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 177, in stream 
   [airflow-6fb4b8f58c-2jszc airflow]     status=obj['code'], reason=reason) 
   [airflow-6fb4b8f58c-2jszc airflow] kubernetes.client.exceptions.ApiException: (410) 
   [airflow-6fb4b8f58c-2jszc airflow] Reason: Expired: too old resource version: 42945421 (43412510) 
   ```
   Airflow version 1.10.12
   k8s version v1.17.5



[GitHub] [airflow] gakhrejah commented on issue #12229: ERROR - Unknown error in KubernetesJobWatcher. Failing

Posted by GitBox <gi...@apache.org>.
gakhrejah commented on issue #12229:
URL: https://github.com/apache/airflow/issues/12229#issuecomment-728824927


   Hi All,
   
   Can anybody let me know how we can resolve this issue? It seems this is still an open issue with Airflow.

