Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/17 05:20:33 UTC

[GitHub] [airflow] rmanvar-indeed opened a new issue #13129: Reattach to kubernetes pod only if it's running

rmanvar-indeed opened a new issue #13129:
URL: https://github.com/apache/airflow/issues/13129


   **Description**
   
   At https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/operators/kubernetes_pod_operator.py#L291, Airflow's KubernetesPodOperator tries to re-attach to an existing pod based on its labels. This fails if the pod isn't in a running state, which is often the case for us, because the previous try usually failed precisely because the pod was not in a running / healthy state.
   The error returned by the Kubernetes API is `{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"base\" in pod \"XXXXXX\" is terminated","reason":"BadRequest","code":400}`.
   
   **Use case / motivation**
   
   Airflow should try to re-attach only if the pod is in a running state.
   
   **Are you willing to submit a PR?**
   
   Yup. It sounds like an easy fix: replace the `client.list_namespaced_pod` call with a modified version that returns only pods in a running state.
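
A minimal sketch of that filtering idea, assuming `client` is a `kubernetes.client.CoreV1Api` instance (or any object exposing the same `list_namespaced_pod` method). The helper name `find_running_pods` is hypothetical and is not part of Airflow; the `field_selector` value asks the API server to return only pods whose phase is `Running`.

```python
def find_running_pods(client, namespace, label_selector):
    """Return the pods matching the labels that are in the Running phase.

    `client` is assumed to be a kubernetes.client.CoreV1Api instance.
    This helper is illustrative only; it is not Airflow's implementation.
    """
    pod_list = client.list_namespaced_pod(
        namespace=namespace,
        label_selector=label_selector,
        # Server-side filter: only pods whose status.phase is Running.
        field_selector="status.phase=Running",
    )
    return pod_list.items
```

A pod whose init containers are still running reports the `Pending` phase, so this filter would also exclude pods that have not started their main container yet.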
   
   **Related Issues**
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal edited a comment on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-1035970929


   This issue was reported against Airflow 1.10, which is EOL, and was possibly resolved by https://github.com/apache/airflow/pull/11368
   If the issue still happens in the latest Airflow version, please open a new GitHub issue with reproduction steps.





[GitHub] [airflow] krishanj20 commented on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
krishanj20 commented on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-824891897


   Hi, I have the same issue. In my case I'm using an init container, which leaves the pod in a `PodInitializing` state, and the error is a carbon copy of the one @ziliangpeng has. My Airflow knowledge isn't the most amazing, but I'm happy to help where I can.





[GitHub] [airflow] rmanvar-indeed commented on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
rmanvar-indeed commented on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-826792933


   I think https://github.com/apache/airflow/pull/11368 should resolve the cases where the pod being re-attached to has already failed (the PR was added in Airflow 1.10.13). However, for cases where the pod is still in its initializing phase, this issue would persist.
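
The two cases can be told apart by the pod's `status.phase`. The following is a rough sketch of that decision, not what the PR above implements; the function name `reattach_action` and the returned strings are hypothetical, while the phase names are the standard Kubernetes pod phases.

```python
def reattach_action(phase):
    """Map a Kubernetes pod phase to what a re-attach attempt could do.

    Illustrative only: the return values are made-up labels, not Airflow states.
    """
    if phase == "Running":
        # Containers have started, so logs can be read.
        return "attach"
    if phase == "Pending":
        # Covers init containers still running and ContainerCreating.
        return "wait"
    if phase in ("Succeeded", "Failed"):
        # The pod has finished; there is nothing left to attach to.
        return "skip"
    # "Unknown" or anything unexpected.
    return "unknown"
```

With this split, the initializing case becomes a "wait and retry" rather than an immediate task failure.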





[GitHub] [airflow] eladkal closed issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #13129:
URL: https://github.com/apache/airflow/issues/13129


   





[GitHub] [airflow] krishanj20 edited a comment on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
krishanj20 edited a comment on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-824891897


   Hi, I have the same issue. In my case I'm using an init container, which leaves the pod in a `PodInitializing` state, and the error is a carbon copy of the one @ziliangpeng has. My Airflow knowledge isn't the most amazing, but I'm happy to help where I can.





[GitHub] [airflow] ziliangpeng edited a comment on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
ziliangpeng edited a comment on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-789324654


   I hit a similar issue where it tries to re-attach to a pod while the pod is still initializing.
   
   ```
   [2021-03-02 21:22:00,796] {taskinstance.py:1455} ERROR - (400)
   Reason: Bad Request
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '23926992-dc19-4a0c-8d4b-8f17cdc91bd2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 02 Mar 2021 21:22:00 GMT', 'Content-Length': '280'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\"base\\" in pod \\"afj-prem-range-image-test-job-utils-20210302102909-dump-da7wf1s.0cf16bbe75a0401786514f8629164efe\\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}\n'
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 335, in execute
       labels, try_numbers_match, launcher, pod_list.items[0]
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 374, in handle_pod_overlap
       final_state, result = self.monitor_launched_pod(launcher, pod)
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in monitor_launched_pod
       (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
     File "/usr/local/lib/python3.7/site-packages/airflow/kubernetes/pod_launcher.py", line 132, in monitor_pod
       logs = self.read_pod_logs(pod, timestamps=True, since_seconds=read_logs_since_sec)
     File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 329, in wrapped_f
       return self.call(f, *args, **kw)
     File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 409, in call
       do = self.iter(retry_state=retry_state)
     File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 368, in iter
       raise retry_exc.reraise()
     File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 186, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
       return self.__get_result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
     File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 412, in call
       result = fn(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/kubernetes/pod_launcher.py", line 222, in read_pod_logs
       **additional_kwargs,
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 19199, in read_namespaced_pod_log
       (data) = self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 19305, in read_namespaced_pod_log_with_http_info
       collection_formats=collection_formats)
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 345, in call_api
       _preload_content, _request_timeout)
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
       _request_timeout=_request_timeout)
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
       headers=headers)
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
       query_params=query_params)
     File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
       raise ApiException(http_resp=r)
   kubernetes.client.rest.ApiException: (400)
   Reason: Bad Request
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '23926992-dc19-4a0c-8d4b-8f17cdc91bd2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 02 Mar 2021 21:22:00 GMT', 'Content-Length': '280'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\"base\\" in pod \\"afj-prem-range-image-test-job-utils-20210302102909-dump-da7wf1s.0cf16bbe75a0401786514f8629164efe\\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}\n'
   
   ```
   
   A fix to this would be nice.
   
   @rmanvar-indeed are you still working on a fix?
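
The 400 response body in the traceback above says why the log read failed. As a hedged sketch, a caller could inspect that body to decide whether retrying makes sense; `classify_log_error` and its return labels are hypothetical, and the substring checks rely only on the two message wordings quoted in this thread.

```python
import json


def classify_log_error(body):
    """Classify the JSON body of a 400 response from the pod log endpoint.

    Illustrative only: matches the two message wordings seen in this thread.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    message = json.loads(body).get("message", "")
    if "is waiting to start" in message:
        # e.g. ContainerCreating or PodInitializing: retrying later may work.
        return "not_started"
    if "is terminated" in message:
        # The container has already exited: re-attaching cannot succeed.
        return "terminated"
    return "other"
```

In the `kubernetes` Python client, the raised `ApiException` exposes the response body as its `body` attribute, which is what would be passed in here.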





[GitHub] [airflow] eladkal commented on issue #13129: Reattach to kubernetes pod only if it's running

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #13129:
URL: https://github.com/apache/airflow/issues/13129#issuecomment-1035970929


   This issue was reported against Airflow 1.10, which is EOL.
   If the issue still happens in the latest Airflow version, please open a new GitHub issue with reproduction steps.

