Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/08 21:23:57 UTC

[GitHub] [airflow] mfjackson opened a new issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

mfjackson opened a new issue #13579:
URL: https://github.com/apache/airflow/issues/13579


   **Apache Airflow version**: 2.0.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): 1.19.4
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: Google Cloud Platform/GKE
   
   **What happened**:
   
   I successfully cleared the state of a failed task using the graph view UI, but when I attempted to re-run the cleared task instance in graph view manually by selecting the task instance and clicking "Run", I received the following error:
   
   ```
   Something bad has happened.
   Please consider letting us know by creating a bug report using GitHub.
   
   Python version: 3.8.7
   Airflow version: 2.0.0
   Node: mr_node
   -------------------------------------------------------------------------------
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1366, in run
       executor.start()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 493, in start
       raise AirflowException("Could not get scheduler_job_id")
   airflow.exceptions.AirflowException: Could not get scheduler_job_id
   ```
   
   **What you expected to happen**:
   
   I expected the task instance to be scheduled and begin running again.
   
   **How to reproduce it**:
   
   Configure Airflow 2.0.0 to run on GCP, clear the state of a finished task instance using the UI (I was able to reproduce the error on a task instance marked "Success" as well), and then use the web UI to "Run" the task.
   
   **Anything else we need to know**:
   
   One important item to note is that when I _only_ clear the task instance and do not attempt to run it manually using the UI, the task does queue and is placed in a `running` state, but quickly fails with the following error:
   
   ```
   [2021-01-08 21:08:16,140] {taskinstance.py:1396} ERROR - (0)
   Reason: Handshake status 500 Internal Server Error
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 296, in websocket_call
       client = WSClient(configuration, get_websocket_url(url), headers, capture_all)
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 94, in __init__
       self.sock.connect(url, header=header)
     File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_core.py", line 226, in connect
       self.handshake_response = handshake(self.sock, *addrs, **options)
     File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 80, in handshake
       status, resp = _get_resp_headers(sock)
     File "/home/airflow/.local/lib/python3.8/site-packages/websocket/_handshake.py", line 165, in _get_resp_headers
       raise WebSocketBadStatusException("Handshake status %d %s", status, status_message, resp_headers)
   websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 335, in execute
       final_state, result = self.handle_pod_overlap(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 375, in handle_pod_overlap
       final_state, result = self.monitor_launched_pod(launcher, pod)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in monitor_launched_pod
       (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 151, in monitor_pod
       result = self._extract_xcom(pod)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py", line 246, in _extract_xcom
       resp = kubernetes_stream(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 35, in stream
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 841, in connect_get_namespaced_pod_exec
       (data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 927, in connect_get_namespaced_pod_exec_with_http_info
       return self.api_client.call_api(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
       return self.__call_api(resource_path, method,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
       response_data = self.request(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/stream.py", line 30, in _intercept_request_call
       return ws_client.websocket_call(config, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/stream/ws_client.py", line 302, in websocket_call
       raise ApiException(status=0, reason=str(e))
   kubernetes.client.rest.ApiException: (0)
   Reason: Handshake status 500 Internal Server Error
   ```
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-786687665


   Duplicate of https://github.com/apache/airflow/issues/13805 and closed by https://github.com/apache/airflow/pull/14160





[GitHub] [airflow] kaxil commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-764793212


   Looks like a bug with Kubernetes Executor. Related issue: https://github.com/apache/airflow/issues/13805
   





[GitHub] [airflow] mfjackson commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
mfjackson commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-764799032


   @kaxil I am also using version 11.0.0 of the Kubernetes Python client.





[GitHub] [airflow] kaxil commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-767700027


   If you "Clear" the task, the scheduler will pick it up and run it. Why did you click "Run" again?
   
   Can you try only clearing the task and see what happens, please?





[GitHub] [airflow] kaxil commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-763194265


   What version of the Kubernetes Python client are you using? You can check with:
   
   ```
   pip show kubernetes
   ```
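
As a rough alternative to `pip show`, the installed client version can also be read from inside the same Python environment that Airflow runs in. This is only a sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of Airflow or the Kubernetes client.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not present in this environment.
        return None
```

Calling `installed_version("kubernetes")` on the worker or webserver image should report the same version string that `pip show kubernetes` prints, or `None` if the client is missing.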





[GitHub] [airflow] vlinhdh16 commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
vlinhdh16 commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-764461935


   I got this bug as well, running airflow 2.0.0. `pip show kubernetes` returns version 11.0.0.





[GitHub] [airflow] kaxil closed issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13579:
URL: https://github.com/apache/airflow/issues/13579


   





[GitHub] [airflow] mfjackson commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
mfjackson commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-776933380


   Sorry for the delayed response!
   
   @kaxil when I clear the task instance state and allow the scheduler to pick up the task instance again it runs just fine.
   
   @dimberman I am setting `is_delete_operator_pod` as `false` in my KPOs. When clearing the task instance and allowing the scheduler to pick up the task again, everything works as expected. It was only when I clicked "Run" immediately after clearing the task that I ran into this error.
   
   Given that this is expected behavior and it seems like I was just using the "Run" button incorrectly, we can probably close this issue.
   
   Thanks for your help!





[GitHub] [airflow] dimberman commented on issue #13579: Re-running KubernetesPodOperator task results in AirflowException using Airflow 2.0

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #13579:
URL: https://github.com/apache/airflow/issues/13579#issuecomment-767736791


   Hi @mfjackson,
   
   So I think there are two different errors here.
   
   The first error is that the Run button is broken for the KubernetesExecutor (as noted in #13805). We should have a fix for that soon.
   
   The second error is specific to the k8sPodOperator.
   
   Basically the "500 bad handshake" return means that kubernetes is trying to speak to a pod that is not currently running. Usually the sidecar container for XCOM stays running until the worker tells it to stop, regardless of task status.
   
   So my guess is that, for some reason, the failed task pod is still up even though it has completed. You might have `is_delete_operator_pod` set to false in either your KubernetesPodOperator or your airflow.cfg.
   
   I think that deleting the task in the UI, but not deleting the pod in k8s, might be causing this confusion. Can you try deleting the pod and then trying again?
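
One possible way to try the pod-deletion suggestion above from Python rather than `kubectl` is sketched below. It uses the Kubernetes Python client calls `config.load_kube_config`, `CoreV1Api.list_namespaced_pod`, and `CoreV1Api.delete_namespaced_pod`. Everything else is an assumption for illustration: the helper names, the namespace, the idea that the leftover pod carries `dag_id` and `task_id` labels, and the availability of a local kubeconfig. The function defaults to a dry run so that nothing is deleted until the matched pod names have been reviewed.

```python
from typing import Dict, List


def build_label_selector(labels: Dict[str, str]) -> str:
    """Turn a label dict into a Kubernetes label selector such as 'a=1,b=2'."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


def delete_leftover_pods(
    namespace: str, labels: Dict[str, str], dry_run: bool = True
) -> List[str]:
    """List pods matching `labels`; delete them only when dry_run is False."""
    # Imported lazily so the selector helper works without the client installed.
    from kubernetes import client, config

    # Assumption: credentials come from a local kubeconfig file.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    selector = build_label_selector(labels)
    pods = v1.list_namespaced_pod(namespace, label_selector=selector)

    matched = []
    for pod in pods.items:
        matched.append(pod.metadata.name)
        if not dry_run:
            v1.delete_namespaced_pod(pod.metadata.name, namespace)
    return matched
```

For example, `delete_leftover_pods("default", {"dag_id": "example_dag", "task_id": "example_task"})` would only return the names of matching pods; passing `dry_run=False` would delete them. The namespace and label values here are placeholders, so check the labels on the real pod before relying on the selector.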

