Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/08 00:58:53 UTC

[GitHub] [airflow] takersk opened a new issue, #24309: Intermittent failures when deleting pods

takersk opened a new issue, #24309:
URL: https://github.com/apache/airflow/issues/24309

   ### Apache Airflow version
   
   2.3.1
   
   ### What happened
   
   Intermittent error when deleting pods after pod `state=SUCCEEDED`
   
   ```
   [2022-06-08, 07:49:40 KST] {kubernetes_pod.py:434} INFO - Deleting pod: hive-kcai-dim-gift-product-cat-4d6acdf27cab46edbb7652fc8d224c90
   [2022-06-08, 07:49:40 KST] {taskinstance.py:1890} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 390, in execute
       follow=True,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 245, in fetch_container_logs
       last_log_time = consume_logs(since_time=last_log_time, follow=follow)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 221, in consume_logs
       follow=follow,
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f
       return self(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 360, in iter
       raise retry_exc.reraise()
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 193, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
       return self.__get_result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 332, in read_pod_logs
       **additional_kwargs,
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
       return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23880, in read_namespaced_pod_log_with_http_info
       collection_formats=collection_formats)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
       _preload_content, _request_timeout, _host)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
       _request_timeout=_request_timeout)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
       headers=headers)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 244, in GET
       query_params=query_params)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 234, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (422)
   Reason: Unprocessable Entity
   HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 07 Jun 2022 22:49:40 GMT', 'Content-Length': '490'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"PodLogOptions \\"hive-kcai-dim-gift-product-cat-4d6acdf27cab46edbb7652fc8d224c90\\" is invalid: sinceSeconds: Invalid value: -64: must be greater than 0","reason":"Invalid","details":{"name":"hive-kcai-dim-gift-product-cat-4d6acdf27cab46edbb7652fc8d224c90","kind":"PodLogOptions","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: -64: must be greater than 0","field":"sinceSeconds"}]},"code":422}\n'
   
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 403, in execute
       remote_pod=remote_pod,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 426, in cleanup
       f'Pod {pod and pod.metadata.name} returned a failure:{error_message}\n{remote_pod}'
   airflow.exceptions.AirflowException: Pod hive-kcai-dim-gift-product-cat-4d6acdf27cab46edbb7652fc8d224c90 returned a failure:
   None
   [2022-06-08, 07:49:40 KST] {taskinstance.py:1401} INFO - Marking task as FAILED. dag_id=common_kudu_to_hdfs_dag, task_id=hive_kcai_dim_gift_product_cate_task, execution_date=20220607T195742, start_date=20220607T224848, end_date=20220607T224940
   [2022-06-08, 07:49:40 KST] {standard_task_runner.py:97} ERROR - Failed to execute job 970 for task hive_kcai_dim_gift_product_cate_task (Pod hive-kcai-dim-gift-product-cat-4d6acdf27cab46edbb7652fc8d224c90 returned a failure:
   None; 56)
   ```
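
When triaging a failure like the one above, the rejected value can be read straight from the API server's `Status` body. The following is a minimal sketch, not part of Airflow or the Kubernetes client: the helper name `rejected_field` is hypothetical, and the body is an abridged copy of the response shown in the traceback.

```python
import json
import re
from typing import Optional, Tuple


def rejected_field(body: bytes) -> Optional[Tuple[str, int]]:
    """Return (field, rejected integer value) from a Kubernetes Status body, if any."""
    status = json.loads(body)
    for cause in status.get("details", {}).get("causes", []):
        # The message looks like "Invalid value: -64: must be greater than 0".
        match = re.search(r"Invalid value: (-?\d+)", cause.get("message", ""))
        if match:
            return cause.get("field", ""), int(match.group(1))
    return None


# Abridged copy of the 422 response body from the traceback above.
body = (
    b'{"kind":"Status","status":"Failure","reason":"Invalid",'
    b'"details":{"kind":"PodLogOptions","causes":[{"reason":"FieldValueInvalid",'
    b'"message":"Invalid value: -64: must be greater than 0","field":"sinceSeconds"}]},'
    b'"code":422}'
)

print(rejected_field(body))  # ('sinceSeconds', -64)
```

A negative `sinceSeconds` means the log request asked for a window that starts in the future, which points at the clocks involved rather than at the pod itself.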
   
   ### What you think should happen instead
   
   Marking task as SUCCESS
   
   ### How to reproduce
   
   see above
   
   ### Operating System
   
   k8s version : v1.17.12
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==3.4.0
   apache-airflow-providers-celery==2.1.4
   apache-airflow-providers-cncf-kubernetes==4.0.2
   apache-airflow-providers-docker==2.7.0
   apache-airflow-providers-elasticsearch==3.0.3
   apache-airflow-providers-ftp==2.1.2
   apache-airflow-providers-google==7.0.0
   apache-airflow-providers-grpc==2.0.4
   apache-airflow-providers-hashicorp==2.2.0
   apache-airflow-providers-http==2.1.2
   apache-airflow-providers-imap==2.2.3
   apache-airflow-providers-microsoft-azure==3.9.0
   apache-airflow-providers-mysql==2.2.3
   apache-airflow-providers-odbc==2.0.4
   apache-airflow-providers-postgres==4.1.0
   apache-airflow-providers-redis==2.0.4
   apache-airflow-providers-sendgrid==2.0.4
   apache-airflow-providers-sftp==2.6.0
   apache-airflow-providers-slack==4.2.3
   apache-airflow-providers-sqlite==2.1.3
   apache-airflow-providers-ssh==2.4.4
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jjournet commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
jjournet commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1229619386

   @takersk 
   After configuring NTP, the issue was gone. It came back a few days later, and I realized the servers I had added to my cluster were not configured for NTP. So it confirms that, for me, the issue was related to time synchronization.
   
   




[GitHub] [airflow] takersk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
takersk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1229776557

   @potiuk 
   In addition, the issue does not occur on v2.1.4, only on v2.2 and later.
   




[GitHub] [airflow] jjournet commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
jjournet commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1172706028

   @potiuk I had an unrelated warning in the scheduler log (scheduled time in the future) and realized that NTP was not configured: there was almost 1 minute of difference between some of the nodes.
   I configured NTP, and now all my nodes are within a few ms of each other. I'll test and check whether it fixes the issue.




[GitHub] [airflow] potiuk closed issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #24309: Intermittent failures when deleting pods
URL: https://github.com/apache/airflow/issues/24309




[GitHub] [airflow] heliharry commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
heliharry commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1151935806

   Hit a similar issue:
   
   [2022-06-10, 04:37:34 UTC] {pod_manager.py:253} WARNING - Pod aaa-xxx-baef53b3f96440a8abfc171d3a683e3d log read interrupted but container base still running
   [2022-06-10, 04:37:38 UTC] {kubernetes_pod.py:433} INFO - Deleting pod: aaa-xxx-baef53b3f96440a8abfc171d3a683e3d
   [2022-06-10, 04:37:38 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 389, in execute
       follow=True,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 244, in fetch_container_logs
       last_log_time = consume_logs(since_time=last_log_time, follow=follow)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 221, in consume_logs
       follow=follow,
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f
       return self(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 360, in iter
       raise retry_exc.reraise()
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 193, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
       return self.__get_result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 331, in read_pod_logs
       **additional_kwargs,
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
       return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23880, in read_namespaced_pod_log_with_http_info
       collection_formats=collection_formats)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
       _preload_content, _request_timeout, _host)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
       _request_timeout=_request_timeout)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
       headers=headers)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 244, in GET
       query_params=query_params)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 234, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (422)
   Reason: Unprocessable Entity
   HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 10 Jun 2022 04:37:38 GMT', 'Content-Length': '462'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"PodLogOptions \\"aaa-xxx-baef53b3f96440a8abfc171d3a683e3d\\" is invalid: sinceSeconds: Invalid value: -2: must be greater than 0","reason":"Invalid","details":{"name":"aaa-xxx-baef53b3f96440a8abfc171d3a683e3d","kind":"PodLogOptions","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: -2: must be greater than 0","field":"sinceSeconds"}]},"code":422}\n'
   
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 402, in execute
       remote_pod=remote_pod,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 425, in cleanup
       f'Pod {pod and pod.metadata.name} returned a failure:{error_message}\n{remote_pod}'
   airflow.exceptions.AirflowException: Pod aaa-xxx-baef53b3f96440a8abfc171d3a683e3d returned a failure:
   None
   [2022-06-10, 04:37:38 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=xxx_branch, task_id=aaa_xxx, execution_date=20220610T042959, start_date=20220610T043629, end_date=20220610T043738
   [2022-06-10, 04:37:38 UTC] {standard_task_runner.py:97} ERROR - Failed to execute job 9 for task aaa_xxx (Pod aaa-xxx-baef53b3f96440a8abfc171d3a683e3d returned a failure:
   None; 115)
   [2022-06-10, 04:37:38 UTC] {local_task_job.py:156} INFO - Task exited with return code 1
   [2022-06-10, 04:37:38 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check




[GitHub] [airflow] jjournet commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
jjournet commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1171576419

   Hello,
   
   Same issue at the end of a task, Airflow 2.3.0.
   
   ```
   [2022-06-30, 16:04:36 UTC] {kubernetes_pod.py:433} INFO - Deleting pod: airflow-test-pod-3528101c2e404b3dbf2cae3531523af7
   [2022-06-30, 16:04:36 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 389, in execute
       follow=True,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 244, in fetch_container_logs
       last_log_time = consume_logs(since_time=last_log_time, follow=follow)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 221, in consume_logs
       follow=follow,
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f
       return self(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 360, in iter
       raise retry_exc.reraise()
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 193, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
       return self.__get_result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 331, in read_pod_logs
       **additional_kwargs,
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
       return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23880, in read_namespaced_pod_log_with_http_info
       collection_formats=collection_formats)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
       _preload_content, _request_timeout, _host)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
       _request_timeout=_request_timeout)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
       headers=headers)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 244, in GET
       query_params=query_params)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 234, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (422)
   Reason: Unprocessable Entity
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f96bf433-3e0a-48d8-b4eb-a6559c4980b2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 30 Jun 2022 16:04:34 GMT', 'Content-Length': '464'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"PodLogOptions \\"airflow-test-pod-3528101c2e404b3dbf2cae3531523af7\\" is invalid: sinceSeconds: Invalid value: -105: must be greater than 0","reason":"Invalid","details":{"name":"airflow-test-pod-3528101c2e404b3dbf2cae3531523af7","kind":"PodLogOptions","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: -105: must be greater than 0","field":"sinceSeconds"}]},"code":422}\n'
   
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 402, in execute
       remote_pod=remote_pod,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 425, in cleanup
       f'Pod {pod and pod.metadata.name} returned a failure:{error_message}\n{remote_pod}'
   airflow.exceptions.AirflowException: Pod airflow-test-pod-3528101c2e404b3dbf2cae3531523af7 returned a failure:
   None
   [2022-06-30, 16:04:36 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=Amadeus_MIDT, task_id=input_2_raw, execution_date=20220630T155812, start_date=20220630T160428, end_date=20220630T160436
   [2022-06-30, 16:04:36 UTC] {standard_task_runner.py:97} ERROR - Failed to execute job 28 for task input_2_raw (Pod airflow-test-pod-3528101c2e404b3dbf2cae3531523af7 returned a failure:
   None; 63)
   ```
   
   




[GitHub] [airflow] takersk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
takersk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1229773344

   @jjournet 
   Thank you for the answer!
   I am using KubernetesPodOperator. Is NTP applied to both the cluster and the container image?




[GitHub] [airflow] takersk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
takersk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1229612180

   @potiuk 
   I checked that the time is synchronized on all machines.
   
   Are there any bug fixes after v2.3.1?
   
   @jjournet 
   Was the issue fixed after the NTP sync?




[GitHub] [airflow] potiuk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1230384597

   Converting this into a discussion.




[GitHub] [airflow] potiuk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1172380949

   @takersk @jjournet @heliharry Do you have time synchronized on all your machines? It looks like your servers do not run NTP or something similar, and some of the machines might be lagging behind.
   
   Can you please verify and confirm whether my guess about the cause is right?
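
One way the negative `sinceSeconds` in the tracebacks could arise is sketched below. It assumes the log reader resumes from the timestamp of the last log line it saw and computes the window as the worker's current time minus that timestamp. The function name and the clock values are illustrative and are not taken from the provider code.

```python
import math
from datetime import datetime, timedelta, timezone


def since_seconds(now: datetime, last_log_time: datetime) -> int:
    """Whole seconds between the last log line seen and 'now', rounded up."""
    return math.ceil((now - last_log_time).total_seconds())


# Hypothetical clocks: the worker reads its own clock, while the last log line
# was stamped on a node whose clock runs 64 seconds ahead of the worker.
worker_now = datetime(2022, 6, 7, 22, 49, 40, tzinfo=timezone.utc)
last_log_time = worker_now + timedelta(seconds=64)

print(since_seconds(worker_now, last_log_time))  # -64

# With synchronized clocks the last log line is in the past and the value is positive.
print(since_seconds(worker_now, worker_now - timedelta(seconds=3)))  # 3
```

Under that assumption, any skew larger than the time elapsed since the last log line produces a non-positive value, which the API server rejects with the 422 shown in the reports.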




[GitHub] [airflow] boring-cyborg[bot] commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1149329298

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1230384063

   > @potiuk In addition, v2.1.4 does not occur. only v2.2~
   
   Clearly your time is not synchronized if you continue getting those.
   
   I believe you should not configure chrony in your images; that does not matter. You should synchronize time on the HOSTS of your k8s cluster. How to do that may differ between clusters, so you need to consult your K8S documentation, and possibly search Google for guidelines and find one that fits your case. I just ran a simple query and found https://medium.com/goglides/ntp-in-a-kubernetes-cluster-4c6c3e5c0c14 - but this is a random link that might or might not be good for your cluster. So do your own search and investigation.
   
   The fact that it does not occur in 2.1 does not matter. We are using different k8s libraries and clients in different versions of Airflow, so it could be because the newer library is more picky about it.




[GitHub] [airflow] jjournet commented on issue #24309: Intermittent failures when deleting pods

Posted by GitBox <gi...@apache.org>.
jjournet commented on issue #24309:
URL: https://github.com/apache/airflow/issues/24309#issuecomment-1230331485

   NTP is only applied to the hosts; I don't have any NTP configuration in the containers.
   Configuring it in the containers wouldn't actually make sense: containers share the host's kernel, so they share its clock too.

