You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/10 18:17:48 UTC

[GitHub] [airflow] michalmisiewicz opened a new pull request #11406: Add option to enable TCP keepalive mechanism for communication with Kubernetes API

michalmisiewicz opened a new pull request #11406:
URL: https://github.com/apache/airflow/pull/11406


   When running Airflow on Cloud Kubernetes  Services like Azure Kubernetes Service, KubernetesExecutor can hangs indefinitely on pod submission due to idle connection time-out. For example in Azure Firewall, idle timeout is set to [4 minutes](https://docs.microsoft.com/en-us/azure/firewall/firewall-faq#what-is-the-tcp-idle-timeout-for-azure-firewall). Simmilar problem can be observe on AWS.
   
   [Kubernetes client](https://github.com/kubernetes-client/python/tree/master/kubernetes) under the hood is using urllib3 PoolManager which introduce connections reuse. On idle timeout, RST package is send back to client which cause urllib3 to hang indefinitely. 
   
   This PR introduced fix based on TCP keepalive mechanism. Unfortunately Kubernetes client does not support providing socket options when instantiating `CoreV1Api` instance.
   
   It was nightmare to find out why request were hanging. After all I have run fix for one month on production without errors. Now I can sleep peacefully... 🛌 
   
   Closes #10636 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706719782


   > "please ping @michalmisiewicz or @dimberman in the PR if you want to modify this function"
   
   I would suggest putting link to this PR so others may find why the comment is there 😉 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalmisiewicz commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
michalmisiewicz commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706592193


   @dimberman This problem is hard to recreate without real infrastructure. Do you have idea how it can be tested ?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706594621


   [The Workflow run](https://github.com/apache/airflow/actions/runs/299511673) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalmisiewicz commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
michalmisiewicz commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706590835


   @dimberman @turbaszek


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalmisiewicz commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
michalmisiewicz commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-707183163


   @potiuk thanks. `pre-commit` rocks 🚀.  CI is green now.
   @dimberman @turbaszek I've added reference to PR.
   
   Are we ready to go ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706595031


   [The Workflow run](https://github.com/apache/airflow/actions/runs/299514753) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706605122


   Yeah that's fine. Maybe just add a comment saying "please ping @michalmisiewicz or @dimberman if you want to modify this function"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706590962


   oooo coool!!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706594523


   Yeah. I'd imagine it's difficult to test - but there are some static check problems. @michalmisiewicz  - I recommend to install pre-commits :) (described in STATIC_CHECKS.rst)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman merged pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
dimberman merged pull request #11406:
URL: https://github.com/apache/airflow/pull/11406


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman edited a comment on pull request #11406: Add option to enable TCP keepalive for communication with Kubernetes API

Posted by GitBox <gi...@apache.org>.
dimberman edited a comment on pull request #11406:
URL: https://github.com/apache/airflow/pull/11406#issuecomment-706605122


   Yeah that's fine. Maybe just add a comment saying "please ping @michalmisiewicz or @dimberman in the PR if you want to modify this function"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org