Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/25 09:14:11 UTC

[GitHub] [airflow] alittlesliceoftom opened a new issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

alittlesliceoftom opened a new issue #13888:
URL: https://github.com/apache/airflow/issues/13888


   The Kube pod monitor runs every second (+overhead) as referenced below:
   
   https://github.com/apache/airflow/blob/31b956c6c22476d109c45c99d8a325c5c1e0fd45/airflow/kubernetes/pod_launcher.py#L137
   
   Whilst this is OK in many situations, any slow, long-running job ends up with excessive logs. It would be useful to be able to pass an argument that sets the log refresh frequency.
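
As a rough illustration of the request, a monitor loop with a configurable interval might look like the sketch below. This is not Airflow's actual code: `wait_for_completion`, `is_running`, `poll_interval`, and `sleep` are hypothetical names chosen for the example, and it assumes one status line is logged per iteration.

```python
import time
from typing import Callable


def wait_for_completion(
    is_running: Callable[[], bool],
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until is_running() returns False; return how many times we slept."""
    polls = 0
    while is_running():
        # Assumption: one status line is logged per iteration, so a longer
        # interval directly reduces log volume for long-running jobs.
        sleep(poll_interval)
        polls += 1
    return polls
```

Passing `poll_interval=30` instead of the default would cut the number of iterations for the same job duration by a factor of 30.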


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-766670553


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] alittlesliceoftom edited a comment on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
alittlesliceoftom edited a comment on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-891757648


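   Hey @kdunee, here is a very crude workaround for #14259: I replace the command I run with one that also echoes a heartbeat every 30 seconds alongside the main task.
   
   So I prepend this boilerplate to my kube pod calls...
   
   `cmd_boilerplate = "while true; do echo heartbeat; sleep 30; done &"`
   
   Then my KubernetesPodOperator calls look like:
   
   ```
   from airflow.providers.cncf.kubernetes.operators import kubernetes_pod

   kubernetes_pod.KubernetesPodOperator(
       name="name",
       # The ID specified for the task.
       task_id="task_id",
       # cmds takes a list of strings (the container entrypoint).
       cmds=["bash"],
       arguments=[
           "-c",
           # Start the heartbeat loop in the background, then run the real command.
           f"{cmd_boilerplate} do your thing here",
       ],
       get_logs=True,
       # Shared keyword arguments defined elsewhere in my DAG file.
       **default_kube_args,
   )
   ```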
   
   Publishing a bad answer to the internet in hopes of getting a better one :D
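
The heartbeat workaround above could be wrapped in a small helper so the interval is not hard-coded. This is only a sketch: `with_heartbeat` and `interval_seconds` are hypothetical names, not part of Airflow, and the returned list is meant to be passed as `arguments` alongside `cmds=["bash"]`.

```python
from typing import List


def with_heartbeat(command: str, interval_seconds: int = 30) -> List[str]:
    """Build `bash -c` arguments that echo a heartbeat while `command` runs."""
    # The trailing "&" puts the loop in the background so that the real
    # command starts immediately after it.
    heartbeat = f"while true; do echo heartbeat; sleep {interval_seconds}; done &"
    return ["-c", f"{heartbeat} {command}"]
```

For example, `with_heartbeat("python main.py")` produces the same string as the boilerplate above followed by `python main.py`.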





[GitHub] [airflow] alittlesliceoftom commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
alittlesliceoftom commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-833490622


   Hi both,
   Thanks so much for the response. Reading your point, @jedcunningham, you're right: I don't see that log output.
   
   I think I should actually have referenced https://github.com/apache/airflow/blob/31b956c6c22476d109c45c99d8a325c5c1e0fd45/airflow/kubernetes/pod_launcher.py#L156,
   as this is what is called when `get_logs=False`.
   
   I don't have the issue any more, as I now always set `get_logs=True`.
   
   That said, I think there may be benefit in making the sleep time configurable for the case where you are not getting logs, if only to keep the Airflow logs a bit shorter. Typically I was using `get_logs=False` as a hack in response to #14259. I now use a different hack so that I can get the logs.
   
   I think there is some benefit in making the 2s state update configurable, but it might be marginal. I suspect most users want to get logs in most situations.
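
To put a rough number on the log volume, the arithmetic can be sketched as below. `status_line_count` is a hypothetical helper, and it assumes exactly one log line per state check, which may not match the launcher's real behaviour.

```python
import math


def status_line_count(job_seconds: float, poll_interval_seconds: float) -> int:
    """Approximate number of state checks (assumed one log line each) in a job."""
    return math.ceil(job_seconds / poll_interval_seconds)


six_hours = 6 * 60 * 60
print(status_line_count(six_hours, 2))   # 10800 lines at the 2s interval
print(status_line_count(six_hours, 30))  # 720 lines at a 30s interval
```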





[GitHub] [airflow] jedcunningham commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-813712563


   @alittlesliceoftom, are you seeing a bunch of occurrences of this log line?
   
   https://github.com/apache/airflow/blob/c73052fcb1d39dd8fa4012721f025c375f67f72c/airflow/providers/cncf/kubernetes/utils/pod_launcher.py#L141
   
   We follow the log stream by passing `follow=True` to `read_namespaced_pod_log`, so unless the connection keeps getting reset, we shouldn't be hitting that sleep all that often.
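
For readers unfamiliar with log following, the pattern described above can be sketched roughly as follows. `follow_pod_logs` is a hypothetical helper, not Airflow's implementation; `api` is assumed to be a `kubernetes.client.CoreV1Api` instance, and the sketch assumes the raw response yields byte lines when iterated.

```python
from typing import Any, Iterator


def follow_pod_logs(api: Any, name: str, namespace: str) -> Iterator[str]:
    """Yield decoded log lines from a pod as they arrive."""
    # follow=True keeps the connection open until the container exits;
    # _preload_content=False returns the raw response so it can be streamed.
    response = api.read_namespaced_pod_log(
        name=name,
        namespace=namespace,
        follow=True,
        _preload_content=False,
    )
    for raw_line in response:
        yield raw_line.decode("utf-8").rstrip("\n")
```

With this pattern the caller blocks on the stream, so a retry sleep is only reached when the connection drops.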





[GitHub] [airflow] jhtimmins commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-833228966


   @alittlesliceoftom Following up on this. Are you still interested in working on this?





[GitHub] [airflow] kdunee commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
kdunee commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-899438075


   Thanks @alittlesliceoftom! As bad as it is, I'm afraid it's the best answer on the internet and the only one that helped me ;)





[GitHub] [airflow] kdunee commented on issue #13888: Kubernetes Launchers Log Check in Time is Not Configurable - Excess Logs Slow Airflow

Posted by GitBox <gi...@apache.org>.
kdunee commented on issue #13888:
URL: https://github.com/apache/airflow/issues/13888#issuecomment-887386830


   @alittlesliceoftom Hi, can you please share which hack you are using? I'm currently considering disabling `get_logs` to avoid https://github.com/apache/airflow/issues/14259 ;)
   
   > I now do a different hack so I can get the logs.
   

