Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/08 06:27:21 UTC

[GitHub] [airflow] EliMor commented on issue #17490: KubernetesJobOperator

EliMor commented on issue #17490:
URL: https://github.com/apache/airflow/issues/17490#issuecomment-894750630


   Hi there! 
   
   Thanks for your feedback. I admit I'd need to take a bit more time to look at `pod_template_file`. My memory is foggy, but these trees look familiar to me.
   
   To clarify, there are a few things I wanted to ensure our use of YAML + Jinja would accomplish for free with the KJO.
   
   1. We could pass in the location of the YAML template files just as we can for other templates (`template_searchpath`).
   2. We could move away entirely from using Python objects for kube-related things; I never want to import k8s in a DAG (as you noted!).
   3. We could pass variables in to the KJO to be rendered by Jinja in the YAML template.
   4. Also, with Jinja magic we could reuse **_multiple_** YAML templates to render a **_single_** Job (Pod) YAML file, similar to how one would compose templates for web work (see the sketch after this list).
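   
   To make item 4 concrete, here's a minimal sketch of the kind of template composition I mean (every file name and variable here is made up for illustration):
   
   ```yaml
   {# job.yaml.j2, a hypothetical top-level template the KJO would render #}
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: {{ job_name }}
   spec:
     parallelism: {{ parallelism | default(1) }}
     template:
       spec:
         restartPolicy: Never
         {# Jinja's include does not re-indent, so the included snippet
            must carry indentation matching this level #}
         {% include 'containers.yaml.j2' %}
   ```
   
   The operator call would then look roughly like `KubernetesJobOperator(task_id='demo', job_template_file='job.yaml.j2', job_vars={'job_name': 'demo', 'parallelism': 10})`; purely hypothetical parameter names, of course, since the operator doesn't exist yet.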
   
   If `pod_template_file` accomplishes all of this, I'm a happy camper, albeit very confused.
   
   As far as why a Job and not a Pod: to my (limited) knowledge of kube, the extra abstraction of the Job type also gives you parallelism out of the box (see [Kube Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/)). If I have a use case where I want 10 pods to run simultaneously, would I need to surface that at the task level of an Airflow DAG? Doesn't that consume more resources on the Airflow side than letting Airflow manage a single Job abstraction as a single task and deferring to kube to handle the pods?
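   
   For concreteness, `parallelism` and `completions` are plain fields on the Job spec (per the Kubernetes docs linked above; the image name below is illustrative):
   
   ```yaml
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: fan-out-work
   spec:
     parallelism: 10   # kube runs up to 10 pods at a time
     completions: 10   # ...until 10 pods have finished successfully
     template:
       spec:
         containers:
           - name: worker
             image: example.com/worker:latest   # made-up image
         restartPolicy: Never
   ```
   
   Airflow would only ever see one task; kube does the fan-out.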
   
   Totally understand not wanting to add confusion. I'm confused more than half the time I try anything these days! 
   Homework assignment for me: reinvestigate the limitations of `pod_template_file`.
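   
   For reference while I do that homework, my understanding of the KPO route is roughly the sketch below. I believe the import path matches the cncf.kubernetes provider, but treat it (and the file path) as unverified on my end:
   
   ```python
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
       KubernetesPodOperator,
   )
   
   run_pod = KubernetesPodOperator(
       task_id="run_pod_from_yaml",
       # points at a plain pod manifest on disk; path is illustrative
       pod_template_file="/opt/airflow/templates/pod.yaml",
   )
   ```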
   
   For one, I do recall also hitting some bugs with how logs were forwarded from the pod to Airflow when using the KPO. If the pod just slept for a minute or so and then completed, whatever was tracking the pod for log streaming seemed to drop off, and the task would hang rather than complete and move on. 
   Entirely different issue, possibly resolved now! 
   
   If there's anywhere else I could offer some clarity or otherwise be helpful please let me know! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org