Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/13 13:14:50 UTC

[GitHub] [airflow] lboudard edited a comment on issue #17490: KubernetesJobOperator

lboudard edited a comment on issue #17490:
URL: https://github.com/apache/airflow/issues/17490#issuecomment-898448207


   I agree on this subject: the pod operator is currently missing some very handy features that the [kubernetes job controller](https://kubernetes.io/docs/concepts/workloads/controllers/job/) implements, such as a time to live after success/failure.
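   For reference, the time-to-live mechanism mentioned above is the Job spec's `ttlSecondsAfterFinished` field. A minimal sketch (names and image are hypothetical):
   ```yaml
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: parse-batch            # hypothetical job name
   spec:
     ttlSecondsAfterFinished: 300 # garbage-collect the Job 5 min after it finishes
     backoffLimit: 2              # retries handled by the Job controller itself
     template:
       spec:
         restartPolicy: Never
         containers:
           - name: parse
             image: example.org/parse-image:latest  # hypothetical image
   ```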
   I also agree that the distinction between the kubernetes executor and the kubernetes pod operator is not very clear yet.
   In our use case, since we have very different DAG types living in the same airflow instance, we use multiple images that are scheduled through pod operators (which we adopted before the kubernetes executor and taskflow api appeared).
   For instance, one image parses new batches of data, and another one trains models on them in a separate dag.
   That is not ideal, since the workflow dependencies are not properly bound in code but rather tied to expected data checkpoints; instead of having
   ```
   read_file | parse | feature_engineering | train_model
   read_file | archive
   ```
   which describes direct data dependencies in code (the airflow taskflow way, or equivalently in spark or apache beam), we instead have
   ```
   schedule_parse_file_and_store(raw_data_batch_location)
   schedule_feature_engineer(raw_data_batch_location)
   schedule_train_model(feature_engineered_batch_location)
   ```
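   To make the contrast concrete, here is a hypothetical sketch of the "direct data dependency" style. With Airflow's taskflow api, each function below would be decorated with `@task` and the same call chain would define the DAG; plain functions are used here so the sketch stays self-contained, and all names and payloads are illustrative.
   ```python
   def read_file(path):
       # Pretend to load a raw batch; the payload is illustrative.
       return {"source": path, "rows": ["a,1", "b,2"]}

   def parse(raw_batch):
       return [row.split(",") for row in raw_batch["rows"]]

   def feature_engineering(parsed):
       return [(name, int(value) * 2) for name, value in parsed]

   def train_model(features):
       # Stand-in "training": just summarize the features.
       return {"n_samples": len(features), "sum": sum(v for _, v in features)}

   # Dependencies are expressed directly through the call graph,
   # not through out-of-band data-checkpoint conventions:
   result = train_model(feature_engineering(parse(read_file("raw_batch.csv"))))
   print(result)  # {'n_samples': 2, 'sum': 6}
   ```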


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org