Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/05 08:59:50 UTC

[GitHub] [airflow] roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor

roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595108351
 
 
   > Hello,
   > 
   > Your pull request seems very interesting. Currently I am using Airflow and the Spark Operator on Kubernetes separately, and this PR seems to be a very good way to use them together.
   > 
   > Dealing with the code I would like to understand some points:
   > 
   > 1. Why are two different tasks used for creating and checking the status of a Spark Application (**SparkKubernetesOperator** and **SparkKubernetesSensor**)?
   >    Is it possible to merge them into a single Airflow operator? That way, a single task in the DAG could manage a Spark Application.
   > 2. When a custom resource is created/watched, I noticed the following part:
   > 
   > ```
   > group="sparkoperator.k8s.io",
   > version="v1beta2",
   > plural="sparkapplications",
   > ```
   > 
   > Why are these variables hard-coded here instead of retrieving them from the spark application definition (yaml) ?
   > 
   > Having a generic Airflow operator that can manage all kinds of custom resources, and not only Spark applications, would make it easier to re-use.
   > 
   > Thanks,
   > Alessio
   
   Thanks for your interest in my PR!!!
   1. I've seen the same split into a separate operator and sensor in the AWS EMR operator, and it's also how we currently work at my company. In addition, the sensor has the built-in poke method that checks the condition and sleeps. That said, there is a way to call the sensor from the operator directly, so given a flag on the operator, say `include_sensor=True`, I could call the sensor class after the creation is done. @mik-laj, what is your opinion on this?
   2. I want to keep this operator/sensor specific to the Spark Operator, which is why I hard-coded those values: if you feed the operator the wrong YAML, it won't pass. As for a general CRD operator:
   I made a proposal on the mailing list: https://lists.apache.org/thread.html/r36fddadf0791718716829875e9a60674ef61a0ca598ed4e952d115fc%40%3Cdev.airflow.apache.org%3E and the consensus there was that a more specific operator is the right approach. But after this PR is merged, I'll open another one with a general CRD operator/sensor (the Kubernetes hook, thanks to @mik-laj, already has general CRD methods).
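   To illustrate point 1, here is a minimal, hypothetical sketch of the `include_sensor=True` idea. The class and parameter names are illustrative only (not the actual PR code), and the Kubernetes API calls are stubbed out:

   ```python
   class SparkKubernetesOperatorSketch:
       """Hypothetical sketch: one task that creates the application and,
       when include_sensor=True, also polls it to completion as the
       sensor's poke loop would. Not the real operator."""

       TERMINAL_STATES = {"COMPLETED", "FAILED"}

       def __init__(self, application, include_sensor=False, poll_state=None):
           self.application = application
           self.include_sensor = include_sensor
           # Stand-in for the sensor's status query against the cluster.
           self.poll_state = poll_state

       def execute(self):
           # Step 1: create the SparkApplication custom resource (stubbed out).
           result = {"submitted": self.application, "final_state": None}
           # Step 2: optionally behave like the sensor and poke until the
           # application reaches a terminal state.
           if self.include_sensor:
               while True:
                   state = self.poll_state()
                   if state in self.TERMINAL_STATES:
                       result["final_state"] = state
                       break
           return result

   # Simulated status responses, as the sensor would see them over time.
   states = iter(["SUBMITTED", "RUNNING", "COMPLETED"])
   op = SparkKubernetesOperatorSketch("my-app", include_sensor=True,
                                      poll_state=lambda: next(states))
   print(op.execute())
   # {'submitted': 'my-app', 'final_state': 'COMPLETED'}
   ```

   With `include_sensor=False` the task would only submit the resource, preserving the current two-task split for DAGs that want it.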
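   On point 2, if a generic operator ever did derive those values from the manifest instead of hard-coding them, `group` and `version` could be read from the `apiVersion` field. A minimal sketch, where the helper name is hypothetical and the pluralization rule is only a common Kubernetes convention (it happens to hold for `SparkApplication` -> `sparkapplications`, but is not guaranteed for every CRD):

   ```python
   def crd_params_from_manifest(manifest):
       """Hypothetical helper: derive CustomObjects API parameters
       from the custom resource manifest itself."""
       # apiVersion is "<group>/<version>", e.g. "sparkoperator.k8s.io/v1beta2".
       group, _, version = manifest["apiVersion"].partition("/")
       # Convention only: plural is usually the lowercased kind plus "s".
       # A robust implementation would query the CRD's spec.names.plural.
       plural = manifest["kind"].lower() + "s"
       return {"group": group, "version": version, "plural": plural}

   spark_app = {
       "apiVersion": "sparkoperator.k8s.io/v1beta2",
       "kind": "SparkApplication",
   }
   print(crd_params_from_manifest(spark_app))
   # {'group': 'sparkoperator.k8s.io', 'version': 'v1beta2',
   #  'plural': 'sparkapplications'}
   ```

   Keeping the values hard-coded, as the PR does, is what makes the operator reject a manifest for the wrong resource kind.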

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services