Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/29 09:19:23 UTC

[GitHub] [airflow] thinhnd2104 opened a new issue, #24730: Google CloudRun job operator

thinhnd2104 opened a new issue, #24730:
URL: https://github.com/apache/airflow/issues/24730

   ### Description
   
   Like AWS ECS, Google Cloud has Cloud Run Jobs (beta). Does anyone need to use this Google Cloud feature?
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #24730: Google CloudRun job operator

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #24730: Google CloudRun job operator
URL: https://github.com/apache/airflow/issues/24730




[GitHub] [airflow] potiuk commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1243990279

   Includes: 
   * intro
   * step-by-step guide on how to approach a contribution
   * links to a <10 minute setup of the development environment
   * quick contribution guides that show a quick-and-dirty way to get started, with screenshots, depending on which IDE you use (PyCharm/VSCode, or even Gitpod or Codespaces if you feel like developing in a remote env), if you just want to "do" without reading too much of the "why and how".
   
   I think that is a good starting point - and you can choose the learning path that is best for you




[GitHub] [airflow] potiuk commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1243055682

   > Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).
   
   Why not contribute it then ?




[GitHub] [airflow] mharrisb1 commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
mharrisb1 commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1369849278

   @VinceLegendre all great thoughts.
   
   I think most of my plugin is obsolete once someone can get https://github.com/googleapis/python-run to build correctly with the rest of the Google Cloud providers code https://pypi.org/project/apache-airflow-providers-google/. The only issue is resolving protobuf versions between the 2 (see https://github.com/googleapis/python-run/issues/70). The Google team will not solve this on their side so someone will need to solve it in the Google providers code.
   
   The official python-run library is definitely preferred over my custom client. Then yes, taking the same approach as other Google Cloud providers would be the goal. And exactly as you pointed out: extend `GoogleBaseHook` for auth, etc.
   
   Once the protobuf issue is resolved then it should be an easy path to just implement operators for all CRUD and execution options. I think sensors, custom links, etc. are nice to have but could potentially be introduced in subsequent versions if someone doesn't want to implement it all at once. I would though consider all the CRUD operators as part of the completion requirements since that allows full control over the resource lifecycle.




[GitHub] [airflow] jmantegazza commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
jmantegazza commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1238236291

   Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).




[GitHub] [airflow] v-hunt commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
v-hunt commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1336525038

   What I've found: this guy created a custom Airflow plugin for Cloud Run Jobs: [link](https://github.com/mharrisb1/airflow-google-cloud-run-plugin)
   Possibly, he solved this problem. I'm going to look deeper into this




[GitHub] [airflow] v-hunt commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
v-hunt commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1336523277

   Hi guys,
   I'm sorry for not responding for a while. (I'm in Ukraine, so I think you understand why).
   The question is: is this still relevant? If yes, I can contribute. But I can't promise it will be fast.




[GitHub] [airflow] mohithg commented on issue #24730: Google CloudRun job operator

Posted by "mohithg (via GitHub)" <gi...@apache.org>.
mohithg commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1415819174

   When will the official CloudRun Job operator be ready to use in production? Is there an alternative for this?




[GitHub] [airflow] VinceLegendre commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
VinceLegendre commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1369838697

   After a closer look at this [plugin](https://github.com/mharrisb1/airflow-google-cloud-run-plugin), here are some thoughts:
   
   - `CloudRunHook` should extend `GoogleBaseHook` to ease authentication and GCP configuration
   - As the [gcloud CLI](https://cloud.google.com/run/docs/create-jobs#command-line) offers the `--execute-now` flag when creating jobs, I think the following logic/naming convention may be more straightforward:
       - Have a `CloudRunCreateJobOperator` with `execute_now`, `update_if_exists` and `delete_on_exit` capabilities, to allow job definition, run and deletion from Airflow
       - Have a separate `CloudRunExecuteJobOperator`, allowing one to execute a pre-created job in a GCP project
   - `CloudRunListJobs` and `CloudRunDeleteJob` operators to complete CRUD capabilities for jobs, as introduced in the plugin documentation
   - Regarding executions, is the **DELETE** operation mandatory to support? I may be missing some use cases here
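
The create/execute/delete behaviour proposed in the list above can be sketched roughly as follows. This is only an illustration of the control flow, not the plugin's or the provider's implementation: `FakeCloudRunClient` is a hypothetical in-memory stand-in for a real Cloud Run client, and the function name and its `execute_now`, `update_if_exists` and `delete_on_exit` keywords simply mirror the names suggested in the comment.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloudRunClient:
    """Hypothetical in-memory stand-in for a real Cloud Run client."""

    jobs: dict = field(default_factory=dict)   # job name -> container image
    calls: list = field(default_factory=list)  # record of (action, job name)

    def create_job(self, name: str, image: str) -> None:
        self.jobs[name] = image
        self.calls.append(("create", name))

    def update_job(self, name: str, image: str) -> None:
        self.jobs[name] = image
        self.calls.append(("update", name))

    def execute_job(self, name: str) -> None:
        if name not in self.jobs:
            raise KeyError(f"job {name!r} does not exist")
        self.calls.append(("execute", name))

    def delete_job(self, name: str) -> None:
        self.jobs.pop(name, None)
        self.calls.append(("delete", name))


def create_job(client, name, image, *, execute_now=False,
               update_if_exists=False, delete_on_exit=False):
    """Create (or update) a job, optionally run it, optionally clean it up."""
    if name in client.jobs:
        if not update_if_exists:
            raise ValueError(f"job {name!r} already exists")
        client.update_job(name, image)
    else:
        client.create_job(name, image)
    try:
        if execute_now:
            client.execute_job(name)
    finally:
        # Clean up even if the execution raised, so no job is left behind.
        if delete_on_exit:
            client.delete_job(name)


client = FakeCloudRunClient()
create_job(client, "nightly-report", "gcr.io/example/report:latest",
           execute_now=True, delete_on_exit=True)
print(client.calls)
```

Putting the deletion in a `finally` block is one way to make `delete_on_exit` hold even when the execution fails; a real operator would also need to decide what to do when the Airflow task itself is killed.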
   
   Happy to have this issue assigned if necessary @v-hunt , as I think this would be a game changer for many GCP users!
   I guess the required development would be mostly based on @mharrisb1 's really good plugin (congrats btw :clap: )




[GitHub] [airflow] VinceLegendre commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
VinceLegendre commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1369896931

   @mharrisb1  Is the build issue you mention specific to cloud-run v2 API ?
   As v2 does not seem to support jobs & executions CRUD for the moment, maybe we can stick to v1 for the time being ?
   
   If so, v1 API seems to build correctly with the rest of google cloud providers, at least locally. `CloudRunJobHook.get_conn` worked well in breeze with this piece of code : https://github.com/VinceLegendre/airflow/blob/add_google_cloud_run_execute_job_operator/airflow/providers/google/cloud/hooks/cloud_run.py#L166




[GitHub] [airflow] yan-hic commented on issue #24730: Google CloudRun job operator

Posted by "yan-hic (via GitHub)" <gi...@apache.org>.
yan-hic commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1501945091

   Glad this sparks a lot of interest. 
   
   One thought once the operators have migrated to the official SDK is to consider a new `CloudRunExecutor` as an alternative to k8s - in a different github thread. 
   
   It could combine with parallelism: inject arbitrary py code + number of tasks to run concurrently, with default of one (=current executor behavior). 
   
   I have a few use cases where 100+ tasks run in parallel and I don't need/want each to be defined as an airflow task (would kill the UI, among others). 
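
For the fan-out case described above, a Cloud Run job can already run several tasks of the same container in parallel, and each task receives the `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` environment variables. A minimal sketch of a container entrypoint that splits work across tasks, assuming a hypothetical list of input files (the `shard` helper and the file names are illustrative only):

```python
import os


def shard(items: list, task_index: int, task_count: int) -> list:
    """Return the round-robin slice of `items` that one task should process."""
    return items[task_index::task_count]


def main() -> None:
    # Cloud Run sets these for each task of a job execution; the defaults
    # let the script also run locally as a single task.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    work = [f"file-{i:03d}.csv" for i in range(10)]  # hypothetical work list
    for item in shard(work, task_index, task_count):
        print(f"task {task_index}/{task_count} processing {item}")


if __name__ == "__main__":
    main()
```

With this pattern, Airflow only sees one task (the job execution), while the parallelism lives inside the job.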




[GitHub] [airflow] jmantegazza commented on issue #24730: Google CloudRun job operator

Posted by "jmantegazza (via GitHub)" <gi...@apache.org>.
jmantegazza commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1415841231

   The alternative is to use a BashOperator with the gcloud command.
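
A minimal sketch of that workaround, assuming the job already exists and the worker has an authenticated `gcloud` installed. The helper name, job name, region and project below are placeholders, and the flags should be checked against the installed SDK version (older versions may need `gcloud beta run jobs ...`):

```python
import shlex


def build_execute_command(job_name: str, region: str, project_id: str) -> str:
    """Return a shell-safe gcloud command that executes an existing Cloud Run job."""
    args = [
        "gcloud", "run", "jobs", "execute", job_name,
        "--region", region,
        "--project", project_id,
        "--wait",  # block until the execution finishes so the exit code reflects it
    ]
    return shlex.join(args)


print(build_execute_command("my-job", "us-central1", "my-project"))

# In a DAG file (requires apache-airflow; kept as a comment so this sketch
# stays stdlib-only):
#
#   from airflow.operators.bash import BashOperator
#
#   run_job = BashOperator(
#       task_id="run_cloud_run_job",
#       bash_command=build_execute_command("my-job", "us-central1", "my-project"),
#   )
```

Building the command with `shlex.join` avoids quoting problems if a job name or project ID ever comes from a variable or template.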
   
   On Fri, Feb 3, 2023 at 09:41, Mohith G <***@***.***> wrote:
   
   > When will the official CloudRun Job operator be ready to use in
   > production? Is there an alternative for this?
   
   -- 
   Juan Mantegazza
   Lead Data Engineer
   W: www.zubale.com




[GitHub] [airflow] corridordigital commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
corridordigital commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1317153607

   Possible solution with this [PR](https://github.com/apache/airflow/pull/27638)




[GitHub] [airflow] r-richmond commented on issue #24730: Google CloudRun job operator

Posted by "r-richmond (via GitHub)" <gi...@apache.org>.
r-richmond commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1439340843

   > It also currently won't build with apache-airflow-providers-google because of incompatible protobuf support (see https://github.com/googleapis/python-run/issues/70).
   
   >The only issue is resolving protobuf versions between the 2 (see https://github.com/googleapis/python-run/issues/70). The Google team will not solve this on their side so someone will need to solve it in the Google providers code.
   
   FWIW I have https://github.com/apache/airflow/pull/29644 open which solves the protobuf==3.2.0 issue. However, I've hit some CI/CD issues that I'm not sure how to solve. If someone wants to take a look / take over / pass some suggestions I'm all for it. This protobuf pin is the source of many of my headaches. 
   
   [Link](https://apache-airflow.slack.com/archives/CCPRP7943/p1676932204181159) to slack airflow thread if that makes it easier to discuss.




[GitHub] [airflow] v-hunt commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
v-hunt commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1288913335

   I think, I also can contribute on this, if required




[GitHub] [airflow] VinceLegendre commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
VinceLegendre commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1362837879

   Hey guys, just found out about this issue !
   I am interested in contributing to this work and [I submitted a PR](https://github.com/apache/airflow/pull/28525) to **manage existing jobs execution only** for the moment.
   
   What I have in mind is to design such an operator in the same way as `DataflowStartFlexTemplateOperator`.
   
   Would you have any thoughts on this ?




[GitHub] [airflow] jmantegazza commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
jmantegazza commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1243780911

   > > Oh yeah, I would like to have this operator. I am using Cloud Run Jobs to execute light data processing and scripting. It would be really nice to have a dedicated operator to trigger this flow in Cloud Composer (apart from the bash operator).
   > 
   > Why not contribute it then ?
   
   I would love to. But I have not a single idea on how to do it. Is there an established process to work on it?




[GitHub] [airflow] EamonKeane commented on issue #24730: Google CloudRun job operator

Posted by "EamonKeane (via GitHub)" <gi...@apache.org>.
EamonKeane commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1627116423

   Cloud run jobs can now last up to 24 hours, making this viable for the vast majority of tasks.
   
   https://cloud.google.com/run/docs/create-jobs




[GitHub] [airflow] eladkal commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1173034245

   feel free to submit PR




[GitHub] [airflow] mharrisb1 commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
mharrisb1 commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1336303854

   ## Brief Thoughts/Notes
   
   I've done a little bit of work on this and here are some notes.
   
   A PR for this feature should include operators for both:
   
   - **Cloud Run services**: Used to run code that responds to web requests, or events.
   - **Cloud Run jobs**: Used to run code that performs work (a job) and quits when the work is done.
   
   [Source](https://cloud.google.com/run/docs/overview/what-is-cloud-run).
   
   It looks like https://github.com/apache/airflow/pull/27638 only includes the services operator. This would be a good start but it looks like it also uses [transport](https://github.com/apache/airflow/blob/45ce866b35f824d1ee1208ce1d624203570832bc/airflow/providers/google/cloud/hooks/cloud_run.py#L23) instead of the official client. Other GCP operators (e.g. tasks and others) use the official clients so it would be best to go the same route with this one.
   
   In my mind the biggest benefit comes from the jobs operators since that would allow users who do not want to deal with/manage K8s to use Cloud Run Jobs with arbitrary containers.
   
   The Google team is great and recently released support for jobs in the official [Cloud Run Python client](https://github.com/googleapis/python-run) (see https://github.com/googleapis/python-run/pull/65) but it won't be available until [v0.5.0](https://github.com/googleapis/python-run/pull/61) with no ETA. It also currently won't build with `apache-airflow-providers-google` because of incompatible protobuf support (see https://github.com/googleapis/python-run/issues/70).
   
   I created my own plugin for this if anyone is interested in using Cloud Run Jobs (currently does not support services) before these issues are resolved and I plan to use the official client once that is ready.
   
   https://github.com/mharrisb1/airflow-google-cloud-run-plugin
   
   Please note that this plugin will only be supported until this is available in Airflow.
   
   ## Requirements Proposal
   
   I think Cloud Run Services and Jobs would be great additions to GCP resources in Airflow. I think a PR to add these features should cover the following:
   
   - [ ] `CloudRunHook` - manage w/ create, read, update, delete, list resources including services, revisions, jobs, executions, and task resources (see https://github.com/googleapis/python-run/tree/main/google/cloud/run_v2/services)
   - [ ] Custom links for resources (I'm still fuzzy around what should be needed here). See other GCP resource custom link examples https://github.com/apache/airflow/tree/main/airflow/providers/google/cloud/links
   - [ ] CRUD-based operators for all resources (see task operators as an example https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/tasks.py)
   - [ ] Sensors 🤷 (again, fuzzy on this one)
   
   Some additional thoughts:
   - There seems to be a pretty low quota for sequential requests so any ping mechanism should try to respect this otherwise tasks will fail often.
   - For Cloud Run jobs operators specifically, it would be nice if, instead of only having CRUD-based operators, the main job run operator could also have options to "create if not exists" and "delete on exit" to avoid extra tasks. This is simply a personal preference (I added it to my plugin; see [example](https://github.com/mharrisb1/airflow-google-cloud-run-plugin#simple-job-lifecycle)).
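
One way to keep a "ping mechanism" within a low request quota is to poll with exponential backoff rather than at a fixed short interval. The sketch below is only an illustration under assumptions: `get_status` is a hypothetical callable wrapping whatever API call returns the execution state, and the `"SUCCEEDED"`/`"FAILED"` strings and the delay values are placeholders, not values from the Cloud Run API or its quota documentation.

```python
import time
from typing import Callable, Iterator


def backoff_delays(initial: float = 2.0, factor: float = 2.0,
                   maximum: float = 60.0) -> Iterator[float]:
    """Yield delays that grow geometrically and are capped at `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def wait_for_completion(get_status: Callable[[], str], timeout: float = 3600.0,
                        sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll `get_status` with increasing delays until a terminal state or timeout."""
    deadline = clock() + timeout
    for delay in backoff_delays():
        status = get_status()  # one API request per loop iteration
        if status in ("SUCCEEDED", "FAILED"):  # hypothetical terminal states
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"execution still {status!r} after {timeout} seconds")
        sleep(delay)


# Demo with a fake status source and a fake sleep, so nothing actually waits.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
slept = []
result = wait_for_completion(lambda: next(statuses), sleep=slept.append)
print(result, slept)
```

Injecting `sleep` and `clock` keeps the loop easy to test; in an operator or sensor the defaults would be used.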
   
   Would love to contribute and collaborate with anyone on this. I do think we're blocked on progress until v0.5.0 of the official client is released w/ support for a compatible protobuf lib, but we can definitely go ahead and start progressing on this in preparation for those to be released in the near future (also go comment/like this issue to raise awareness with the Google team https://github.com/googleapis/python-run/issues/70).




[GitHub] [airflow] potiuk commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1243986315

   https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst




[GitHub] [airflow] o-nikolas commented on issue #24730: Google CloudRun job operator

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on issue #24730:
URL: https://github.com/apache/airflow/issues/24730#issuecomment-1291472273

   > I think, I also can contribute on this, if required
   
   We haven't heard back from Juan, so assigning to @v-hunt, thanks for taking this one!
   

