Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/13 18:03:35 UTC
[GitHub] [airflow] mik-laj opened a new issue #8272: Cloud Life Sciences operators and hooks
mik-laj opened a new issue #8272: Cloud Life Sciences operators and hooks
URL: https://github.com/apache/airflow/issues/8272
**Description**
Hello,
Airflow has extensive support for [many GCP services](https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#gcp-google-cloud-platform). However, we lack integration with the [Cloud Life Sciences](https://cloud.google.com/life-sciences) service.
I think it would be very useful to have an operator that [runs the pipeline](https://cloud.google.com/life-sciences/docs/reference/rest/v2beta/projects.locations.pipelines/run) and then waits for the result. This would make it easier to use Airflow to orchestrate tasks that require a GPU. You can currently perform a [similar job](https://cloud.google.com/cloud-build/docs/api/reference/rest/v1/projects.builds/create) using Cloud Build, but that service does not support GPUs.
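The "run, then wait" behaviour described above amounts to polling the long-running Operation that `pipelines.run` returns until it reports `done`. This is a minimal sketch, assuming the operation arrives as a JSON-like dict with the standard `done`, `error` and `response` fields; `wait_for_operation` and its `get_operation` callable are hypothetical names, not part of any existing Airflow or Google API.

```python
import time
from typing import Any, Callable, Dict, Optional

Operation = Dict[str, Any]


def wait_for_operation(
    get_operation: Callable[[str], Operation],
    name: str,
    poll_interval: float = 10.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Operation:
    """Poll a long-running operation until it is done (hypothetical helper).

    Returns the operation's ``response`` field on success and raises
    RuntimeError if the finished operation carries an ``error`` field.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        operation = get_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation {name} failed: {operation['error']}")
            return operation.get("response", {})
        # Not finished yet: give up if the deadline has passed, else wait and retry.
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"Operation {name} did not finish within {timeout}s")
        sleep(poll_interval)
```

With the discovery-based client, `get_operation` could be something like `lambda n: service.projects().locations().operations().get(name=n).execute()`. Injecting `sleep` keeps the loop testable without real waiting.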
Before starting work, I recommend reading the [GCP Service Airflow Integration Guide](https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit).
The final work should contain:
* How-to guide (similar to https://airflow.readthedocs.io/en/latest/howto/operator/gcp/natural_language.html)
* API reference (similar to https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/natural_language/index.html#airflow.providers.google.cloud.operators.natural_language.CloudNaturalLanguageAnalyzeEntitiesOperator)
* Example DAG: https://github.com/apache/airflow/tree/master/airflow/providers/google/cloud/example_dags
* Operator
* Hook
* Unit tests
* System tests (similar to https://github.com/apache/airflow/blob/master/tests/providers/google/cloud/operators/test_natural_language_system.py)
If you haven't used GCP yet, you [will get $300](https://cloud.google.com/free) in free credits after creating an account, which will let you get to know this service better.
Implementing this task will give you a better understanding of GCP services and teach you the testing and documentation practices the community requires. If anyone is interested in this task, I am willing to provide all the necessary tips and information.
**Use case / motivation**
N/A
**Related Issues**
N/A
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615427650
Thank you so much. These are more than enough. I really appreciate it!
----------------------------------------------------------------
[GitHub] [airflow] ephraimbuddy edited a comment on issue #8272: Cloud Life Sciences operator and hook
Posted by GitBox <gi...@apache.org>.
ephraimbuddy edited a comment on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615209256
Hi, I'd like to work on this. I have sent a request to view the [GCP Service Airflow Integration Guide](https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit).
Thanks
----------------------------------------------------------------
[GitHub] [airflow] mik-laj edited a comment on issue #8272: Cloud Life Sciences operator and hook
Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615426469
@ephraimbuddy Google Cloud has two types of libraries.
* Native Python libraries - https://github.com/googleapis/google-cloud-python. They exist for most, but not all, services. These are the recommended libraries. They most often use Protobuf for communication.
* Discovery-based libraries - https://github.com/googleapis/google-api-python-client. These are generated automatically at the time of use, based on the API specification (called the discovery document). They are available for every Google service and expose all of its options, so they are always up to date. They communicate over HTTP only.

There is no native library for this service, so we need to use [google-api-python-client](https://github.com/googleapis/google-api-python-client). To initialize the client, use the following code:
```python
from googleapiclient.discovery import build
service = build('lifesciences', 'v2beta', ...)
```
Unfortunately, there is no documentation for this API, but you can build a client and check which methods exist using ipdb.
Documentation for other services is available here:
https://github.com/googleapis/google-api-python-client/blob/master/docs/dyn/index.md
Here is an example of how to check the documentation for Dataflow:
```python
from googleapiclient.discovery import build
dataflow_service = build('dataflow', 'v1b3')
projects_resource = dataflow_service.projects()
locations_resource = projects_resource.locations()
flex_templates_resource = locations_resource.flexTemplates()
print(flex_templates_resource.launch.__doc__)
```
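The same kind of inspection works for the Life Sciences client. As a small convenience for exploring it without a debugger, the hypothetical helper below lists the public callables on a resource object using only `dir()`; it assumes, as the Dataflow example suggests, that the generated methods are exposed as attributes of the resource instance.

```python
from typing import Any, List


def public_methods(resource: Any) -> List[str]:
    """Return the sorted names of public callables on ``resource`` (hypothetical helper)."""
    return sorted(
        name
        for name in dir(resource)
        if not name.startswith("_") and callable(getattr(resource, name))
    )


# Hypothetical usage against the real client (needs credentials and network access):
#   from googleapiclient.discovery import build
#   service = build("lifesciences", "v2beta")
#   pipelines = service.projects().locations().pipelines()
#   print(public_methods(pipelines))
#   print(pipelines.run.__doc__)
```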
These APIs are generated automatically from the REST API, so you can check the general idea and the required arguments in the REST API documentation for the Life Sciences service:
https://cloud.google.com/life-sciences/docs/reference/rest
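As an illustration of reading the arguments off that reference, the sketch below assembles a request body for `pipelines.run` that asks for a GPU. The field names (`pipeline.actions[].imageUri`, `commands`, `resources.regions`, `resources.virtualMachine.machineType`, `accelerators[].type` and `count`) are assumed to match the v2beta `RunPipelineRequest` reference and should be verified against it; the helper name and the default machine and accelerator values are hypothetical.

```python
from typing import Any, Dict, List


def build_gpu_pipeline_body(
    image_uri: str,
    commands: List[str],
    region: str,
    machine_type: str = "n1-standard-4",
    accelerator_type: str = "nvidia-tesla-t4",
    accelerator_count: int = 1,
) -> Dict[str, Any]:
    """Return a RunPipelineRequest-shaped dict (hypothetical helper).

    Field names are assumed to follow the Life Sciences v2beta REST
    reference; verify them before use.
    """
    return {
        "pipeline": {
            # A single container step running the given commands.
            "actions": [{"imageUri": image_uri, "commands": commands}],
            "resources": {
                "regions": [region],
                "virtualMachine": {
                    "machineType": machine_type,
                    # The GPU request; count is an int64 field, sent as a string in REST JSON.
                    "accelerators": [
                        {"type": accelerator_type, "count": str(accelerator_count)}
                    ],
                },
            },
        }
    }
```

For example, `build_gpu_pipeline_body("gcr.io/my-project/my-image", ["nvidia-smi"], "us-central1")` (a hypothetical image) would describe a one-step pipeline on a VM with one GPU; the result is what would be passed as `body` to `pipelines().run(parent=..., body=...)`.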
If you are looking for an example hook, look at Cloud Build:
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/cloud_build.py
It still uses the discovery-based client.
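Following that pattern, a Life Sciences hook might wrap the discovery client roughly as below. This is a minimal sketch, not the Cloud Build hook's actual code: `LifeSciencesHookSketch` and the injectable `service_factory` are hypothetical (a real hook would derive from Airflow's Google base hook and build the client with its credentials), while the `projects().locations().pipelines().run(parent=..., body=...)` and `projects().locations().operations().get(name=...)` chains mirror the REST methods `projects.locations.pipelines.run` and `projects.locations.operations.get`.

```python
from typing import Any, Callable, Dict, Optional


class LifeSciencesHookSketch:
    """Hypothetical, minimal hook around the discovery-based Life Sciences client."""

    def __init__(self, service_factory: Callable[[], Any], num_retries: int = 5) -> None:
        # service_factory would typically call
        # googleapiclient.discovery.build("lifesciences", "v2beta", ...).
        self._service_factory = service_factory
        self._num_retries = num_retries
        self._conn: Optional[Any] = None

    def get_conn(self) -> Any:
        """Build the client on first use and reuse it afterwards."""
        if self._conn is None:
            self._conn = self._service_factory()
        return self._conn

    def run_pipeline(self, body: Dict[str, Any], location: str, project_id: str) -> Dict[str, Any]:
        """Start a pipeline and return the long-running Operation as a dict."""
        parent = f"projects/{project_id}/locations/{location}"
        request = (
            self.get_conn().projects().locations().pipelines().run(parent=parent, body=body)
        )
        return request.execute(num_retries=self._num_retries)

    def get_operation(self, name: str) -> Dict[str, Any]:
        """Fetch the current state of an operation by its full resource name."""
        request = self.get_conn().projects().locations().operations().get(name=name)
        return request.execute(num_retries=self._num_retries)
```

An operator could then call `run_pipeline` and poll `get_operation` with the returned operation's `name` until `done` is set.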
----------------------------------------------------------------
[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615416283
Hi @mik-laj, could you please point me to the Python library for Cloud Life Sciences? I have been looking for it and can't find it. Sorry for any inconvenience.
Thanks
----------------------------------------------------------------
[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615940531
Please assign this to me; I'm now working on it. Thanks
----------------------------------------------------------------