Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/13 18:03:35 UTC

[GitHub] [airflow] mik-laj opened a new issue #8272: Cloud Life Sciences operators and hooks

mik-laj opened a new issue #8272: Cloud Life Sciences operators and hooks
URL: https://github.com/apache/airflow/issues/8272
 
 
   **Description**
   
   Hello,
   
   Airflow has extensive support for [many GCP services](https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#gcp-google-cloud-platform).  However, we lack integration with the [Cloud Life Sciences](https://cloud.google.com/life-sciences) service. 
   
   I think it would be very useful to have an operator that [runs a pipeline](https://cloud.google.com/life-sciences/docs/reference/rest/v2beta/projects.locations.pipelines/run) and then waits for the result.  This would make it easier to use Airflow to orchestrate tasks that require a GPU. You can currently perform a [similar job](https://cloud.google.com/cloud-build/docs/api/reference/rest/v1/projects.builds/create) using Cloud Build, but that service does not support GPUs.
   
   Before starting work, I recommend reading the [GCP Service Airflow Integration Guide](https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit).
   
   The final work should contain:
   * How-to guide (similar to https://airflow.readthedocs.io/en/latest/howto/operator/gcp/natural_language.html)
   * API reference (similar to https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/natural_language/index.html#airflow.providers.google.cloud.operators.natural_language.CloudNaturalLanguageAnalyzeEntitiesOperator)
   * Example DAG: https://github.com/apache/airflow/tree/master/airflow/providers/google/cloud/example_dags
   * Operator
   * Hook
   * Unit tests
   * System tests (similar to https://github.com/apache/airflow/blob/master/tests/providers/google/cloud/operators/test_natural_language_system.py)
   
   If you haven't used GCP yet, you [will get $300](https://cloud.google.com/free) in credit after creating an account, which will allow you to get to know this service better.
   
   Working on this task is a good way to get a better understanding of GCP services and to learn the testing and documentation practices that the community requires.  If anyone is interested in this task, I am willing to provide all the necessary tips and information.
   
   **Use case / motivation**
   
   N/A
   
   **Related Issues**
   
   N/A

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615427650
 
 
   Thank you so much. This is more than enough. I really appreciate it!

[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615209256
 
 
   I'd like to work on this. I have sent a request to view the [GCP Service Airflow Integration Guide](https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit).
   Thanks

[GitHub] [airflow] mik-laj edited a comment on issue #8272: Cloud Life Sciences operator and hook

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615426469
 
 
   @ephraimbuddy Google Cloud has two types of Python libraries.
   * Native Python libraries - https://github.com/googleapis/google-cloud-python  They exist for most, but not all, services. These are the recommended libraries. Most often they use Protobuf for communication.
   * Discovery-based libraries - https://github.com/googleapis/google-api-python-client These are generated automatically at the time of use from the API specification (called the discovery document). They are available for every Google service and expose all of its options, so they are always up to date. They communicate over HTTP only.
   
   We don't have a native library for this service, so we need to use [google-api-python-client](https://github.com/googleapis/google-api-python-client). To initialize the library, use the following code.
   ```python
   from googleapiclient.discovery import build
   service = build('lifesciences', 'v2beta', ...)
   ```
   Unfortunately, the library has no documentation for this service, but you can build a client and check which methods exist in this API using ipdb.
   Documentation for other services is available here:
   https://github.com/googleapis/google-api-python-client/blob/master/docs/dyn/index.md
   
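   Here is an example of how to check the documentation for Dataflow.
   ```python
   from googleapiclient.discovery import build

   # Build a discovery-based client for the Dataflow API.
   dataflow_service = build('dataflow', 'v1b3')
   # Walk down the resource hierarchy: projects -> locations -> flexTemplates.
   projects_resource = dataflow_service.projects()
   locations_resource = projects_resource.locations()
   flex_templates_resource = locations_resource.flexTemplates()
   
   # Each generated method carries its documentation in its docstring.
   print(flex_templates_resource.launch.__doc__)
   ```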
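
The same kind of inspection can be wrapped in a small helper instead of an interactive ipdb session. This is only a sketch: `list_public_methods` and `FakePipelinesResource` are hypothetical names, and the fake class merely stands in for a discovery-built resource so that the snippet runs without network access or credentials.

```python
def list_public_methods(resource):
    """Return the sorted names of callable attributes that do not start with '_'."""
    return sorted(
        name
        for name in dir(resource)
        if not name.startswith("_") and callable(getattr(resource, name))
    )


class FakePipelinesResource:
    """Hypothetical stand-in for a discovery-built resource (not the real API)."""

    def run(self, parent, body):
        """Pretend to start a pipeline."""


print(list_public_methods(FakePipelinesResource()))  # prints ['run']

# Assumed usage against a real client (needs google-api-python-client):
#   from googleapiclient.discovery import build
#   service = build('lifesciences', 'v2beta')
#   print(list_public_methods(service.projects().locations()))
```

Against a real client the list may also contain generic helpers added by the library itself, so treat the output as a starting point and read the `__doc__` of each method you care about.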
   These APIs are automatically generated based on the REST API, so you can check the general idea and required arguments in the REST API documentation for the Life Sciences service.
   https://cloud.google.com/life-sciences/docs/reference/rest
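
To illustrate how the REST reference maps onto Python, the sketch below assembles the `parent` string and the request body for `projects.locations.pipelines.run`. The helper `build_run_request` and its parameters are hypothetical. The field names (`pipeline`, `actions`, `imageUri`, `commands`, `resources`, `regions`, `virtualMachine`, `machineType`) follow my reading of the v2beta REST reference and should be verified against it before use.

```python
def build_run_request(project_id, location, image_uri, commands,
                      regions, machine_type="n1-standard-1"):
    """Return (parent, body) for a pipelines.run call.

    Assumption: the body layout mirrors the v2beta RunPipelineRequest
    as described in the REST reference; check the field names there.
    """
    parent = f"projects/{project_id}/locations/{location}"
    body = {
        "pipeline": {
            # Each action runs one container image with the given command.
            "actions": [{"imageUri": image_uri, "commands": list(commands)}],
            # Where the worker VM may be created and what shape it has.
            "resources": {
                "regions": list(regions),
                "virtualMachine": {"machineType": machine_type},
            },
        }
    }
    return parent, body


parent, body = build_run_request(
    project_id="my-project",      # placeholder project ID
    location="us-central1",       # placeholder API location
    image_uri="bash",
    commands=["-c", "echo Hello, world"],
    regions=["us-central1"],
)
print(parent)
```

With a discovery-based client, the call would then presumably look like `service.projects().locations().pipelines().run(parent=parent, body=body).execute()`. You can confirm the exact signature by printing the method's `__doc__`, as in the Dataflow example.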
   
   If you are looking for an example hook, you should look at Cloud Build:
   https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/cloud_build.py
   It still uses a discovery-based client.
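
Since the requested operator has to run a pipeline and then wait for the result, the waiting part of a hook could be sketched roughly as follows. This is a minimal sketch, not the Cloud Build hook's actual code: `wait_for_operation`, `OperationFailed`, and the injected `get_operation` callable are hypothetical names, and the sketch assumes the usual long-running-operation shape, with a `done` flag and either an `error` or a `response` field.

```python
import time


class OperationFailed(Exception):
    """Raised when a finished operation reports an error (hypothetical name)."""


def wait_for_operation(get_operation, name, poll_interval=5.0,
                       max_polls=None, sleep=time.sleep):
    """Poll get_operation(name) until the operation is done.

    get_operation is any callable that returns the operation as a dict.
    Returns the operation's response, raises OperationFailed on error,
    and raises TimeoutError once max_polls unfinished polls have been seen.
    """
    polls = 0
    while True:
        operation = get_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise OperationFailed(str(operation["error"]))
            return operation.get("response", {})
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{name} not done after {polls} polls")
        sleep(poll_interval)


# Assumed wiring to a discovery-based client (verify via the method's __doc__):
#   get_operation = lambda op_name: (
#       service.projects().locations().operations().get(name=op_name).execute()
#   )
```

Passing `get_operation` and `sleep` in as arguments keeps the loop independent of the client library, which makes it easy to unit test with a fake.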
   

[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615416283
 
 
   Hi @mik-laj, could you please point me to the Python library for Cloud Life Sciences? I have been looking for it and can't find it. Sorry for any inconvenience this may cause.
   Thanks

[GitHub] [airflow] ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #8272: Cloud Life Sciences operator and hook
URL: https://github.com/apache/airflow/issues/8272#issuecomment-615940531
 
 
   Please assign this to me; I'm working on it now. Thanks
