Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/04 07:27:49 UTC

[GitHub] [airflow] otourzan opened a new issue #12804: Regional Dataproc Workflow Template Instantiate Fails

otourzan opened a new issue #12804:
URL: https://github.com/apache/airflow/issues/12804


   **Apache Airflow version**: GCP Composer (1.10.12+composer)
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): GKE (1.16.13-gke.404)
   
   **Environment**: GCP Composer
   
   - **Cloud provider or hardware configuration**: GCP
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   Instantiating a regional workflow template with the Dataproc operator fails.
   
   [2020-12-03 02:48:24,503] {taskinstance.py:1153} ERROR - 400 Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.
   Traceback (most recent call last):
     File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
       return callable_(*args, **kwargs)
     File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
       return _end_unary_response_blocking(state, call, False, None)
     File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
       raise _InactiveRpcError(state)
   grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
   	status = StatusCode.INVALID_ARGUMENT
   	details = "Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint."
   	debug_error_string = "{"created":"@1606963704.503150879","description":"Error received from peer ipv4:142.250.71.74:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.","grpc_status":3}"
   
   
   **What you expected to happen**:
   
   It seems the endpoint is not handled correctly: the Dataproc hook uses the global endpoint [1] even when the template lives in a region and the `region` parameter is set.
   
   I think this method [2] should be updated to accept the location/region and pass `client_options` with the regional endpoint to the Dataproc gRPC client.
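
A minimal sketch of the endpoint selection described above. The helper name `regional_client_options` is hypothetical and not part of Airflow; the endpoint format is taken from the error message in this report. The returned dict is intended to be passed as `client_options` when the Dataproc gRPC client is constructed.

```python
from typing import Optional


def regional_client_options(region: Optional[str]) -> Optional[dict]:
    """Return client options for the regional Dataproc endpoint.

    Returns None when the region is unset or 'global', so the client
    keeps its default (global) endpoint.
    """
    if not region or region == "global":
        return None
    # Endpoint format as quoted in the error message above.
    return {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}


print(regional_client_options("us-central1"))
# {'api_endpoint': 'us-central1-dataproc.googleapis.com:443'}
print(regional_client_options("global"))
# None
```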
   
   **How to reproduce it**:
   Requires a GCP project.
   1- Create a workflow template:
   `gcloud dataproc workflow-templates create WORKFLOW_TMPL --region us-central1`
   
   
   2- Schedule the following DAG:
   ```python
   from datetime import timedelta
   
   from airflow import DAG
   from airflow.providers.google.cloud.operators.dataproc import DataprocInstantiateWorkflowTemplateOperator
   from airflow.utils.dates import days_ago
   
   
   default_args = {
       'start_date': days_ago(0),
       'retries': 1,
       'retry_delay': timedelta(minutes=5)
   }
   
   dag = DAG(
       'dataproc_template_test',
       default_args=default_args,
       description='test dataproc workflow template',
       schedule_interval=None,
       dagrun_timeout=timedelta(minutes=20))
   
   start_template_job = DataprocInstantiateWorkflowTemplateOperator(
       # The task id of your job
       task_id="dataproc_workflow_dag",
       # The template id of your workflow
       template_id="TEMPLATE_ID",
       project_id="PROJECT_ID",
       # The region for the template
       region="us-central1",
       dag=dag
   )
   ```
   
   [1]: https://cloud.google.com/dataproc/docs/concepts/regional-endpoints#grpc
   [2]: https://github.com/apache/airflow/blob/292118e33971dfd68cb32a404a85c0d46d225b40/airflow/providers/google/cloud/hooks/dataproc.py#L222


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-738706033


   Thanks @otourzan, I think you are right that we may need to set the location in the API client. @michalslowikowski00 is working on rewriting the Dataproc integration to use the Dataproc Python client 2.0.





[GitHub] [airflow] turbaszek closed issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
turbaszek closed issue #12804:
URL: https://github.com/apache/airflow/issues/12804


   





[GitHub] [airflow] DenisOgr commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
DenisOgr commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-840671020


   @otourzan Thank you for noticing and fixing this issue. I hit the same issue in Airflow 2.0.1. How can I get the changes you made to fix it? I see a pull request merged into `apache:master`, but which release includes these changes? @turbaszek, could you help me with this?





[GitHub] [airflow] kaxil commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-840784210


   @DenisOgr Run `pip install -U apache-airflow-providers-google`
   
   Docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/index.html
   Changes: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/commits.html
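
After upgrading, the installed provider version can be compared against the change log linked above. The following is a small sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the provider is not installed.
print(installed_version("apache-airflow-providers-google"))
```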





[GitHub] [airflow] boring-cyborg[bot] commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-738617430


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] otourzan commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails

Posted by GitBox <gi...@apache.org>.
otourzan commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-740324839


   I was able to work around it by overriding the hook methods in the DAG file with the following code. I'll also send a pull request to fix it in the main code.
   
   ```python
   from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
   from google.cloud.dataproc_v1beta2 import WorkflowTemplateServiceClient
   
   def get_template_client(self, location=None) -> WorkflowTemplateServiceClient:
       """Returns WorkflowTemplateServiceClient."""
       # Use the regional endpoint unless the location is unset or 'global'.
       client_options = (
           {'api_endpoint': f'{location}-dataproc.googleapis.com:443'}
           if location and location != 'global'
           else None
       )
   
       return WorkflowTemplateServiceClient(
           credentials=self._get_credentials(), client_info=self.client_info, client_options=client_options
       )
   
   def instantiate_workflow_template(
           self,
           location: str,
           template_name: str,
           project_id: str,
           version=None,
           request_id=None,
           parameters=None,
           retry=None,
           timeout=None,
           metadata=None,
       ):
       client = self.get_template_client(location)
       name = client.workflow_template_path(project_id, location, template_name)
       operation = client.instantiate_workflow_template(
           name=name,
           version=version,
           parameters=parameters,
           request_id=request_id,
           retry=retry,
           timeout=timeout,
           metadata=metadata,
       )
       return operation
   
   DataprocHook.get_template_client = get_template_client
   DataprocHook.instantiate_workflow_template = instantiate_workflow_template
   ```

