Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/04 07:27:49 UTC
[GitHub] [airflow] otourzan opened a new issue #12804: Regional Dataproc Workflow Template Instantiate Fails
otourzan opened a new issue #12804:
URL: https://github.com/apache/airflow/issues/12804
**Apache Airflow version**: GCP Composer (1.10.12+composer)
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): GKE (1.16.13-gke.404)
**Environment**: GCP Composer
- **Cloud provider or hardware configuration**: GCP
- **OS** (e.g. from /etc/os-release):
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**:
**What happened**:
Instantiating a regional workflow template with the Dataproc operator fails with the following error:
[2020-12-03 02:48:24,503] {taskinstance.py:1153} ERROR - 400 Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/python3.6/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint."
    debug_error_string = "{"created":"@1606963704.503150879","description":"Error received from peer ipv4:142.250.71.74:443","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"Region 'us-central1' specified in request does not match endpoint region 'global'. To use 'us-central1' region, specify 'us-central1' region in request and configure client to use 'us-central1-dataproc.googleapis.com:443' endpoint.","grpc_status":3}"
**What you expected to happen**:
The endpoint does not seem to be handled correctly: the Dataproc hook uses the global endpoint [1] even when the template lives in a region and the region parameter is set.
I think this method [2] should be updated to accept the location/region and pass the matching client_options to the Dataproc gRPC client.
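A minimal sketch of the proposal, assuming the regional endpoint follows the `<region>-dataproc.googleapis.com:443` pattern quoted in the error message. The helper name `dataproc_client_options` is hypothetical and is not part of Airflow or the Google client library. It only builds the options dictionary that would be handed to the client as `client_options`.

```python
from typing import Dict, Optional


def dataproc_client_options(region: Optional[str]) -> Optional[Dict[str, str]]:
    """Build client options for a Dataproc region (hypothetical helper).

    Returns None for a missing or 'global' region, so the client keeps
    its default (global) endpoint in that case.
    """
    if not region or region == "global":
        return None
    # Endpoint pattern taken from the error message in this report.
    return {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}


if __name__ == "__main__":
    print(dataproc_client_options("us-central1"))
    print(dataproc_client_options("global"))
```

The result would then be passed as the `client_options` argument when constructing the workflow template client, so a regional request reaches the regional endpoint.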
**How to reproduce it**:
Requires an existing GCP project.
1- Create a workflow template:
`gcloud dataproc workflow-templates create WORKFLOW_TMPL --region us-central1`
2- Schedule the following DAG:
```python
import airflow
from airflow import DAG
from datetime import timedelta
from airflow.providers.google.cloud.operators.dataproc import DataprocInstantiateWorkflowTemplateOperator

default_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'dataproc_template_test',
    default_args=default_args,
    description='test dataproc workflow template',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=20))

start_template_job = DataprocInstantiateWorkflowTemplateOperator(
    # The task id of your job
    task_id="dataproc_workflow_dag",
    # The template id of your workflow
    template_id="TEMPLATE_ID",
    project_id="PROJECT_ID",
    # The region for the template
    region="us-central1",
    dag=dag
)
```
[1]: https://cloud.google.com/dataproc/docs/concepts/regional-endpoints#grpc
[2]: https://github.com/apache/airflow/blob/292118e33971dfd68cb32a404a85c0d46d225b40/airflow/providers/google/cloud/hooks/dataproc.py#L222
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] turbaszek commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-738706033
Thanks @otourzan, I think you are right that we may need to set the location in the API client. @michalslowikowski00 is working on rewriting the Dataproc integration to use the Dataproc Python client 2.0.
[GitHub] [airflow] turbaszek closed issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
turbaszek closed issue #12804:
URL: https://github.com/apache/airflow/issues/12804
[GitHub] [airflow] DenisOgr commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
DenisOgr commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-840671020
@otourzan Thank you for noticing and fixing this issue. I ran into it in Airflow 2.0.1. One question: how can I get the changes you made to fix it? I see the pull request was merged into `apache:master`, but which release includes these changes? @turbaszek could you help me with this?
[GitHub] [airflow] kaxil commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-840784210
@DenisOgr Run `pip install -U apache-airflow-providers-google`
Docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/index.html
Changes: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/commits.html
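After upgrading, the installed provider version can be checked before comparing it against the changelog linked above. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`). The helper name `installed_version` is hypothetical, and the thread does not say which provider release contains the fix, so no version number is assumed here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the provider is not installed.
    print(installed_version("apache-airflow-providers-google"))
```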
[GitHub] [airflow] boring-cyborg[bot] commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-738617430
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] otourzan commented on issue #12804: Regional Dataproc Workflow Template Instantiate Fails
Posted by GitBox <gi...@apache.org>.
otourzan commented on issue #12804:
URL: https://github.com/apache/airflow/issues/12804#issuecomment-740324839
I was able to work around it by overriding the hook methods in the DAG file with the following code. I'll also send a pull request to fix it in the main code.
```python
from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
from google.cloud.dataproc_v1beta2 import WorkflowTemplateServiceClient


def get_template_client(self, location=None) -> WorkflowTemplateServiceClient:
    """Returns WorkflowTemplateServiceClient."""
    # Use the regional endpoint unless the location is unset or 'global'.
    if location and location != 'global':
        client_options = {'api_endpoint': f'{location}-dataproc.googleapis.com:443'}
    else:
        client_options = None
    return WorkflowTemplateServiceClient(
        credentials=self._get_credentials(), client_info=self.client_info, client_options=client_options
    )


def instantiate_workflow_template(
    self,
    location: str,
    template_name: str,
    project_id: str,
    version=None,
    request_id=None,
    parameters=None,
    retry=None,
    timeout=None,
    metadata=None,
):
    # Build the client for the template's own region.
    client = self.get_template_client(location)
    name = client.workflow_template_path(project_id, location, template_name)
    operation = client.instantiate_workflow_template(
        name=name,
        version=version,
        parameters=parameters,
        request_id=request_id,
        retry=retry,
        timeout=timeout,
        metadata=metadata,
    )
    return operation


# Monkey-patch the hook so the operator picks up the region-aware methods.
DataprocHook.get_template_client = get_template_client
DataprocHook.instantiate_workflow_template = instantiate_workflow_template
```