Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/08/01 17:07:04 UTC

[jira] [Commented] (BEAM-10172) BigQuerySource external data source support in non-US regions

    [ https://issues.apache.org/jira/browse/BEAM-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169329#comment-17169329 ] 

Beam JIRA Bot commented on BEAM-10172:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> BigQuerySource external data source support in non-US regions
> -------------------------------------------------------------
>
>                 Key: BEAM-10172
>                 URL: https://issues.apache.org/jira/browse/BEAM-10172
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>    Affects Versions: 2.18.0
>         Environment: DirectRunner, DataflowRunner
>            Reporter: Boris Shilov
>            Priority: P2
>              Labels: stale-P2
>
> I am attempting to query an [external data source|https://cloud.google.com/bigquery/external-data-sources], a MySQL database that is exposed via the BigQuery API, located in the EU region. My query string has the following form:
> {code:python}
> query = """
>     SELECT * 
>     FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source", 
>     "SELECT * FROM my schema.mytable;");
>     """
> {code}
> And the following pipeline instantiation:
> {code:python}
>     pcoll = p | "Load " + name >> beam.io.Read(
>         beam.io.BigQuerySource(query=query, use_standard_sql=True)
>     )
> {code}
> When I run this, I see the following output:
> {code}
> WARNING:root:Dataset my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist so we will create it as temporary with location=None
> ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, due to an exception.
>  Traceback (most recent call last):
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 345, in call
>     finish_state)
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 385, in attempt_call
>     result = evaluator.finish_bundle()
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 323, in finish_bundle
>     bundles = _read_values_to_bundles(reader)
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in _read_values_to_bundles
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in <listcomp>
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 937, in __iter__
>     flatten_results=self.flatten_results):
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 710, in run_query
>     page_token, location=location)
>   File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 209, in wrapper
>     return fun(*args, **kwargs)
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 384, in _get_query_results
>     response = self.client.jobs.GetQueryResults(request)
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 312, in GetQueryResults
>     config, request, global_params=global_params)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
>     return self.ProcessHttpResponse(method_config, http_response, request)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
>     self.__ProcessHttpResponse(method_config, http_response, request))
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
>     http_response, method_config=method_config, request=request)
> apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354', '-content-encoding': 'gzip'}>, content <{
>   "error": {
>     "code": 400,
>     "message": "Cannot read and write in different locations: source: EU, destination: US",
>     "errors": [
>       {
>         "message": "Cannot read and write in different locations: source: EU, destination: US",
>         "domain": "global",
>         "reason": "invalid"
>       }
>     ],
>     "status": "INVALID_ARGUMENT"
>   }
> }
> {code}
> This suggests that the logic Beam uses to infer the location in which to create the temporary dataset fails when confronted with the special syntax for external queries: the warning above reports location=None, and the subsequent request falls back to location=US. The location should therefore be exposed as a parameter of BigQuerySource.
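
To make the proposal in the description concrete, here is a minimal sketch of how a location could be resolved for a query that uses EXTERNAL_QUERY. It assumes the connection ID has the form project.location.connection, as in the query quoted above. The names location_from_external_query and resolve_location, and the explicit_location and inferred_location parameters, are hypothetical; they are not part of apache_beam and only illustrate the fallback order.

```python
import re
from typing import Optional

# Hypothetical helper, not part of apache_beam. It assumes the first argument
# of EXTERNAL_QUERY is a connection ID of the form "project.location.connection".
_EXTERNAL_QUERY_RE = re.compile(
    r"""EXTERNAL_QUERY\(\s*["']([^"'.]+)\.([^"'.]+)\.([^"'.]+)["']""",
    re.IGNORECASE,
)


def location_from_external_query(query: str) -> Optional[str]:
    """Return the upper-cased location segment of the connection ID, or None."""
    match = _EXTERNAL_QUERY_RE.search(query)
    return match.group(2).upper() if match else None


def resolve_location(
    query: str,
    explicit_location: Optional[str] = None,
    inferred_location: Optional[str] = None,
) -> Optional[str]:
    """Pick a location for the temporary dataset.

    Order of preference (an assumption for this sketch):
      1. a location passed explicitly by the user,
      2. a location inferred by other means (for example from referenced tables),
      3. the location embedded in an EXTERNAL_QUERY connection ID.
    """
    return (
        explicit_location
        or inferred_location
        or location_from_external_query(query)
    )


if __name__ == "__main__":
    query = (
        'SELECT * FROM EXTERNAL_QUERY('
        '"my-project-one-253518.eu.external-source", "SELECT 1;");'
    )
    print(resolve_location(query))
    print(resolve_location(query, explicit_location="US"))
```

An explicit parameter takes precedence in this sketch, so a user can always override whatever is inferred; parsing the connection ID is only a last resort for queries where nothing else yields a location.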



--
This message was sent by Atlassian Jira
(v8.3.4#803005)