Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 16:22:48 UTC

[GitHub] [beam] damccorm opened a new issue, #20249: BigQuerySource external data source support in non-US regions

damccorm opened a new issue, #20249:
URL: https://github.com/apache/beam/issues/20249

   I am attempting to query an [external data source](https://cloud.google.com/bigquery/external-data-sources), a MySQL database that is exposed via the BigQuery API and located in the EU region. My query string has the following form:
   
   
   ```
   
   query = """
       SELECT *
       FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
           "SELECT * FROM my schema.mytable;");
       """
   
   ```
   
   
   And the following pipeline instantiation:
   
   
   ```
   
       pcoll = p | "Load " + name >> beam.io.Read(
           beam.io.BigQuerySource(query=query, use_standard_sql=True)
       )
   
   ```
   
   
   When I run this, I see the following output:
   ```
   
   WARNING:root:Dataset my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist so we will create it as temporary with location=None
   ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, due to an exception.
   Traceback (most recent call last):
     File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 345, in call
       finish_state)
     File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 385, in attempt_call
       result = evaluator.finish_bundle()
     File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 323, in finish_bundle
       bundles = _read_values_to_bundles(reader)
     File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in _read_values_to_bundles
       read_result = [GlobalWindows.windowed_value(e) for e in reader]
     File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in <listcomp>
       read_result = [GlobalWindows.windowed_value(e) for e in reader]
     File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 937, in __iter__
       flatten_results=self.flatten_results):
     File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 710, in run_query
       page_token, location=location)
     File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 209, in wrapper
       return fun(*args, **kwargs)
     File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 384, in _get_query_results
       response = self.client.jobs.GetQueryResults(request)
     File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 312, in GetQueryResults
       config, request, global_params=global_params)
     File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
       return self.ProcessHttpResponse(method_config, http_response, request)
     File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
       self.__ProcessHttpResponse(method_config, http_response, request))
     File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
       http_response, method_config=method_config, request=request)
   apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354', '-content-encoding': 'gzip'}>, content <{
     "error": {
       "code": 400,
       "message": "Cannot read and write in different locations: source: EU, destination: US",
       "errors": [
         {
           "message": "Cannot read and write in different locations: source: EU, destination: US",
           "domain": "global",
           "reason": "invalid"
         }
       ],
       "status": "INVALID_ARGUMENT"
     }
   }
   
   ```
   
   
   This suggests that the logic Beam uses to infer the location in which to create the temporary dataset fails when confronted with the special syntax for external queries: the dataset is created with location=None and the query then runs against US, while the external source is in the EU. It therefore seems that the location should be exposed as a parameter of BigQuerySource.
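
To illustrate the kind of inference that appears to be missing, here is a minimal sketch that reads the location out of the connection ID passed to EXTERNAL_QUERY. It is not Beam code: `infer_external_query_location` is a hypothetical helper, and it assumes the connection ID is written either in the dotted form used above (`project.location.connection`) or in the fully qualified `projects/.../locations/.../connections/...` form.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only; not part of Apache Beam.
# Captures the first string argument of EXTERNAL_QUERY(...), i.e. the connection ID.
_EXTERNAL_QUERY_RE = re.compile(
    r"""EXTERNAL_QUERY\(\s*["']([^"']+)["']""", re.IGNORECASE)


def infer_external_query_location(query: str) -> Optional[str]:
    """Return the location named in an EXTERNAL_QUERY connection ID, if any."""
    match = _EXTERNAL_QUERY_RE.search(query)
    if match is None:
        return None
    connection_id = match.group(1)

    # Fully qualified form: projects/<project>/locations/<location>/connections/<id>
    parts = connection_id.split("/")
    if len(parts) == 6 and parts[0] == "projects" and parts[2] == "locations":
        return parts[3]

    # Dotted form: <project>.<location>.<id>. Splitting from the right is an
    # assumption made so that a project ID containing dots does not confuse it.
    dotted = connection_id.rsplit(".", 2)
    if len(dotted) == 3:
        return dotted[1]
    return None


query = """
    SELECT *
    FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
        "SELECT 1;");
    """
print(infer_external_query_location(query))
```

If BigQuerySource accepted a location parameter as proposed, a value obtained this way (or simply supplied by the user) could be used when creating the temporary dataset.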
   
   
   
   
   Imported from Jira [BEAM-10172](https://issues.apache.org/jira/browse/BEAM-10172). Original Jira may contain additional context.
   Reported by: muscovite bob.

