Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 01:45:22 UTC

[GitHub] [beam] kennknowles opened a new issue, #19442: Beam does not consider BigQuery's processing location when getting query results

kennknowles opened a new issue, #19442:
URL: https://github.com/apache/beam/issues/19442

   When using the BigQuery source with a SQL query in a pipeline, the "processing location" is not taken into consideration and the pipeline fails.
   
   For example, the following pipeline uses `BigQuerySource` to read from BigQuery with a SQL query. The BigQuery dataset and tables are located in `australia-southeast1`. The query is submitted successfully ([Beam works out the processing location by examining the first table referenced in the query and sets it accordingly](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221)). However, when Beam polls for the job status after submission, the request fails because Beam does not set `location` to `australia-southeast1`, which BigQuery requires:
   
    
   ```
   
   p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`'))
   ```
   
    
   ```
   
   HttpNotFoundError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/a_project_id/queries/5ad9cc803baa432290b6cd0203f556d9?alt=json&maxResults=10000>:
   response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; mode=block', 'x-content-type-options':
   'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding':
   'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN',
   'alt-svc': 'quic=":443"; ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; charset=UTF-8'}>,
   content <{
     "error": {
       "code": 404,
       "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
       "errors": [
         {
           "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
           "domain": "global",
           "reason": "notFound"
         }
       ],
       "status": "NOT_FOUND"
     }
   }
   
   ```
   
    
   
   The problem can be seen here:
   
   [https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571](https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571)
   
   [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357)
   
   The location of the job (in this case `australia-southeast1`) needs to be set or inferred (or exposed via the API); otherwise the request fails.
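
To make the missing piece concrete, here is a minimal sketch of the request URL with and without the location. It is not Beam's code: `query_results_url` is a hypothetical helper written for illustration, and the base URL and the `alt`/`maxResults` parameters are copied from the failing request in the error output above. The only assumption about BigQuery itself is that its REST API accepts a `location` query parameter when fetching query results.

```python
from urllib.parse import urlencode

# Base URL copied from the failing request in the error output.
BASE_URL = "https://www.googleapis.com/bigquery/v2"


def query_results_url(project_id, job_id, location=None, max_results=10000):
    """Build the URL used to fetch query results for a job.

    Hypothetical helper for illustration only; Beam issues this request
    through a generated API client rather than by building URLs by hand.
    """
    params = {"alt": "json", "maxResults": max_results}
    if location:
        # Jobs that ran outside the default locations must be looked up
        # with their processing location, or BigQuery answers 404.
        params["location"] = location
    return f"{BASE_URL}/projects/{project_id}/queries/{job_id}?{urlencode(params)}"


job_id = "5ad9cc803baa432290b6cd0203f556d9"

# The request Beam currently sends (no location).
print(query_results_url("a_project_id", job_id))

# The request with the processing location included.
print(query_results_url("a_project_id", job_id, location="australia-southeast1"))
```

The first URL matches the one in the 404 above; the second differs only by the trailing `location=australia-southeast1` parameter.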
   
    For reference, Airflow had the same bug/problem: [https://github.com/apache/airflow/pull/4695](https://github.com/apache/airflow/pull/4695)
   
    
   
    
   
   Imported from Jira [BEAM-6910](https://issues.apache.org/jira/browse/BEAM-6910). Original Jira may contain additional context.
   Reported by: polleyg.

