Posted to github@beam.apache.org by "dopieralad (via GitHub)" <gi...@apache.org> on 2023/05/10 05:07:17 UTC

[GitHub] [beam] dopieralad opened a new issue, #26622: [Bug]: BigQuery size estimation does not allow impersonation

dopieralad opened a new issue, #26622:
URL: https://github.com/apache/beam/issues/26622

   ### What happened?
   
   ### Environment
   **_SDK_**: _Python_
   **Runner**: _Dataflow_
   **Connector**: _BigQuery_
   **_Apache Beam_ version**: `2.45.0`, but most likely also present in previous versions
   **_Python_ version**: `3.8.16`, but does not seem to matter in this case
   
   ### Preconditions
   - _Google Cloud Platform_ project _P_,
   - _BigQuery_ table _T_ in any project,
   - _Google Cloud Platform_ account _A_ which can read from table _T_ **and** can create _Dataflow_ and _BigQuery_ jobs in project _P_,
   - _Google Cloud Platform_ account _B_ which can impersonate account _A_, but has no access to table _T_ and cannot create _Dataflow_ or _BigQuery_ jobs in project _P_.
   
   ### Reproduction steps
   1. Implement a `Pipeline` that:
       - Reads from table _T_ (`apache_beam.io.ReadFromBigQuery`),
       - Is executed in project _P_ (`project`),
       - Runs as user _B_ (`service_account`),
       - Impersonates user _A_ (`impersonate_service_account`).
   2. Run the pipeline as user _B_,
   3. An error occurs because the originally configured `PipelineOptions` are not used during size estimation:
       ```
       Failed to insert job <JobReference    
       jobId: 'beam_bq_job_QUERY_XXX'    
       projectId: 'P'>: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/P/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Mon, 08 May 2023 07:14:00 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'status': '403', 'content-length': '486', '-content-encoding': 'gzip'}>, content <{    
        "error": {    
          "code": 403,    
          "message": "Access Denied: Project P: User does not have bigquery.jobs.create permission in project P.",    
            "errors": [    
              {    
                "message": "Access Denied: Project P: User does not have bigquery.jobs.create permission in project P.",    
                "domain": "global",    
                "reason": "accessDenied"    
              }    
            ],    
            "status": "PERMISSION_DENIED"    
          }    
        }    
       ```
       This step does not fail the pipeline, as size estimation is best effort and can just return `None`.
   4. Another error occurs because of _Google Cloud Platform_ token caching. Even though `PipelineOptions` are configured properly, the first request for a _Google Cloud Platform_ token was made with empty options during the _BigQuery_ table size estimation in step 3. Subsequent calls use the cached token and never try to obtain one for the impersonated user.
       ```
       apitools.base.py.exceptions.HttpForbiddenError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/P/locations/europe-west1/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Mon, 08 May 2023 07:14:21 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'status': '403', 'content-length': '314', '-content-encoding': 'gzip'}>, content <{
         "error": {
           "code": 403,
           "message": "(4598d41a53139eed): Could not create workflow; user does not have write access to project: P Causes: (4598d41a53139582): Permission 'dataflow.jobs.create' denied on project: 'P'",
           "status": "PERMISSION_DENIED"
         }
       }
       ```
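
The caching behaviour described in step 4 can be illustrated with a small, self-contained sketch. This is not Beam's code: `AuthOptions`, `get_identity`, `estimate_size`, and `submit_job` are hypothetical names, and the module-level cache only stands in for whatever caching the real credentials helper performs. It assumes that the first credentials lookup in a process decides the identity for every later lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthOptions:
    """Hypothetical stand-in for the auth-related part of PipelineOptions."""
    impersonate_service_account: Optional[str] = None


# Hypothetical process-wide cache: whichever caller arrives first decides
# which identity every later caller receives.
_cached_identity: Optional[str] = None


def get_identity(options: AuthOptions, default_account: str = "B") -> str:
    """Return the cached identity, resolving it only on the first call."""
    global _cached_identity
    if _cached_identity is None:
        # The options are consulted here and never again.
        _cached_identity = options.impersonate_service_account or default_account
    return _cached_identity


def estimate_size() -> None:
    """Mirrors step 3: size estimation asks for credentials with empty options."""
    get_identity(AuthOptions())


def submit_job(options: AuthOptions) -> str:
    """Mirrors step 4: job submission passes the real options, but too late."""
    return get_identity(options)


configured = AuthOptions(impersonate_service_account="A")
estimate_size()                      # caches account B
identity_used = submit_job(configured)
print(identity_used)                 # B
```

In this sketch the job is submitted as account _B_ even though the configured options ask for account _A_, which matches the `dataflow.jobs.create` failure shown above.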
   
   ### Source code
   1. [`apache_beam/io/gcp/bigquery.py/_CustomBigQuerySource.estimate_size`](https://github.com/apache/beam/blob/v2.45.0/sdks/python/apache_beam/io/gcp/bigquery.py#L698),
   2. [`apache_beam/io/gcp/bigquery_tools.py/BigQueryWrapper.__init__`](https://github.com/apache/beam/blob/v2.45.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L335),
   3. [`apache_beam/internal/gcp/auth.py/_Credentials.get_service_credentials`](https://github.com/apache/beam/blob/v2.45.0/sdks/python/apache_beam/internal/gcp/auth.py#L124).
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [X] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] dopieralad commented on issue #26622: [Bug]: BigQuery size estimation does not allow impersonation

Posted by "dopieralad (via GitHub)" <gi...@apache.org>.
dopieralad commented on issue #26622:
URL: https://github.com/apache/beam/issues/26622#issuecomment-1543430554

   .take-issue




[GitHub] [beam] Abacn closed issue #26622: [Bug]: BigQuery size estimation does not allow impersonation

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #26622: [Bug]: BigQuery size estimation does not allow impersonation
URL: https://github.com/apache/beam/issues/26622

