Posted to commits@airflow.apache.org by "mai-nakagawa (via GitHub)" <gi...@apache.org> on 2023/08/15 07:11:19 UTC

[GitHub] [airflow] mai-nakagawa opened a new issue, #33400: BigQuery with impersonation_chain does not accept custom scopes

mai-nakagawa opened a new issue, #33400:
URL: https://github.com/apache/airflow/issues/33400

   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   I consistently get the following error when I use `impersonation_chain` to run a BigQuery query that accesses [connected sheets](https://cloud.google.com/bigquery/docs/connected-sheets):
   ```
     File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2203, in run_query
       job = self.insert_job(configuration=configuration, project_id=self.project_id)
     File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 439, in inner_wrapper
       return func(self, *args, **kwargs)
     File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1571, in insert_job
       job.result(timeout=timeout, retry=retry)
     File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1499, in result
       do_get_result()
     File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1489, in do_get_result
       super(QueryJob, self).result(retry=retry, timeout=timeout)
     File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
       return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
     File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/future/polling.py", line 137, in result
       raise self._exception
   google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
   ```
   
   I think this is because the hook always uses the default scope, `https://www.googleapis.com/auth/cloud-platform`. Scopes can be set in an Airflow connection, but there is no way to set them for `impersonation_chain`.
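For illustration, the connection-side behaviour can be pictured with a small sketch. It is not the provider's code: `parse_scopes` and `DEFAULT_SCOPES` are hypothetical names, and the sketch assumes the connection's scope field is a comma-separated string that falls back to `cloud-platform` when empty. The point is that the scopes come from the connection alone, with nothing tied to the accounts in `impersonation_chain`.

```python
from __future__ import annotations

# Hypothetical default, matching the scope reported in this issue.
DEFAULT_SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]


def parse_scopes(scope_field: str | None) -> list[str]:
    """Hypothetical helper: turn a comma-separated scope field into a list.

    Assumes an empty or missing field means "fall back to the default scope".
    """
    if not scope_field:
        # Return a copy so callers cannot mutate the shared default.
        return list(DEFAULT_SCOPES)
    # Split on commas, trim whitespace, and skip empty entries.
    return [s.strip() for s in scope_field.split(",") if s.strip()]


print(parse_scopes(None))
print(
    parse_scopes(
        "https://www.googleapis.com/auth/cloud-platform, "
        "https://www.googleapis.com/auth/drive"
    )
)
```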
   
   ### What you think should happen instead
   
   I would like the operators and hooks to accept custom scopes (`https://www.googleapis.com/auth/drive` in this case).
   
   ### How to reproduce
   
   1. Prepare a [connected sheet](https://cloud.google.com/bigquery/docs/connected-sheets).
   2. Run a task with BigQueryInsertJobOperator (or a similar operator) that queries the connected sheet, with `impersonation_chain` set.
   3. The task fails with the following error:
       ```
       403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
       ```
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] nathadfield commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678562158

   No problem.  I'll ask around to see if other people have some thoughts on this too.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "mai-nakagawa (via GitHub)" <gi...@apache.org>.
mai-nakagawa commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1849156253

   Yes, it picks up scopes from the Airflow connection's scope field. The problem is that we cannot set scopes for `impersonation_chain`, as described at the top of this issue.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1748918041

   @aritra24 No problem, I've been there myself.


[GitHub] [airflow] nathadfield commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678522826

   @mai-nakagawa Thanks for logging this. I also know this is a problem, so I'm keen to see it addressed. Do you know what the solution might be?


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1748904434

   @aritra24 Are you planning to implement this, or should we un-assign you?


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "aritra24 (via GitHub)" <gi...@apache.org>.
aritra24 commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1774979003

   @nathadfield this might need to be assigned to someone else. I've been trying it for a few days now, but unfortunately my lack of experience with GCP is really slowing down progress, and it would probably be better handled by someone with a better grasp of it.


[GitHub] [airflow] nathadfield commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1680213843

   @aritra24 I think that would be most welcome.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1845598109

   @pierre-comalada No.  This is currently looking for someone with enough time and desire to take it on.


[GitHub] [airflow] mai-nakagawa commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "mai-nakagawa (via GitHub)" <gi...@apache.org>.
mai-nakagawa commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678530406

   @nathadfield Thanks for addressing the matter.
   
   Airflow connections already support setting scopes, so only `impersonation_chain` needs a way to specify them. My quick idea is to create a class for `impersonation_chain` that holds the service account email address and scope, as follows. What do you think?
   ```python
   from __future__ import annotations

   from dataclasses import dataclass
   from typing import Sequence


   @dataclass
   class ImpersonationServiceAccountWithScope:
       email_address: str
       scope: str | None = None


   @dataclass
   class ImpersonationChain:
       # One account, or an ordered chain that ends with the target account.
       chain: ImpersonationServiceAccountWithScope | Sequence[ImpersonationServiceAccountWithScope]
   ```
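One way the proposed classes could be turned into the arguments that Google's impersonated credentials take (a single target principal, an optional list of delegates, and the target's scopes) is sketched below. It assumes the last account in the chain is the target, mirroring how a plain `impersonation_chain` sequence is read, and it keeps `cloud-platform` while adding the target's extra scope, on the assumption that BigQuery access still needs that scope alongside Drive. `to_credential_args` and `DEFAULT_SCOPE` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence

# Hypothetical default scope, kept alongside any extra scope.
DEFAULT_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


@dataclass
class ImpersonationServiceAccountWithScope:
    email_address: str
    scope: str | None = None


@dataclass
class ImpersonationChain:
    chain: ImpersonationServiceAccountWithScope | Sequence[ImpersonationServiceAccountWithScope]


def to_credential_args(chain: ImpersonationChain) -> tuple[str, list[str], list[str]]:
    """Hypothetical helper: return (target_principal, delegates, target_scopes)."""
    accounts = chain.chain
    # Normalise a single account into a one-element list.
    if isinstance(accounts, ImpersonationServiceAccountWithScope):
        accounts = [accounts]
    if not accounts:
        raise ValueError("impersonation chain must contain at least one account")
    # The last account is the target; any earlier ones are delegates.
    *delegates, target = accounts
    # Keep the default scope and add the target's extra scope, if any.
    scopes = [DEFAULT_SCOPE]
    if target.scope and target.scope != DEFAULT_SCOPE:
        scopes.append(target.scope)
    return target.email_address, [d.email_address for d in delegates], scopes


target, delegates, scopes = to_credential_args(
    ImpersonationChain(
        chain=[
            ImpersonationServiceAccountWithScope("hop@example.iam.gserviceaccount.com"),
            ImpersonationServiceAccountWithScope(
                "target@example.iam.gserviceaccount.com",
                scope="https://www.googleapis.com/auth/drive",
            ),
        ]
    )
)
print(target, delegates, scopes)
```

Requesting scopes only for the final account keeps the shape of the existing `impersonation_chain` argument; intermediate accounts only need to be able to mint tokens for the next hop.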
   


[GitHub] [airflow] nathadfield commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678536544

   @mai-nakagawa Ah, OK! Then I suggest you make the change and raise a PR; it will no doubt come up for discussion with the committers. I will assign the issue to you and look forward to this improvement!


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "buu-nguyen (via GitHub)" <gi...@apache.org>.
buu-nguyen commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1848596780

   > Possible workaround:
   > 
   >     * Step1: Extend a BigQueryHook class and overwrite [`GoogleBaseHook#scopes`](https://github.com/apache/airflow/blob/0f73647bdab79ac6c30961222924f6166f75b55a/airflow/providers/google/common/hooks/base_google.py#L395-L404) method as follows:
   >       ```python
   >       class BigQueryHookWithScopes(BigQueryHook):
   >         def __init__(self, scopes: Sequence[str], *args, **kwargs):
   >             super().__init__(*args, **kwargs)
   >             self._scopes = scopes
   >       
   >         @property
   >         def scopes(self) -> Sequence[str]:
   >             return self._scopes
   >       ```
   > 
   >     * Step2: Extend a BigQuery related Operators to use the above hook as follows:
   >       ```python
   >       class BigQueryExecuteQueryOperatorWithScope(BigQueryExecuteQueryOperator):
   >           def __init__(self, scopes, *args, **kwargs):
   >               super().__init__(*args, **kwargs)
   >               self.scopes = scopes
   >       
   >           def execute(self, context):
   >               self.hook = BigQueryHookWithScopes(
   >                   scopes=self.scopes,
   >                   gcp_conn_id=self.gcp_conn_id,
   >                   use_legacy_sql=self.use_legacy_sql,
   >                   delegate_to=self.delegate_to,
   >                   location=self.location,
   >                   impersonation_chain=self.impersonation_chain,
   >               )
   >               super().execute(context)
   >       ```
   
   Hey, thanks for the workaround. I just noticed that GoogleBaseHook seems to already pick up scopes from the connection's field (https://github.com/apache/airflow/blob/0f73647bdab79ac6c30961222924f6166f75b55a/airflow/providers/google/common/hooks/base_google.py#L402C44-L402C44).
   Is there a specific reason to override it? I'm just curious about this approach.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #33400: BigQuery with impersonation_chain does not accept custom scopes
URL: https://github.com/apache/airflow/issues/33400


[GitHub] [airflow] phanikumv commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "phanikumv (via GitHub)" <gi...@apache.org>.
phanikumv commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1680386737

   Assigned to you @aritra24 


[GitHub] [airflow] mai-nakagawa commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "mai-nakagawa (via GitHub)" <gi...@apache.org>.
mai-nakagawa commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678561047

   @nathadfield Could you please un-assign me, then? I might try to fix it myself when I have time, but I can't guarantee it. Sorry for the confusion.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "aritra24 (via GitHub)" <gi...@apache.org>.
aritra24 commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1748909965

   @nathadfield I've been busy with work and lost track of this. I expect I can start working on it by early to mid next week.


[GitHub] [airflow] aritra24 commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "aritra24 (via GitHub)" <gi...@apache.org>.
aritra24 commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1680208118

   I can try to take this up if it's available.


[GitHub] [airflow] nathadfield commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "nathadfield (via GitHub)" <gi...@apache.org>.
nathadfield commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678555836

   @mai-nakagawa Sorry, I didn't want to assume that you would implement this.  It just seemed like you were already working on it.  I can un-assign you if you'd prefer.


[GitHub] [airflow] mai-nakagawa commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "mai-nakagawa (via GitHub)" <gi...@apache.org>.
mai-nakagawa commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678606598

   Possible workaround:
   
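   - Step 1: Extend the `BigQueryHook` class and override the [`GoogleBaseHook#scopes`](https://github.com/apache/airflow/blob/0f73647bdab79ac6c30961222924f6166f75b55a/airflow/providers/google/common/hooks/base_google.py#L395-L404) property as follows:
       ```python
       from typing import Sequence

       from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook


       class BigQueryHookWithScopes(BigQueryHook):
           def __init__(self, scopes: Sequence[str], *args, **kwargs):
               super().__init__(*args, **kwargs)
               self._scopes = scopes

           @property
           def scopes(self) -> Sequence[str]:
               # Use the scopes passed in instead of the connection's scope field.
               return self._scopes
       ```
   - Step 2: Extend a BigQuery operator to use the above hook as follows:
       ```python
       # Assumes Step 1 (imports and BigQueryHookWithScopes) is in the same module.
       from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator


       class BigQueryExecuteQueryOperatorWithScope(BigQueryExecuteQueryOperator):
           def __init__(self, *, scopes: Sequence[str], **kwargs):
               super().__init__(**kwargs)
               self.scopes = scopes

           def execute(self, context):
               # The parent execute() only builds its own hook when self.hook is None,
               # so pre-setting it here makes the custom scopes take effect.
               # BigQueryInsertJobOperator always builds a fresh hook, so this pattern
               # does not carry over to it unchanged.
               # delegate_to is omitted: recent Google provider releases removed it.
               self.hook = BigQueryHookWithScopes(
                   scopes=self.scopes,
                   gcp_conn_id=self.gcp_conn_id,
                   use_legacy_sql=self.use_legacy_sql,
                   location=self.location,
                   impersonation_chain=self.impersonation_chain,
               )
               # Propagate the parent's return value (if any).
               return super().execute(context)
       ```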
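Why the override in Step 1 takes effect can be shown without Airflow installed: a property defined on a subclass shadows the parent's property, so any code that reads `self.scopes` gets the custom list instead of the connection's value. `FakeBaseHook` and `FakeHookWithScopes` below are hypothetical stand-ins, not the provider's classes, and the sketch assumes the real hook's credential setup reads `self.scopes` in the same way.

```python
from __future__ import annotations

from typing import Sequence

CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
DRIVE = "https://www.googleapis.com/auth/drive"


class FakeBaseHook:
    """Hypothetical stand-in for a hook that reads scopes from its connection."""

    def __init__(self, connection_scope: str | None = None) -> None:
        self._connection_scope = connection_scope

    @property
    def scopes(self) -> Sequence[str]:
        # Connection-driven behaviour: comma-separated field, else the default.
        if not self._connection_scope:
            return [CLOUD_PLATFORM]
        return [s.strip() for s in self._connection_scope.split(",")]

    def credential_scopes(self) -> Sequence[str]:
        # Stand-in for credential setup: it only ever reads self.scopes.
        return self.scopes


class FakeHookWithScopes(FakeBaseHook):
    def __init__(self, scopes: Sequence[str], **kwargs) -> None:
        super().__init__(**kwargs)
        self._scopes = list(scopes)

    @property
    def scopes(self) -> Sequence[str]:
        # Shadows the base property, so the connection field is ignored.
        return self._scopes


base = FakeBaseHook()
custom = FakeHookWithScopes(scopes=[CLOUD_PLATFORM, DRIVE], connection_scope=CLOUD_PLATFORM)
print(base.credential_scopes())
print(custom.credential_scopes())
```

Passing both `cloud-platform` and `drive`, as in the example, is an assumption about what a connected-sheet query needs; the thread itself only names the Drive scope.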


[GitHub] [airflow] mai-nakagawa commented on issue #33400: BigQuery with impersonation_chain does not accept custom scopes

Posted by "mai-nakagawa (via GitHub)" <gi...@apache.org>.
mai-nakagawa commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1678545996

   @nathadfield Oh, ok. I'll try when I have time.


Re: [I] BigQuery with impersonation_chain does not accept custom scopes [airflow]

Posted by "pierre-comalada (via GitHub)" <gi...@apache.org>.
pierre-comalada commented on issue #33400:
URL: https://github.com/apache/airflow/issues/33400#issuecomment-1845592742

   Any updates on this issue?

