You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/06 11:50:46 UTC

[GitHub] [airflow] rajaths010494 opened a new pull request, #28763: Add deferrable GCSObjectExistenceSensorAsync

rajaths010494 opened a new pull request, #28763:
URL: https://github.com/apache/airflow/pull/28763

   This PR donates the following big query deferrable operators and sensors developed in [astronomer-providers](https://github.com/astronomer/astronomer-providers) repo to apache airflow.
   
   `GCSObjectExistenceSensorAsync`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kaxil merged pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
kaxil merged PR #28763:
URL: https://github.com/apache/airflow/pull/28763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rajaths010494 commented on pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
rajaths010494 commented on PR #28763:
URL: https://github.com/apache/airflow/pull/28763#issuecomment-1375286437

   When i re-requested for review. It automatically removed reviewers and now i am not able to add them back.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ashb commented on a diff in pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
ashb commented on code in PR #28763:
URL: https://github.com/apache/airflow/pull/28763#discussion_r1065712585


##########
airflow/providers/google/cloud/sensors/gcs.py:
##########
@@ -94,6 +95,55 @@ def poke(self, context: Context) -> bool:
         return hook.exists(self.bucket, self.object, self.retry)
 
 
+class GCSObjectExistenceAsyncSensor(GCSObjectExistenceSensor):
+    """
+    Checks for the existence of a file in Google Cloud Storage.
+
+    :param bucket: The Google Cloud Storage bucket where the object is.
+    :param object: The name of the object to check in the Google cloud storage bucket.
+    :param google_cloud_conn_id: The connection ID to use when connecting to Google Cloud Storage.
+    :param delegate_to: The account to impersonate using domain-wide delegation of authority,
+        if any. For this to work, the service account making the request must have
+        domain-wide delegation enabled.
+    :param impersonation_chain: Optional service account to impersonate using short-term
+        credentials, or chained list of accounts required to get the access_token
+        of the last account in the list, which will be impersonated in the request.
+        If set as a string, the account must grant the originating account
+        the Service Account Token Creator IAM role.
+        If set as a sequence, the identities from the list must grant
+        Service Account Token Creator IAM role to the directly preceding identity, with first
+        account from the list granting this role to the originating account (templated).
+    """
+
+    def execute(self, context: Context) -> None:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        self.defer(
+            timeout=timedelta(seconds=self.timeout),
+            trigger=GCSBlobTrigger(
+                bucket=self.bucket,
+                object_name=self.object,
+                poke_interval=self.poke_interval,
+                google_cloud_conn_id=self.google_cloud_conn_id,
+                hook_params={
+                    "delegate_to": self.delegate_to,
+                    "impersonation_chain": self.impersonation_chain,
+                },
+            ),
+            method_name="execute_complete",
+        )
+
+    def execute_complete(self, context: dict[str, Any], event: dict[str, str]) -> str:

Review Comment:
   I think?
   ```suggestion
       def execute_complete(self, context: Context, event: dict[str, str]) -> str:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28763:
URL: https://github.com/apache/airflow/pull/28763#discussion_r1064334417


##########
airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -32,18 +32,21 @@
 from typing import IO, Callable, Generator, Sequence, TypeVar, cast, overload
 from urllib.parse import urlsplit
 
+from aiohttp import ClientSession as ClientSession

Review Comment:
   ```suggestion
   from aiohttp import ClientSession
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rajaths010494 commented on pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
rajaths010494 commented on PR #28763:
URL: https://github.com/apache/airflow/pull/28763#issuecomment-1378261307

   Thanks @vchiapaikeo i will look at this and check [get_credentials_and_project_id](https://github.com/apache/airflow/blob/284cd529898fbadd14308004a0b0cb6f389b4318/airflow/providers/google/common/hooks/base_google.py#L235-L271) which handles the impersonation chain properly.
   and add  support for impersonation chain


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rajaths010494 commented on a diff in pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
rajaths010494 commented on code in PR #28763:
URL: https://github.com/apache/airflow/pull/28763#discussion_r1065729910


##########
airflow/providers/google/cloud/sensors/gcs.py:
##########
@@ -94,6 +95,55 @@ def poke(self, context: Context) -> bool:
         return hook.exists(self.bucket, self.object, self.retry)
 
 
+class GCSObjectExistenceAsyncSensor(GCSObjectExistenceSensor):
+    """
+    Checks for the existence of a file in Google Cloud Storage.
+
+    :param bucket: The Google Cloud Storage bucket where the object is.
+    :param object: The name of the object to check in the Google cloud storage bucket.
+    :param google_cloud_conn_id: The connection ID to use when connecting to Google Cloud Storage.
+    :param delegate_to: The account to impersonate using domain-wide delegation of authority,
+        if any. For this to work, the service account making the request must have
+        domain-wide delegation enabled.
+    :param impersonation_chain: Optional service account to impersonate using short-term
+        credentials, or chained list of accounts required to get the access_token
+        of the last account in the list, which will be impersonated in the request.
+        If set as a string, the account must grant the originating account
+        the Service Account Token Creator IAM role.
+        If set as a sequence, the identities from the list must grant
+        Service Account Token Creator IAM role to the directly preceding identity, with first
+        account from the list granting this role to the originating account (templated).
+    """
+
+    def execute(self, context: Context) -> None:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        self.defer(
+            timeout=timedelta(seconds=self.timeout),
+            trigger=GCSBlobTrigger(
+                bucket=self.bucket,
+                object_name=self.object,
+                poke_interval=self.poke_interval,
+                google_cloud_conn_id=self.google_cloud_conn_id,
+                hook_params={
+                    "delegate_to": self.delegate_to,
+                    "impersonation_chain": self.impersonation_chain,
+                },
+            ),
+            method_name="execute_complete",
+        )
+
+    def execute_complete(self, context: dict[str, Any], event: dict[str, str]) -> str:

Review Comment:
   Modified



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] utkarsharma2 commented on a diff in pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
utkarsharma2 commented on code in PR #28763:
URL: https://github.com/apache/airflow/pull/28763#discussion_r1065720529


##########
airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -1174,3 +1177,14 @@ def _parse_gcs_url(gsurl: str) -> tuple[str, str]:
     # Remove leading '/' but NOT trailing one
     blob = parsed_url.path.lstrip("/")
     return bucket, blob
+
+
+class GCSAsyncHook(GoogleBaseAsyncHook):
+    """GCSAsyncHook run on the trigger worker, inherits from GoogleBaseHookAsync"""
+
+    sync_hook_class = GCSHook

Review Comment:
   @phanikumv If this is a constant should be in caps - `SYNC_HOOK_CLASS`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rajaths010494 commented on a diff in pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
rajaths010494 commented on code in PR #28763:
URL: https://github.com/apache/airflow/pull/28763#discussion_r1064341941


##########
airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -32,18 +32,21 @@
 from typing import IO, Callable, Generator, Sequence, TypeVar, cast, overload
 from urllib.parse import urlsplit
 
+from aiohttp import ClientSession as ClientSession

Review Comment:
   changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vchiapaikeo commented on pull request #28763: Add deferrable ``GCSObjectExistenceSensorAsync``

Posted by GitBox <gi...@apache.org>.
vchiapaikeo commented on PR #28763:
URL: https://github.com/apache/airflow/pull/28763#issuecomment-1377456226

   Hi @rajaths010494 @kaxil - I wanted to point out a possible bug that is introduced here - it seems like using this approach:
   
   ```py
       async def get_storage_client(self, session: ClientSession) -> Storage:
           """Returns a Google Cloud Storage service object."""
           with await self.service_file_as_context() as file:
               return Storage(service_file=file, session=cast(Session, session))
   ```
   
   does not honor the impersonation_chain when application default credentials (ADC) are being used. We've tried a similar approach from astronomer-providers and encountered issues on our end. I believe this is because in the case that `await self.service_file_as_context() as file` yields None (the fall through [here](https://github.com/apache/airflow/blob/284cd529898fbadd14308004a0b0cb6f389b4318/airflow/providers/google/common/hooks/base_google.py#L484-L513)), the async storage client gets instantiated with this:
   
   ```
   return Storage(service_file=None, session=cast(Session, session))
   ```
   
   TalkIQ's async io storage client has no idea of the impersonation chain on instantiation here --> https://github.com/talkiq/gcloud-aio/blob/4e0006240913ecadde68722b8986f184a9b1adea/storage/gcloud/aio/storage/storage.py#L149-L168
   
   And as a result, the token gets instantiated with ADC without regard for impersonation chain.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org