Posted to commits@airflow.apache.org by "josh-fell (via GitHub)" <gi...@apache.org> on 2023/03/03 02:21:49 UTC

[GitHub] [airflow] josh-fell commented on a diff in pull request #29694: `GoogleDriveHook`: Fixing log message + adding more verbose documentation

josh-fell commented on code in PR #29694:
URL: https://github.com/apache/airflow/pull/29694#discussion_r1123963291


##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -243,8 +300,21 @@ def upload_file(
             .create(body=file_metadata, media_body=media, fields="id", supportsAllDrives=True)
             .execute(num_retries=self.num_retries)
         )
-        self.log.info("File %s uploaded to gdrive://%s.", local_location, remote_location)
-        return file.get("id")
+        file_id = file.get("id")
+
+        upload_location = remote_location
+
+        if folder_id != "root":
+            try:
+                upload_location = self._resolve_file_path(folder_id)
+            except (GoogleApiClientError, AirflowException) as e:

Review Comment:
   I agree with @aru-trackunit. I don't think the task should fail simply because the file path can't be resolved for logging purposes. It's unfortunate that the Google API doesn't surface this information easily. As long as the task does what it's supposed to do, Airflow shouldn't care what ends up in its user-facing task logs. Making this logic configurable helps too.
   
   I do agree with @eladkal, though, that `AirflowException` is irrelevant here. The operator is not failing for some orchestration or execution-related issue (IMO we use `AirflowException` with too broad a brush sometimes). Maybe a custom exception could be raised instead (e.g. `NestedFolderLimitReachedException` or something similar), so that a real `AirflowException` isn't swallowed unknowingly?
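   
   Rough, hypothetical sketch of what I mean (the exception name and the standalone helper are just placeholders, not part of this PR):
   
   ```python
   from googleapiclient.errors import Error as GoogleApiClientError
   
   
   class NestedFolderLimitReachedException(Exception):
       """Raised when resolving a Drive path would exceed the supported folder nesting depth."""
   
   
   def resolve_upload_location(hook, folder_id: str, remote_location: str) -> str:
       """Best-effort path resolution used only for logging; never fails the task."""
       if folder_id == "root":
           return remote_location
       try:
           return hook._resolve_file_path(folder_id)
       except (GoogleApiClientError, NestedFolderLimitReachedException):
           # The resolved path is only cosmetic, so fall back to the raw remote_location
           # instead of failing the upload.
           hook.log.warning("Could not resolve the full Drive path for folder %s", folder_id)
           return remote_location
   ```
   
   That way a genuine `AirflowException` raised somewhere inside `_resolve_file_path` would still propagate instead of being silently swallowed.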



##########
airflow/providers/google/suite/hooks/drive.py:
##########
@@ -159,6 +161,59 @@ def exists(
             )
         )
 
+    def _get_file_info(self, file_id: str):
+        """
+        Returns a Google API file resource containing the id, name, and parents fields.
+        See https://developers.google.com/drive/api/v3/reference/files/get
+
+        :param file_id: The id of the file of interest
+        :return: File resource containing id, name, and parents
+        """
+        file_info = (
+            self.get_conn()
+            .files()
+            .get(
+                fileId=file_id,
+                fields="id,name,parents",
+                supportsAllDrives=True,
+            )
+            .execute(num_retries=2)
+        )
+        return file_info
+
+    def _resolve_file_path(self, file_id: str) -> str:
+        """
+        Returns the full Google Drive path for the given file_id.
+
+        :param file_id: The id of a file in Google Drive
+        :return: The full Google Drive path of the file
+        """
+        MAX_NESTED_FOLDERS_LEVEL = 20  # Link to docs https://support.google.com/a/users/answer/7338880?hl=en

Review Comment:
   This still feels magical. But, out of curiosity, if the documentation states "A folder in a shared drive can have up to 20 levels of nested folders", does that mean users _can't_ physically create more than that number of nested folders or is it really a _shouldn't_ create situation?
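   
   For context, I'm assuming the loop below the constant ends up doing roughly the following (a hypothetical reconstruction, since the body isn't shown in this hunk; `NestedFolderLimitReachedException` is the placeholder name from my other comment):
   
   ```python
   MAX_NESTED_FOLDERS_LEVEL = 20  # shared-drive nesting limit per https://support.google.com/a/users/answer/7338880
   
   
   class NestedFolderLimitReachedException(Exception):
       """Placeholder exception for exceeding the folder nesting depth."""
   
   
   def resolve_file_path(hook, file_id: str) -> str:
       """Walk up the `parents` chain via files.get until a file with no parent is reached."""
       segments = []
       current_id = file_id
       for _ in range(MAX_NESTED_FOLDERS_LEVEL):
           info = hook._get_file_info(current_id)  # returns id, name, parents
           segments.insert(0, info["name"])
           parents = info.get("parents")
           if not parents:
               # Reached the top of the drive (My Drive root or shared-drive root).
               break
           current_id = parents[0]
       else:
           raise NestedFolderLimitReachedException(
               f"More than {MAX_NESTED_FOLDERS_LEVEL} nested folders above file {file_id}"
           )
       return "/".join(segments)
   ```
   
   If the 20-level limit is a hard limit enforced by Drive, the `else` branch should be unreachable in practice; if it's only a recommendation, the custom exception (plus the non-fatal handling in `upload_file`) matters even more.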



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
