You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/17 15:23:45 UTC

[GitHub] [airflow] Scuall1992 opened a new pull request #14276: Refactor Gdrive to GCS operator

Scuall1992 opened a new pull request #14276:
URL: https://github.com/apache/airflow/pull/14276


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   
   @turbaszek Should I create an issue for this?
   
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578020864



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +92,14 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        self.object_name = destination_object or object_name

Review comment:
       ```suggestion
           self.bucket_name = destination_bucket or bucket_name
           if destination_bucket:
               warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
               bucket_name = destination_bucket
           self.bucket_name = bucket_name
   ```
   In this way users will get DeprecationWarning in runtime, WDYT? Probably we should do the same for `object_name`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782601914


   @turbaszek Is there anything that I need to do in this issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r577732898



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -74,8 +73,8 @@ class GoogleDriveToGCSOperator(BaseOperator):
     def __init__(
         self,
         *,
-        destination_bucket: str,
-        destination_object: str,
+        bucket_name: str,
+        object_name: str,

Review comment:
       This is a breaking change. While I agree that you suggestion is reasonable and I support it we will need to keep old arguments to preserve backward compatibly and we need to add deprecation warning. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578332740



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -93,7 +94,11 @@ def __init__(
     ) -> None:
         super().__init__(**kwargs)
         self.bucket_name = destination_bucket or bucket_name
+        if destination_bucket:
+            warning.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)

Review comment:
       ```suggestion
               warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
   ```

##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -93,7 +94,11 @@ def __init__(
     ) -> None:
         super().__init__(**kwargs)
         self.bucket_name = destination_bucket or bucket_name
+        if destination_bucket:
+            warning.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
         self.object_name = destination_object or object_name
+        if destination_object:
+            warning.warn("`destination_object ` is deprecated please use `object_name`", DeprecationWarning)

Review comment:
       ```suggestion
               warnings.warn("`destination_object ` is deprecated please use `object_name`", DeprecationWarning)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578468131



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +93,18 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        if destination_bucket:
+            warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)

Review comment:
       ```suggestion
               warnings.warn("`destination_bucket` is deprecated please use `bucket_name`", DeprecationWarning)
   ```

##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +93,18 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        if destination_bucket:
+            warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
+        self.object_name = destination_object or object_name
+        if destination_object:
+            warnings.warn("`destination_object ` is deprecated please use `object_name`", DeprecationWarning)

Review comment:
       ```suggestion
               warnings.warn("`destination_object` is deprecated please use `object_name`", DeprecationWarning)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578389753



##########
File path: tests/providers/google/cloud/transfers/test_gdrive_to_gcs.py
##########
@@ -85,13 +46,14 @@ def test_execute(self, mock_upload_data, mock_gdrive_hook, mock_gcs_hook):
         )
         op.execute(context)
 
-        mock_gdrive_hook.assert_called_once_with(
-            gcp_conn_id=GCP_CONN_ID,
-            delegate_to=None,
-            impersonation_chain=IMPERSONATION_CHAIN,
+        mock_gdrive_hook.return_value.get_file_id.assert_called_once_with(
+            folder_id=FOLDER_ID, file_name=FILE_NAME, drive_id=DRIVE_ID
         )
-        mock_gcs_hook.assert_called_once_with(
-            gcp_conn_id=GCP_CONN_ID,
-            delegate_to=None,
-            impersonation_chain=IMPERSONATION_CHAIN,
+
+        mock_gdrive_hook.return_value.download_file.assert_called_once_with(
+            file_id=mock.ANY, file_handle=mock.ANY

Review comment:
       I think we should test that the `file_id` is passed properly. So I would suggest adding this before `op.execute`:
   ```
   meta = {"id": "123xyz"}
   mock_gdrive_hook.return_value.get_file_id.return_value = meta
   ```
   and then we can do
   ```suggestion
           mock_gdrive_hook.return_value.download_file.assert_called_once_with(
               file_id=meta["id"], file_handle=mock.ANY
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782811628


   @turbaszek  I did it. Can you check please?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782828206


   @Scuall1992 great job! 👏 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782694108


   > How can I restart checks?
   
   I restarted them. In general `git commit --amend --no-edit` and `git push --force-with-lease` will trigger them once more


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r577774498



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -74,8 +73,8 @@ class GoogleDriveToGCSOperator(BaseOperator):
     def __init__(
         self,
         *,
-        destination_bucket: str,
-        destination_object: str,
+        bucket_name: str,
+        object_name: str,

Review comment:
       Okay




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-781495705


   > Only a small issue, otherwise this looks really good to me! 🚀
   
   Thank you for your help. Do you have any other issue for me?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r577733707



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -129,4 +104,10 @@ def execute(self, context):
             delegate_to=self.delegate_to,
             impersonation_chain=self.impersonation_chain,
         )
-        self._upload_data(gdrive_hook=gdrive_hook, gcs_hook=gcs_hook)
+        file_metadata = gdrive_hook.get_file_id(
+            folder_id=self.folder_id, file_name=self.file_name, drive_id=self.drive_id
+        )
+        with gcs_hook.provide_file_and_upload(
+            bucket_name=self.bucket_name, object_name=self.object_name
+        ) as file:
+            gdrive_hook.download_file(file_id=file_metadata["id"], file_handle=file)

Review comment:
       Should we also adjust tests?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782899636


   @Scuall1992 there's a loot of `good-first-issues` that need some love:
   https://github.com/apache/airflow/contribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-781201859


   [The Workflow run](https://github.com/apache/airflow/actions/runs/577595532) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782846847


   @turbaszek thank you. Do you have any ideas what I can do next?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578589715



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +93,18 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        if destination_bucket:
+            warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
+        self.object_name = destination_object or object_name
+        if destination_object:
+            warnings.warn("`destination_object ` is deprecated please use `object_name`", DeprecationWarning)

Review comment:
       I can do it in an 3-4 hours




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-781387788


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r577773760



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -129,4 +104,10 @@ def execute(self, context):
             delegate_to=self.delegate_to,
             impersonation_chain=self.impersonation_chain,
         )
-        self._upload_data(gdrive_hook=gdrive_hook, gcs_hook=gcs_hook)
+        file_metadata = gdrive_hook.get_file_id(
+            folder_id=self.folder_id, file_name=self.file_name, drive_id=self.drive_id
+        )
+        with gcs_hook.provide_file_and_upload(
+            bucket_name=self.bucket_name, object_name=self.object_name
+        ) as file:
+            gdrive_hook.download_file(file_id=file_metadata["id"], file_handle=file)

Review comment:
       Yes, I'll do it today




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782752969


   @Scuall1992 can you try rebasing onto current master? It's possible that we have some problems with CI...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782735086


   I can't figure out what is go wrong


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #14276:
URL: https://github.com/apache/airflow/pull/14276


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578188333



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +92,14 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        self.object_name = destination_object or object_name

Review comment:
       I agree. Good suggestion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578020864



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -85,38 +92,14 @@ def __init__(
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
-        self.destination_bucket = destination_bucket
-        self.destination_object = destination_object
+        self.bucket_name = destination_bucket or bucket_name
+        self.object_name = destination_object or object_name

Review comment:
       ```suggestion
           self.bucket_name = destination_bucket or bucket_name
           if destination_bucket:
               warnings.warn("`destination_bucket ` is deprecated please use `bucket_name`", DeprecationWarning)
               bucket_name = destination_bucket
           self.bucket_name = bucket_name
   ```
   In this way users will get DeprecationWarning in runtime, WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Scuall1992 commented on pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
Scuall1992 commented on pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#issuecomment-782693404


   How can I restart checks?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14276: Refactor Gdrive to GCS operator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14276:
URL: https://github.com/apache/airflow/pull/14276#discussion_r578332680



##########
File path: airflow/providers/google/cloud/transfers/gdrive_to_gcs.py
##########
@@ -15,6 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from logging import warning

Review comment:
       ```suggestion
   import warnings
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org