You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 16:04:50 UTC

[GitHub] [airflow] TobKed opened a new pull request #11284: Improve GCSToSFTPOperator paths handling

TobKed opened a new pull request #11284:
URL: https://github.com/apache/airflow/pull/11284


   Improve GCSToSFTPOperator paths handling.
   
   Now for every file on GCS is created separate folder on SFTP.
   
   now:
   ```
   copy_file_from_gcs_to_sftp = GCSToSFTPOperator(
           task_id="file-copy-gsc-to-sftp",
           source_bucket="bucket_name",
           source_object="folder/subfolder/file.txt",
           destination_path="/tmp",
       )
   # downloads file to "/tmp/folder/subfolder/file.txt"
   ```
   
   change:
   ```
   copy_file_from_gcs_to_sftp = GCSToSFTPOperator(
           task_id="file-copy-gsc-to-sftp",
           source_bucket="bucket_name",
           source_object="folder/subfolder/file.txt",
           destination_path="/tmp",
       )
   # downloads file to "/tmp/file.txt"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-731599265


   [The Workflow run](https://github.com/apache/airflow/actions/runs/376235777) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r501508699



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -129,15 +134,25 @@ def execute(self, context):
                 )
 
             prefix, delimiter = self.source_object.split(WILDCARD, 1)
+            prefix_dirname = os.path.dirname(prefix)
+
             objects = gcs_hook.list(self.source_bucket, prefix=prefix, delimiter=delimiter)
 
             for source_object in objects:
-                destination_path = os.path.join(self.destination_path, source_object)
+                if self.ignore_gcs_path_in_destination:
+                    source_object_basename = os.path.relpath(source_object, start=prefix_dirname)
+                    destination_path = os.path.join(self.destination_path, source_object_basename)
+                else:
+                    destination_path = os.path.join(self.destination_path, source_object)
                 self._copy_single_object(gcs_hook, sftp_hook, source_object, destination_path)
 
             self.log.info("Done. Uploaded '%d' files to %s", len(objects), self.destination_path)
         else:
-            destination_path = os.path.join(self.destination_path, self.source_object)
+            if self.ignore_gcs_path_in_destination:
+                source_object_basename = os.path.basename(self.source_object)
+                destination_path = os.path.join(self.destination_path, source_object_basename)
+            else:
+                destination_path = os.path.join(self.destination_path, self.source_object)

Review comment:
       Thanks @turbaszek , it is a great suggestion.
   I modified it slightly: https://github.com/apache/airflow/pull/11284/commits/9b9ef602a170125344c3b03c1eaca5c077719eb5

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       Sure. Example was added.

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       I renamed it to `keep_directory_structure`. Thanks @mik-laj !




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-704243237


   @olchas , yes, you are right. In your example of wildcard folders will be created.
   
   @turbaszek I made this behavior optional for backward compatibility. PTAL


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502275241



##########
File path: tests/providers/google/cloud/transfers/test_gcs_to_sftp.py
##########
@@ -30,140 +31,291 @@
 SFTP_CONN_ID = "SFTP_CONN_ID"
 DELEGATE_TO = "DELEGATE_TO"
 IMPERSONATION_CHAIN = ["ACCOUNT_1", "ACCOUNT_2", "ACCOUNT_3"]
-
 TEST_BUCKET = "test-bucket"
-SOURCE_OBJECT_WILDCARD_FILENAME = "test_object*.txt"
-SOURCE_OBJECT_NO_WILDCARD = "test_object.txt"
-SOURCE_OBJECT_MULTIPLE_WILDCARDS = "csv/*/test_*.csv"
-
-SOURCE_FILES_LIST = [
-    "test_object/file1.txt",
-    "test_object/file2.txt",
-    "test_object/file3.json",
-]
-
 DESTINATION_SFTP = "destination_path"
 
 
 # pylint: disable=unused-argument
 class TestGoogleCloudStorageToSFTPOperator(unittest.TestCase):
+    @parameterized.expand(

Review comment:
       Perfect 💪 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502275241



##########
File path: tests/providers/google/cloud/transfers/test_gcs_to_sftp.py
##########
@@ -30,140 +31,291 @@
 SFTP_CONN_ID = "SFTP_CONN_ID"
 DELEGATE_TO = "DELEGATE_TO"
 IMPERSONATION_CHAIN = ["ACCOUNT_1", "ACCOUNT_2", "ACCOUNT_3"]
-
 TEST_BUCKET = "test-bucket"
-SOURCE_OBJECT_WILDCARD_FILENAME = "test_object*.txt"
-SOURCE_OBJECT_NO_WILDCARD = "test_object.txt"
-SOURCE_OBJECT_MULTIPLE_WILDCARDS = "csv/*/test_*.csv"
-
-SOURCE_FILES_LIST = [
-    "test_object/file1.txt",
-    "test_object/file2.txt",
-    "test_object/file3.json",
-]
-
 DESTINATION_SFTP = "destination_path"
 
 
 # pylint: disable=unused-argument
 class TestGoogleCloudStorageToSFTPOperator(unittest.TestCase):
+    @parameterized.expand(

Review comment:
       Perfect 💪 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r500823119



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       For example: If you want to copy file from bucket: `gs://bucket_name/data/2020/file.txt` to `/tmp` directory, by default it will be copied reflecting object path on the bucket, so file path on sftp will be `/tmp/data/2020/file.txt`. If parameter `ignore_gcs_path_in_destination` will be set to True the file path on sftp will be `/tmp/file.txt`.
   
   I am open for proposals for better name for this parameter :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r503201814



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -35,6 +35,31 @@ class GCSToSFTPOperator(BaseOperator):
     """
     Transfer files from a Google Cloud Storage bucket to SFTP server.
 
+    **Example**: ::
+
+        with models.DAG(
+            "example_gcs_to_sftp",
+            start_date=datetime(2020, 6, 19),
+            schedule_interval=None,
+        ) as dag:
+            # downloads file to /tmp/folder/subfolder/file.txt
+            copy_file_from_gcs_to_sftp = GCSToSFTPOperator(
+                task_id="file-copy-gsc-to-sftp",
+                source_bucket="test-gcs-sftp-bucket-name",
+                source_object="folder/subfolder/file.txt",
+                destination_path="/tmp/sftp",
+            )
+
+            # moves file to /tmp/file.txt
+            move_file_from_gcs_to_sftp = GCSToSFTPOperator(
+                task_id="file-move-gsc-to-sftp",
+                source_bucket=BUCKET_SRC,
+                source_object=OBJECT_SRC_2,
+                destination_path=DESTINATION_PATH_1,

Review comment:
       Can you use values here instead of undefined constants?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-707235725


   I rebased on the latest master and fixed the logic mistake. Previously i forget that after changing parameter from `ignore_gcs_path_in_destination` to `keep_directory_structure` the logic of resolving paths should be inverted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-764765423






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r501508699



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -129,15 +134,25 @@ def execute(self, context):
                 )
 
             prefix, delimiter = self.source_object.split(WILDCARD, 1)
+            prefix_dirname = os.path.dirname(prefix)
+
             objects = gcs_hook.list(self.source_bucket, prefix=prefix, delimiter=delimiter)
 
             for source_object in objects:
-                destination_path = os.path.join(self.destination_path, source_object)
+                if self.ignore_gcs_path_in_destination:
+                    source_object_basename = os.path.relpath(source_object, start=prefix_dirname)
+                    destination_path = os.path.join(self.destination_path, source_object_basename)
+                else:
+                    destination_path = os.path.join(self.destination_path, source_object)
                 self._copy_single_object(gcs_hook, sftp_hook, source_object, destination_path)
 
             self.log.info("Done. Uploaded '%d' files to %s", len(objects), self.destination_path)
         else:
-            destination_path = os.path.join(self.destination_path, self.source_object)
+            if self.ignore_gcs_path_in_destination:
+                source_object_basename = os.path.basename(self.source_object)
+                destination_path = os.path.join(self.destination_path, source_object_basename)
+            else:
+                destination_path = os.path.join(self.destination_path, self.source_object)

Review comment:
       Thanks @turbaszek , it is a great suggestion.
   I modified it slightly: https://github.com/apache/airflow/pull/11284/commits/9b9ef602a170125344c3b03c1eaca5c077719eb5




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-731333037


   [The Workflow run](https://github.com/apache/airflow/actions/runs/374916164) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] marksworn commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
marksworn commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-781331423


   This does not appear to be working for me - the documentation is unclear which should be used to exclude the GCS path structure from the destination path - True or False? I have tested with both and with both it is still adding elements of the GCS filepath that I do not want to include.
   
   This is a useful feature in some cases, but it feels like it should be default that the user controls definition of the source and destination paths without any unexpected behaviour.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502267420



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_paths: (Optional) When set to False the path of the file
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-734798291


   > @mik-laj any feedback about your tests?
   
   As far as I know, they were successful. Tobias has more details.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r501182178



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -129,15 +134,25 @@ def execute(self, context):
                 )
 
             prefix, delimiter = self.source_object.split(WILDCARD, 1)
+            prefix_dirname = os.path.dirname(prefix)
+
             objects = gcs_hook.list(self.source_bucket, prefix=prefix, delimiter=delimiter)
 
             for source_object in objects:
-                destination_path = os.path.join(self.destination_path, source_object)
+                if self.ignore_gcs_path_in_destination:
+                    source_object_basename = os.path.relpath(source_object, start=prefix_dirname)
+                    destination_path = os.path.join(self.destination_path, source_object_basename)
+                else:
+                    destination_path = os.path.join(self.destination_path, source_object)
                 self._copy_single_object(gcs_hook, sftp_hook, source_object, destination_path)
 
             self.log.info("Done. Uploaded '%d' files to %s", len(objects), self.destination_path)
         else:
-            destination_path = os.path.join(self.destination_path, self.source_object)
+            if self.ignore_gcs_path_in_destination:
+                source_object_basename = os.path.basename(self.source_object)
+                destination_path = os.path.join(self.destination_path, source_object_basename)
+            else:
+                destination_path = os.path.join(self.destination_path, self.source_object)

Review comment:
       How about
   ```python
   def _resolve_destination_path(self, prefix=None):
       if self.ignore_gcs_path_in_destination:
           source_object_basename = os.path.basename(self.source_object, start=prefix)
           return os.path.join(self.destination_path, source_object_basename)
       else:
           return os.path.join(self.destination_path, self.source_object)
   ```
   then we can avoid duplication of the code




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r503419950



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -35,6 +35,31 @@ class GCSToSFTPOperator(BaseOperator):
     """
     Transfer files from a Google Cloud Storage bucket to SFTP server.
 
+    **Example**: ::
+
+        with models.DAG(
+            "example_gcs_to_sftp",
+            start_date=datetime(2020, 6, 19),
+            schedule_interval=None,
+        ) as dag:
+            # downloads file to /tmp/folder/subfolder/file.txt
+            copy_file_from_gcs_to_sftp = GCSToSFTPOperator(
+                task_id="file-copy-gsc-to-sftp",
+                source_bucket="test-gcs-sftp-bucket-name",
+                source_object="folder/subfolder/file.txt",
+                destination_path="/tmp/sftp",
+            )
+
+            # moves file to /tmp/file.txt
+            move_file_from_gcs_to_sftp = GCSToSFTPOperator(
+                task_id="file-move-gsc-to-sftp",
+                source_bucket=BUCKET_SRC,
+                source_object=OBJECT_SRC_2,
+                destination_path=DESTINATION_PATH_1,

Review comment:
       I missed it. Thank you! FIxed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502399721



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       I renamed it to `keep_directory_structure`. Thanks @mik-laj !




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502267420



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_names: (Optional) When set to False the path of the file
   ```

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_path: (Optional) When set to False the path of the file
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-734795143


   The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502272071



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       or keep_directory_structure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r500784071



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       It confused me a little bit. Can you explain this parameter?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-764765423


   [The Workflow run](https://github.com/apache/airflow/actions/runs/501444044) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-764772645


   [The Workflow run](https://github.com/apache/airflow/actions/runs/501505474) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-783184749


   @marksworn I am sorry to hear that. Would like to create issue with link to this PR? Would you like to create fix for it? I would be happy to assist you with this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502267420



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_basename: (Optional) When set to False the path of the file
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-764772514


   [The Workflow run](https://github.com/apache/airflow/actions/runs/501505061) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-734795127


   @mik-laj any feedback about your tests?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-703729907


   cc @mik-laj @turbaszek @michalslowikowski00 @olchas 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502267420



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_basename: (Optional) When set to False the path of the file
   ```

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       Can you add an example to the description? 

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_names: (Optional) When set to False the path of the file
   ```

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_path: (Optional) When set to False the path of the file
   ```

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       ```suggestion
       :param use_relative_paths: (Optional) When set to False the path of the file
   ```

##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       or keep_directory_structure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-721446338


   Can you please rebase your PR on latest Master since we have applied [Black](https://github.com/apache/airflow/commit/4e8f9cc8d02b29c325b8a5a76b4837671bdf5f68) and [PyUpgrade](https://github.com/apache/airflow/commit/8c42cf1b00c90f0d7f11b8a3a455381de8e003c5) on Master.
   
   It will help if your squash your commits into single commit first so that there are less conflicts.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
olchas commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-704180364


   > Should we make this configurable to avoid breaking change? I see the point of downloading a single file but this may be problematic when downloading multiple files with the same name (for example structure like `output/2020-09-20/data.csv` and downloading data for multiple days). WDYT @TobKed ?
   
   From what I understood by reading the code, only the part up to the wildcard character is stripped from the `source_object` path. In other word, when downloading multiple files from your example, you would set `source_object` to something like this: `output/*`. Then, when uploading them to `destination_path`, only the `output` would be stripped from every collected object's path, so you should get multiple directories for multiple days.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] michalslowikowski00 commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r500849401



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       Got it. I do not have better solution.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-734798291


   > @mik-laj any feedback about your tests?
   
   What do you mean?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #11284:
URL: https://github.com/apache/airflow/pull/11284


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-764767235


   [The Workflow run](https://github.com/apache/airflow/actions/runs/501481679) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] marksworn edited a comment on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
marksworn edited a comment on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-781331423


   This does not appear to be working for me - the documentation is unclear which should be used to exclude the GCS path structure from the destination path - True or False? I have tested with both and with both it is still adding elements of the GCS filepath that I do not want to include.
   
   This is a useful feature in some cases, but it feels like it should be default that the user controls definition of the source and destination paths without any unexpected behaviour.
   
   With `keep_directory_structure=False` and destination set to `/Activity/pending/`:
   
   `Executing copy of gs://<bucket>/Activity/2021/02/activity_20210217.csv to /Activity/pending/Activity/2021/02/activity_20210217.csv`
   
   With `keep_directory_structure=True`:
   
   `Executing copy of gs://<bucket>/Activity/2021/02/activity_20210217.csv to /Activity/pending/Activity/2021/02/activity_20210217.csv`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas edited a comment on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
olchas edited a comment on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-704180364


   > Should we make this configurable to avoid breaking change? I see the point of downloading a single file but this may be problematic when downloading multiple files with the same name (for example structure like `output/2020-09-20/data.csv` and downloading data for multiple days). WDYT @TobKed ?
   
   From what I understood by reading the code, only the part up to the wildcard character is stripped from the `source_object` path. In other word, when downloading multiple files from your example, you would set `source_object` to something like this: `output/*`. Then, when uploading them to `destination_path`, only the `output` would be stripped from every collected object's path, so you should get multiple directories for multiple days in your destination.
   
   Did I get this right, @TobKed?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-703736343


   Should we make this configurable to avoid breaking change? I see the point of downloading a single file but this may be problematic when downloading multiple files with the same name (for example structure like `output/2020-09-20/data.csv` and downloading data for multiple days). WDYT @TobKed ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-703729325


   The CI and PROD Docker Images for the build are prepared in a separate "Build Image" workflow,
   that you will not see in the list of checks (you will see "Wait for images" jobs instead).
   
   You can checks the status of those images in [The workflow run](https://github.com/apache/airflow/actions/runs/289733643)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502399425



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       Sure. Example was added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#issuecomment-713621960


   We test it together with the customer. Pllease wait for customer feedback.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #11284: Improve GCSToSFTPOperator paths handling

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #11284:
URL: https://github.com/apache/airflow/pull/11284#discussion_r502267662



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_sftp.py
##########
@@ -52,6 +52,9 @@ class GCSToSFTPOperator(BaseOperator):
     :param destination_path: The sftp remote path. This is the specified directory path for
         uploading to the SFTP server.
     :type destination_path: str
+    :param ignore_gcs_path_in_destination: (Optional) When set to False the path of the file

Review comment:
       Can you add an example to the description? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org