Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/15 08:31:34 UTC

[GitHub] [airflow] ephraimbuddy opened a new pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

ephraimbuddy opened a new pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728
 
 
   - Rewrote the GCSToGCSOperator class to accept a list of objects to copy rather than a single object.
   - Added tests for the GCSToGCSOperator.
   - Updated the documentation.
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#issuecomment-600328680
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=h1) Report
   > Merging [#7728](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/49998edd2ff0b64fd1771138fc7d8e835c564a47&el=desc) will **decrease** coverage by `27.09%`.
   > The diff coverage is `13.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7728/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #7728       +/-   ##
   ===========================================
   - Coverage   86.99%   59.90%   -27.10%     
   ===========================================
     Files         915      915               
     Lines       44198    44228       +30     
   ===========================================
   - Hits        38451    26494    -11957     
   - Misses       5747    17734    +11987     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `44.44% <13.33%> (-51.21%)` | :arrow_down: |
   | [airflow/providers/amazon/aws/hooks/kinesis.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9raW5lc2lzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/apache/livy/sensors/livy.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2xpdnkvc2Vuc29ycy9saXZ5LnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/google/suite/hooks/sheets.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL3N1aXRlL2hvb2tzL3NoZWV0cy5weQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/amazon/aws/sensors/redshift.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9zZW5zb3JzL3JlZHNoaWZ0LnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/microsoft/azure/operators/adx.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvbWljcm9zb2Z0L2F6dXJlL29wZXJhdG9ycy9hZHgucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...irflow/providers/amazon/aws/hooks/batch\_waiters.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9iYXRjaF93YWl0ZXJzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ow/providers/amazon/aws/sensors/cloud\_formation.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9zZW5zb3JzL2Nsb3VkX2Zvcm1hdGlvbi5weQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...w/providers/apache/hive/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvb3BlcmF0b3JzL215c3FsX3RvX2hpdmUucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [305 more](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=footer). Last update [49998ed...1813ca4](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#issuecomment-600328680
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=h1) Report
   > Merging [#7728](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/4979e5ce2fe11df963882db32c2ad394eaf53b58&el=desc) will **decrease** coverage by `0.77%`.
   > The diff coverage is `86.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7728/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7728      +/-   ##
   ==========================================
   - Coverage   86.99%   86.22%   -0.78%     
   ==========================================
     Files         915      915              
     Lines       44198    44228      +30     
   ==========================================
   - Hits        38452    38136     -316     
   - Misses       5746     6092     +346     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `90.90% <86.66%> (-4.75%)` | :arrow_down: |
   | [...flow/providers/apache/cassandra/hooks/cassandra.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2Nhc3NhbmRyYS9ob29rcy9jYXNzYW5kcmEucHk=) | `21.51% <0.00%> (-72.16%)` | :arrow_down: |
   | [...w/providers/apache/hive/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvb3BlcmF0b3JzL215c3FsX3RvX2hpdmUucHk=) | `35.84% <0.00%> (-64.16%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [airflow/providers/redis/operators/redis\_publish.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcmVkaXMvb3BlcmF0b3JzL3JlZGlzX3B1Ymxpc2gucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/providers/mongo/sensors/mongo.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvbW9uZ28vc2Vuc29ycy9tb25nby5weQ==) | `53.33% <0.00%> (-46.67%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [airflow/providers/mysql/operators/mysql.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvbXlzcWwvb3BlcmF0b3JzL215c3FsLnB5) | `55.00% <0.00%> (-45.00%)` | :arrow_down: |
   | ... and [10 more](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=footer). Last update [4979e5c...05fb3a9](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r393915193
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +40,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_object: A list of prefix of the objects to copy in the Google cloud
         storage bucket. (templated)
-        You can use only one wildcard for objects (filenames) within your
-        bucket. The wildcard can appear inside the object name or at the
-        end of the object name. Appending a wildcard to the bucket name is
-        unsupported.
-    :type source_object: str
+    :type source_object: List[str]
 
 Review comment:
   Nice. I'll take that approach of source_object and source_objects. Thanks a lot


[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r393910336
 
 

 ##########
 File path: tests/providers/google/cloud/operators/test_gcs_to_gcs.py
 ##########
 @@ -28,33 +28,40 @@
 
 TASK_ID = 'test-gcs-to-gcs-operator'
 TEST_BUCKET = 'test-bucket'
-DELIMITER = '.csv'
 PREFIX = 'TEST'
+SOURCE_OBJECTS_NO_FILE = ['']
+SOURCE_OBJECTS_TWO_EMPTY_STRING = ['', '']
+SOURCE_OBJECTS_SINGLE_FILE = ['test_object/file1.txt']
+SOURCE_OBJECTS_MULTIPLE_FILES = ['test_object/file1.txt', 'test_object/file2.txt']
+SOURCE_OBJECTS_LIST = ['test_object/file1.txt', 'test_object/file2.txt', 'test_object/file3.json']
+
 SOURCE_OBJECT_WILDCARD_PREFIX = '*test_object'
 SOURCE_OBJECT_WILDCARD_SUFFIX = 'test_object*'
 SOURCE_OBJECT_WILDCARD_MIDDLE = 'test*object'
 SOURCE_OBJECT_WILDCARD_FILENAME = 'test_object*.txt'
 SOURCE_OBJECT_NO_WILDCARD = 'test_object.txt'
 SOURCE_OBJECT_MULTIPLE_WILDCARDS = 'csv/*/test_*.csv'
 DESTINATION_BUCKET = 'archive'
+DESTINATION_OBJECT = 'foo/bar'
 DESTINATION_OBJECT_PREFIX = 'foo/bar'
 SOURCE_FILES_LIST = [
     'test_object/file1.txt',
     'test_object/file2.txt',
     'test_object/file3.json',
 ]
+DELIMITER = '.json'
 
 Review comment:
   I actually changed my mind - I see that now we have suffix/delimiter in various places - I think it's better to keep "delimiter" from list_blob method even if it's not an obvious name.


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392667594
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +37,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_objects: A list of prefix of the objects to copy in the Google cloud
         storage bucket. (templated)
-        You can use only one wildcard for objects (filenames) within your
-        bucket. The wildcard can appear inside the object name or at the
-        end of the object name. Appending a wildcard to the bucket name is
-        unsupported.
-    :type source_object: str
+    :type source_objects: List[str]
     :param destination_bucket: The destination Google Cloud Storage bucket
         where the object should be. If the destination_bucket is None, it defaults
         to source_bucket. (templated)
     :type destination_bucket: str
     :param destination_object: The destination name of the object in the
         destination Google Cloud Storage bucket. (templated)
-        If a wildcard is supplied in the source_object argument, this is the
-        prefix that will be prepended to the final destination objects' paths.
-        Note that the source path's part before the wildcard will be removed;
-        if it needs to be retained it should be appended to destination_object.
-        For example, with prefix ``foo/*`` and destination_object ``blah/``, the
-        file ``foo/baz`` will be copied to ``blah/baz``; to retain the prefix write
-        the destination_object as e.g. ``blah/foo``, in which case the copied file
-        will be named ``blah/foo/baz``.
+        If destination object is not specified, then it defaults to each of the source objects.
+        For example, if source_objects = ['foo/sales','bah/inventory'], then destination will be
+        'foo/sales' and 'bah/inventory' if destination_object is not specified.
     :type destination_object: str
     :param move_object: When move object is True, the object is moved instead
         of copied to the new location. This is the equivalent of a mv command
         as opposed to a cp command.
     :type move_object: bool
+    :type delimiter: str
 
 Review comment:
   OK, thanks very much for the review.
   I'm thinking of removing `delimiter` entirely and adding back `google_cloud_storage_conn_id`, which I had removed, for backward compatibility. The operator already takes 10 arguments, so keeping both would make the tests fail.
   If `delimiter` is removed, the search will be based on the wildcard in the source object instead.
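
   For reference, the wildcard-based search mentioned here could look roughly like the sketch below. It follows the logic of the hunk removed from `execute` in this PR (split on a single `*` into a prefix and a delimiter), but `split_on_wildcard` is a hypothetical helper name, and it raises `ValueError` instead of `AirflowException` so the snippet runs without Airflow installed.

```python
from typing import Optional, Tuple

WILDCARD = "*"


def split_on_wildcard(source_object: str) -> Tuple[str, Optional[str]]:
    """Split an object name on its single wildcard into (prefix, delimiter).

    Hypothetical helper, not part of the operator. Returns (name, None)
    when the name contains no wildcard.
    """
    total_wildcards = source_object.count(WILDCARD)
    if total_wildcards > 1:
        # Same restriction as the removed code: only one wildcard is allowed.
        raise ValueError(
            "Only one wildcard '*' is allowed in source_object parameter. "
            "Found {} in {}.".format(total_wildcards, source_object)
        )
    if total_wildcards == 0:
        return source_object, None
    prefix, delimiter = source_object.split(WILDCARD, 1)
    return prefix, delimiter
```

   With the test constants from this PR, `'test_object*.txt'` would split into the prefix `'test_object'` and the delimiter `'.txt'`, while `'csv/*/test_*.csv'` would be rejected.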


[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r393906618
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +40,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_object: A list of prefix of the objects to copy in the Google cloud
         storage bucket. (templated)
-        You can use only one wildcard for objects (filenames) within your
-        bucket. The wildcard can appear inside the object name or at the
-        end of the object name. Appending a wildcard to the bucket name is
-        unsupported.
-    :type source_object: str
+    :type source_object: List[str]
 
 Review comment:
   The type should be `List[str]` or `str`.
   
   But maybe it would be better to keep both source_object and source_objects parameters (source_object = str, source_objects=List[str]) and check that only one is set? 
   
   I think that would be still backwards compatible, but more "accurate" with naming.
   
   But I leave it up to you - I am ok with both approaches (as long as we have correct type). 
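
   A minimal sketch of the "keep both parameters and check that only one is set" idea. `resolve_source_objects` is a hypothetical helper, not code from this PR; returning an empty list when neither argument is given is an assumption, and `ValueError` stands in for whatever exception the operator would raise.

```python
from typing import List, Optional


def resolve_source_objects(
    source_object: Optional[str] = None,
    source_objects: Optional[List[str]] = None,
) -> List[str]:
    """Normalize the two mutually exclusive parameters into one list.

    Hypothetical helper for illustration only.
    """
    if source_object is not None and source_objects is not None:
        # Both were passed: the caller's intent is ambiguous, so fail early.
        raise ValueError(
            "Set only one of `source_object` or `source_objects`, not both."
        )
    if source_object is not None:
        # Backward-compatible path: a single name becomes a one-item list.
        return [source_object]
    if source_objects is not None:
        return list(source_objects)
    # Assumption: nothing specified means an empty list.
    return []
```

   Existing DAGs that pass `source_object='foo/bar.txt'` would keep working, and the rest of the operator would only need to deal with a list.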


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392667627
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -169,30 +158,25 @@ def execute(self, context):
                 'destination_bucket is None. Defaulting it to source_bucket (%s)',
                 self.source_bucket)
             self.destination_bucket = self.source_bucket
-
-        if WILDCARD in self.source_object:
-            total_wildcards = self.source_object.count(WILDCARD)
-            if total_wildcards > 1:
-                error_msg = "Only one wildcard '*' is allowed in source_object parameter. " \
-                            "Found {} in {}.".format(total_wildcards, self.source_object)
-
-                raise AirflowException(error_msg)
-
-            prefix, delimiter = self.source_object.split(WILDCARD, 1)
-            objects = hook.list(self.source_bucket, prefix=prefix, delimiter=delimiter)
-
+        if not all(isinstance(item, str) for item in self.source_objects):
+            raise AirflowException('At least, one of the `objects` in the `source_objects` is not a string')
 
 Review comment:
   Ok. Noted


[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392654033
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +37,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_objects: A list of prefix of the objects to copy in the Google cloud
         storage bucket. (templated)
-        You can use only one wildcard for objects (filenames) within your
-        bucket. The wildcard can appear inside the object name or at the
-        end of the object name. Appending a wildcard to the bucket name is
-        unsupported.
-    :type source_object: str
+    :type source_objects: List[str]
     :param destination_bucket: The destination Google Cloud Storage bucket
         where the object should be. If the destination_bucket is None, it defaults
         to source_bucket. (templated)
     :type destination_bucket: str
     :param destination_object: The destination name of the object in the
         destination Google Cloud Storage bucket. (templated)
-        If a wildcard is supplied in the source_object argument, this is the
-        prefix that will be prepended to the final destination objects' paths.
-        Note that the source path's part before the wildcard will be removed;
-        if it needs to be retained it should be appended to destination_object.
-        For example, with prefix ``foo/*`` and destination_object ``blah/``, the
-        file ``foo/baz`` will be copied to ``blah/baz``; to retain the prefix write
-        the destination_object as e.g. ``blah/foo``, in which case the copied file
-        will be named ``blah/foo/baz``.
+        If destination object is not specified, then it defaults to each of the source objects.
+        For example, if source_objects = ['foo/sales','bah/inventory'], then destination will be
+        'foo/sales' and 'bah/inventory' if destination_object is not specified.
     :type destination_object: str
     :param move_object: When move object is True, the object is moved instead
         of copied to the new location. This is the equivalent of a mv command
         as opposed to a cp command.
     :type move_object: bool
+    :type delimiter: str
 
 Review comment:
   Should we name it "suffix" rather than delimiter?
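
   To make the naming question concrete, here is a small sketch that assumes `delimiter` acts as a suffix filter on object names, which is how the question above reads. It filters an in-memory list rather than calling the GCS hook; `filter_names` is a hypothetical helper, and the listing call's real behavior is not shown in this thread.

```python
from typing import Iterable, List, Optional


def filter_names(
    names: Iterable[str],
    prefix: str = "",
    delimiter: Optional[str] = None,
) -> List[str]:
    """Keep names that start with `prefix` and, if given, end with `delimiter`.

    Hypothetical stand-in for a bucket listing; assumes suffix semantics.
    """
    return [
        name
        for name in names
        if name.startswith(prefix)
        and (delimiter is None or name.endswith(delimiter))
    ]


# Constants copied from the test module in this PR.
SOURCE_FILES_LIST = [
    'test_object/file1.txt',
    'test_object/file2.txt',
    'test_object/file3.json',
]
DELIMITER = '.json'

json_only = filter_names(SOURCE_FILES_LIST, prefix='test_object/', delimiter=DELIMITER)
```

   Under that assumption only `'test_object/file3.json'` is selected, so the argument behaves like a suffix even though it is called a delimiter.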



[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r393908228
 
 

 ##########
 File path: tests/providers/google/cloud/operators/test_gcs_to_gcs.py
 ##########
 @@ -28,33 +28,40 @@
 
 TASK_ID = 'test-gcs-to-gcs-operator'
 TEST_BUCKET = 'test-bucket'
-DELIMITER = '.csv'
 PREFIX = 'TEST'
+SOURCE_OBJECTS_NO_FILE = ['']
+SOURCE_OBJECTS_TWO_EMPTY_STRING = ['', '']
+SOURCE_OBJECTS_SINGLE_FILE = ['test_object/file1.txt']
+SOURCE_OBJECTS_MULTIPLE_FILES = ['test_object/file1.txt', 'test_object/file2.txt']
+SOURCE_OBJECTS_LIST = ['test_object/file1.txt', 'test_object/file2.txt', 'test_object/file3.json']
+
 SOURCE_OBJECT_WILDCARD_PREFIX = '*test_object'
 SOURCE_OBJECT_WILDCARD_SUFFIX = 'test_object*'
 SOURCE_OBJECT_WILDCARD_MIDDLE = 'test*object'
 SOURCE_OBJECT_WILDCARD_FILENAME = 'test_object*.txt'
 SOURCE_OBJECT_NO_WILDCARD = 'test_object.txt'
 SOURCE_OBJECT_MULTIPLE_WILDCARDS = 'csv/*/test_*.csv'
 DESTINATION_BUCKET = 'archive'
+DESTINATION_OBJECT = 'foo/bar'
 DESTINATION_OBJECT_PREFIX = 'foo/bar'
 SOURCE_FILES_LIST = [
     'test_object/file1.txt',
     'test_object/file2.txt',
     'test_object/file3.json',
 ]
+DELIMITER = '.json'
 
 Review comment:
   Should this be SUFFIX?
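A minimal sketch of why `SUFFIX` may be the clearer name, assuming the value is meant to select objects by file extension, as the review thread suggests. The helper `filter_by_suffix` is hypothetical and is not part of the operator or the hook; the file list mirrors `SOURCE_FILES_LIST` from the test module.

```python
from typing import List

# Mirrors SOURCE_FILES_LIST from the test module quoted above.
SOURCE_FILES_LIST = [
    'test_object/file1.txt',
    'test_object/file2.txt',
    'test_object/file3.json',
]
SUFFIX = '.json'  # the value the diff currently names DELIMITER


def filter_by_suffix(names: List[str], suffix: str) -> List[str]:
    """Hypothetical helper: keep only the names that end with ``suffix``."""
    return [name for name in names if name.endswith(suffix)]


matching = filter_by_suffix(SOURCE_FILES_LIST, SUFFIX)
```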


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392667594
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +37,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_objects: A list of prefix of the objects to copy in the Google cloud
         storage bucket. (templated)
-        You can use only one wildcard for objects (filenames) within your
-        bucket. The wildcard can appear inside the object name or at the
-        end of the object name. Appending a wildcard to the bucket name is
-        unsupported.
-    :type source_object: str
+    :type source_objects: List[str]
     :param destination_bucket: The destination Google Cloud Storage bucket
         where the object should be. If the destination_bucket is None, it defaults
         to source_bucket. (templated)
     :type destination_bucket: str
     :param destination_object: The destination name of the object in the
         destination Google Cloud Storage bucket. (templated)
-        If a wildcard is supplied in the source_object argument, this is the
-        prefix that will be prepended to the final destination objects' paths.
-        Note that the source path's part before the wildcard will be removed;
-        if it needs to be retained it should be appended to destination_object.
-        For example, with prefix ``foo/*`` and destination_object ``blah/``, the
-        file ``foo/baz`` will be copied to ``blah/baz``; to retain the prefix write
-        the destination_object as e.g. ``blah/foo``, in which case the copied file
-        will be named ``blah/foo/baz``.
+        If destination object is not specified, then it defaults to each of the source objects.
+        For example, if source_objects = ['foo/sales','bah/inventory'], then destination will be
+        'foo/sales' and 'bah/inventory' if destination_object is not specified.
     :type destination_object: str
     :param move_object: When move object is True, the object is moved instead
         of copied to the new location. This is the equivalent of a mv command
         as opposed to a cp command.
     :type move_object: bool
+    :type delimiter: str
 
 Review comment:
   OK, thanks very much for the review.
   I'm thinking of removing `delimiter` entirely and adding back `google_cloud_storage_conn_id`, which I had removed, for backward compatibility. The operator already takes 10 arguments, so keeping both arguments would make the tests fail.
   If `delimiter` is removed, the search will be based on the wildcard in the source object.
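One common way to keep a legacy keyword for backward compatibility is to accept it as a deprecated alias. The sketch below is only an illustration under assumptions: the helper `resolve_conn_id`, the newer keyword name `gcp_conn_id`, and the default value `'google_cloud_default'` are assumed here and are not taken from this thread.

```python
import warnings
from typing import Optional


def resolve_conn_id(gcp_conn_id: str = 'google_cloud_default',
                    google_cloud_storage_conn_id: Optional[str] = None) -> str:
    """Hypothetical helper: honour the legacy keyword when a caller still passes it."""
    if google_cloud_storage_conn_id is not None:
        # Warn so callers can migrate, but keep their DAGs working.
        warnings.warn(
            "google_cloud_storage_conn_id is deprecated; use gcp_conn_id instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return google_cloud_storage_conn_id
    return gcp_conn_id
```

Resolving the alias in one place keeps the rest of the code reading a single connection id, whichever keyword the caller used.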


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392667615
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +37,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_objects: A list of prefix of the objects to copy in the Google cloud
 
 Review comment:
   Ok. Noted. Thanks a lot


[GitHub] [airflow] codecov-io commented on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#issuecomment-600328680
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=h1) Report
   > Merging [#7728](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/4979e5ce2fe11df963882db32c2ad394eaf53b58?src=pr&el=desc) will **decrease** coverage by `22.24%`.
   > The diff coverage is `86.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7728/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #7728       +/-   ##
   ===========================================
   - Coverage   86.99%   64.75%   -22.25%     
   ===========================================
     Files         915      914        -1     
     Lines       44198    44215       +17     
   ===========================================
   - Hits        38452    28633     -9819     
   - Misses       5746    15582     +9836
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `90.9% <86.66%> (-4.75%)` | :arrow_down: |
   | [...low/contrib/operators/wasb\_delete\_blob\_operator.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy93YXNiX2RlbGV0ZV9ibG9iX29wZXJhdG9yLnB5) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/hooks/vertica\_hook.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL3ZlcnRpY2FfaG9vay5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/sensors/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvX19pbml0X18ucHk=) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/hooks/mssql\_hook.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9tc3NxbF9ob29rLnB5) | `0% <0%> (-100%)` | :arrow_down: |
   | [...viders/docker/example\_dags/example\_docker\_swarm.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZG9ja2VyL2V4YW1wbGVfZGFncy9leGFtcGxlX2RvY2tlcl9zd2FybS5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/hooks/webhdfs\_hook.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy93ZWJoZGZzX2hvb2sucHk=) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/sensors/emr\_base\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvZW1yX2Jhc2Vfc2Vuc29yLnB5) | `0% <0%> (-100%)` | :arrow_down: |
   | [...irflow/contrib/operators/slack\_webhook\_operator.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9zbGFja193ZWJob29rX29wZXJhdG9yLnB5) | `0% <0%> (-100%)` | :arrow_down: |
   | [...providers/google/cloud/example\_dags/example\_dlp.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2V4YW1wbGVfZGFncy9leGFtcGxlX2RscC5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | ... and [491 more](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=footer). Last update [4979e5c...05fb3a9](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] potiuk commented on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#issuecomment-600435629
 
 
   Thanks @ephraimbuddy !


[GitHub] [airflow] codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#issuecomment-600328680
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=h1) Report
   > Merging [#7728](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/49998edd2ff0b64fd1771138fc7d8e835c564a47&el=desc) will **decrease** coverage by `27.09%`.
   > The diff coverage is `13.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7728/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #7728       +/-   ##
   ===========================================
   - Coverage   86.99%   59.90%   -27.10%     
   ===========================================
     Files         915      915               
     Lines       44198    44228       +30     
   ===========================================
   - Hits        38451    26494    -11957     
   - Misses       5747    17734    +11987     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `44.44% <13.33%> (-51.21%)` | :arrow_down: |
   | [airflow/providers/amazon/aws/hooks/kinesis.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9raW5lc2lzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/apache/livy/sensors/livy.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2xpdnkvc2Vuc29ycy9saXZ5LnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/google/suite/hooks/sheets.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL3N1aXRlL2hvb2tzL3NoZWV0cy5weQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/amazon/aws/sensors/redshift.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9zZW5zb3JzL3JlZHNoaWZ0LnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/providers/microsoft/azure/operators/adx.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvbWljcm9zb2Z0L2F6dXJlL29wZXJhdG9ycy9hZHgucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...irflow/providers/amazon/aws/hooks/batch\_waiters.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9iYXRjaF93YWl0ZXJzLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ow/providers/amazon/aws/sensors/cloud\_formation.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9zZW5zb3JzL2Nsb3VkX2Zvcm1hdGlvbi5weQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...w/providers/apache/hive/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvb3BlcmF0b3JzL215c3FsX3RvX2hpdmUucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [305 more](https://codecov.io/gh/apache/airflow/pull/7728/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=footer). Last update [49998ed...1813ca4](https://codecov.io/gh/apache/airflow/pull/7728?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392654143
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -40,32 +37,27 @@ class GCSToGCSOperator(BaseOperator):
     :param source_bucket: The source Google Cloud Storage bucket where the
          object is. (templated)
     :type source_bucket: str
-    :param source_object: The source name of the object to copy in the Google cloud
+    :param source_objects: A list of prefix of the objects to copy in the Google cloud
 
 Review comment:
   I think we need to keep backwards compatibility. `source_object` should still be there, and passing anything to it should be equivalent to passing `[source_object]` to `source_objects`.
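The equivalence described in this comment can be sketched as a small normalization step. This is an illustration, not the code that was merged: the helper `normalize_source_objects` is hypothetical, and rejecting calls that pass both arguments is an assumption made for this sketch.

```python
from typing import List, Optional


def normalize_source_objects(source_object: Optional[str] = None,
                             source_objects: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: fold the legacy single-object argument into a list."""
    if source_object is not None and source_objects is not None:
        # Assumption for this sketch: passing both is treated as ambiguous.
        raise ValueError("Pass either source_object or source_objects, not both.")
    if source_object is not None:
        # Legacy call style: behave exactly as if [source_object] had been passed.
        return [source_object]
    return list(source_objects or [])
```

With this shape, existing DAGs that pass `source_object='test_object.txt'` and new DAGs that pass `source_objects=['test_object.txt']` reach the copy logic with the same list.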


[GitHub] [airflow] potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r392654333
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -169,30 +158,25 @@ def execute(self, context):
                 'destination_bucket is None. Defaulting it to source_bucket (%s)',
                 self.source_bucket)
             self.destination_bucket = self.source_bucket
-
-        if WILDCARD in self.source_object:
-            total_wildcards = self.source_object.count(WILDCARD)
-            if total_wildcards > 1:
-                error_msg = "Only one wildcard '*' is allowed in source_object parameter. " \
-                            "Found {} in {}.".format(total_wildcards, self.source_object)
-
-                raise AirflowException(error_msg)
-
-            prefix, delimiter = self.source_object.split(WILDCARD, 1)
-            objects = hook.list(self.source_bucket, prefix=prefix, delimiter=delimiter)
-
+        if not all(isinstance(item, str) for item in self.source_objects):
+            raise AirflowException('At least, one of the `objects` in the `source_objects` is not a string')
 
 Review comment:
   I think we should not remove the wildcard approach. It is much more powerful than matching on a file extension alone. For example, this pattern works with a wildcard but would not work with the proposed suffix approach:
   'folder/*/sub_folder/files'
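To illustrate the point, here is a minimal local sketch of single-wildcard matching. It does not call Google Cloud Storage: the removed operator code split the pattern on the wildcard and passed the two halves to `hook.list` as `prefix` and `delimiter`, whereas the hypothetical helper `match_single_wildcard` below simply compares strings. The object names are made up for the example.

```python
from typing import List

WILDCARD = '*'


def match_single_wildcard(pattern: str, names: List[str]) -> List[str]:
    """Hypothetical local matcher for a pattern containing exactly one ``*``."""
    if pattern.count(WILDCARD) != 1:
        raise ValueError("Expected exactly one wildcard in the pattern.")
    # Same split as the removed operator code: text before and after the wildcard.
    prefix, suffix = pattern.split(WILDCARD, 1)
    return [
        name for name in names
        # The length check stops the prefix and suffix from overlapping in short names.
        if len(name) >= len(prefix) + len(suffix)
        and name.startswith(prefix)
        and name.endswith(suffix)
    ]


# Made-up object names for illustration only.
names = [
    'folder/a/sub_folder/files',
    'folder/b/sub_folder/files',
    'folder/a/other/files',
    'archive/a/sub_folder/files',
]
matched = match_single_wildcard('folder/*/sub_folder/files', names)
```

A filter on a file extension alone cannot express "under `folder/` and ending in `/sub_folder/files`", which is the case the wildcard covers.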
   


[GitHub] [airflow] potiuk merged pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728
 
 
   


[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7728: [AIRFLOW-5610] Add ability to specify multiple objects to copy in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7728#discussion_r393915406
 
 

 ##########
 File path: tests/providers/google/cloud/operators/test_gcs_to_gcs.py
 ##########
 @@ -28,33 +28,40 @@
 
 TASK_ID = 'test-gcs-to-gcs-operator'
 TEST_BUCKET = 'test-bucket'
-DELIMITER = '.csv'
 PREFIX = 'TEST'
+SOURCE_OBJECTS_NO_FILE = ['']
+SOURCE_OBJECTS_TWO_EMPTY_STRING = ['', '']
+SOURCE_OBJECTS_SINGLE_FILE = ['test_object/file1.txt']
+SOURCE_OBJECTS_MULTIPLE_FILES = ['test_object/file1.txt', 'test_object/file2.txt']
+SOURCE_OBJECTS_LIST = ['test_object/file1.txt', 'test_object/file2.txt', 'test_object/file3.json']
+
 SOURCE_OBJECT_WILDCARD_PREFIX = '*test_object'
 SOURCE_OBJECT_WILDCARD_SUFFIX = 'test_object*'
 SOURCE_OBJECT_WILDCARD_MIDDLE = 'test*object'
 SOURCE_OBJECT_WILDCARD_FILENAME = 'test_object*.txt'
 SOURCE_OBJECT_NO_WILDCARD = 'test_object.txt'
 SOURCE_OBJECT_MULTIPLE_WILDCARDS = 'csv/*/test_*.csv'
 DESTINATION_BUCKET = 'archive'
+DESTINATION_OBJECT = 'foo/bar'
 DESTINATION_OBJECT_PREFIX = 'foo/bar'
 SOURCE_FILES_LIST = [
     'test_object/file1.txt',
     'test_object/file2.txt',
     'test_object/file3.json',
 ]
+DELIMITER = '.json'
 
 Review comment:
   Ok. Well noted
