You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/20 23:13:30 UTC

[GitHub] [airflow] ephraimbuddy opened a new pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

ephraimbuddy opened a new pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791
 
 
   This PR adds the ability to specify maximum modified time for objects in GCSTOGCSOperator.
   This will enable users to use date intervals to copy objects by specifying last_modified_time and maximum_modified_time.
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/fee827ed5a5d4aea2d2e60f73443357721a2c45a&el=desc) will **decrease** coverage by `0.29%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.93%   86.63%   -0.30%     
   ==========================================
     Files         924      924              
     Lines       44674    44715      +41     
   ==========================================
   - Hits        38836    38741      -95     
   - Misses       5838     5974     +136     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `89.81% <80.00%> (-1.10%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `85.66% <87.50%> (+0.20%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   | [airflow/utils/process\_utils.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9wcm9jZXNzX3V0aWxzLnB5) | `76.53% <0.00%> (-11.23%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [fee827e...d6520d3](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396082483
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   Please verify. Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396090850
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   BTW. It was really from the top of my head. You'd have to use functool.partial to do it in the way I described https://stackoverflow.com/questions/15331726/how-does-functools-partial-do-what-it-does/15331841

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396090389
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   Nice!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395972376
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
 
 Review comment:
   There's is_updated_after that checks if object is updated after the last_modified_time

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395973548
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   What I meant was to add an option to specify how old the object should be - i.e. 3600 = older than 1 hour. So the thing is to specify the "age" (relative to current time) rather than modified time (absolute UTC). -> it boils down to "max_modified < current_time - age
    (so it is a simple modification) but I think it is much more useful this way  - otherwise you wil have to do this calculation in DAGs (you will rarely  pass absolute time as parameter hard-coded in DAG.). 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395973205
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -277,14 +283,35 @@ def _copy_source_with_wildcard(self, hook, prefix):
                                      destination_object=destination_object)
 
     def _copy_single_object(self, hook, source_object, destination_object):
-        if self.last_modified_time is not None:
+        if self.last_modified_time and self.maximum_modified_time:
+            # check to see if object was modified between last_modified_time and
+            # maximum_modified_time
+            if hook.is_updated_between(self.source_bucket,
+                                       source_object,
+                                       self.last_modified_time,
+                                       self.maximum_modified_time
+                                       ):
+                self.log.debug("Object has been modified between %s and %s",
 
 Review comment:
   Ok. I'll do that

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396030997
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   I am confused here on how to implement this method. I thought the expected method already exist, thinking that `exists` method could do it, but i'm realizing now that I don't really understand what is expected. Please if possible give me some hints. Thanks very much for your time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395972320
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
 
 Review comment:
   Ok. I'll address it. Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970036
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
 
 Review comment:
   I think we should be specific whether the min/max time is included or excluded. My proposal will be to 
   include min_time but exclude max_time (min_time <= blob_update_time < max_time).
   This will mimic the behaviour of [0:x] operator.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk merged pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970467
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   I have a feeling that it would be much more useful to have another parameter (is_older_than) that could specify how old (or how new) objects are.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-602197324
 
 
   Thanks @ephraimbuddy !

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970116
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   Maybe extract common method and pass the condition as lambda ? That will be a lot less code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970053
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
 
 Review comment:
   Should we add is_updated_after?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970467
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   I have a feeling that it would be much more useful to have another parameter (is_older_than => or rather "age") that could specify how old (or how new) objects are.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395973112
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   I think that is what the maximum_modified_time is doing. It checks if objects are older than the given time and then copy it. While last_modified_time is checking if objects are new compared to the last_modified_time. Actually, that is what I think they are doing, I might be wrong though. 
   When both last_modified_time and maximum_modified_time are supplied, then they act like a time interval to search for the object. I will be much happy if you can throw more light on this. Thanks for your time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395973112
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   I think that is what the maximum_modified_time is doing. It checks if objects are older than the given time and then copy them. While last_modified_time is checking if objects are new compared to the last_modified_time. Actually, that is what I think they are doing, I might be wrong though. 
   When both last_modified_time and maximum_modified_time are supplied, then they act like a time interval to search for the object. I will be much happy if you can throw more light on this. Thanks for your time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396065490
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   No worries - that was just a suggestion -> you do not need to implement it. 
   
   I thought that there is quite a lot of common code that could be extracted. I thought that we could do something like below pattern (I have not compiled the code - just tried to show you the idea):
   ```
   def verify_the_object(self, bucket_name, object_name, verify_method):
              client = self.get_conn()
              bucket = client.bucket(bucket_name)
              blob = bucket.get_blob(blob_name=object_name)
              if blob is None:
                   raise ValueError("Object ({}) not found in Bucket ({})".format(object_name, bucket_name))
             return verify_method(blob)
   ```
   then is_updated_between could look like
   ```
   def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
       def _update_between(blob):
            blob_update_time = blob.updated
           if blob_update_time is not None:
               import dateutil.tz
               if not min_ts.tzinfo:
                   min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
               if not max_ts.tzinfo:
                   max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
               self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
               if min_ts < blob_update_time < max_ts:
                   return True
           return False
       return self.verify_the_object(bucket_name, object_name, _update_between)
   ```
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io commented on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/fee827ed5a5d4aea2d2e60f73443357721a2c45a&el=desc) will **decrease** coverage by `0.29%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.93%   86.63%   -0.30%     
   ==========================================
     Files         924      924              
     Lines       44674    44715      +41     
   ==========================================
   - Hits        38836    38741      -95     
   - Misses       5838     5974     +136     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `89.81% <80.00%> (-1.10%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `85.66% <87.50%> (+0.20%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   | [airflow/utils/process\_utils.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9wcm9jZXNzX3V0aWxzLnB5) | `76.53% <0.00%> (-11.23%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [fee827e...d6520d3](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/fee827ed5a5d4aea2d2e60f73443357721a2c45a&el=desc) will **decrease** coverage by `0.29%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.93%   86.63%   -0.30%     
   ==========================================
     Files         924      924              
     Lines       44674    44715      +41     
   ==========================================
   - Hits        38836    38741      -95     
   - Misses       5838     5974     +136     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `89.81% <80.00%> (-1.10%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `85.66% <87.50%> (+0.20%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   | [airflow/utils/process\_utils.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9wcm9jZXNzX3V0aWxzLnB5) | `76.53% <0.00%> (-11.23%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [fee827e...d6520d3](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395972594
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   Ok. Thanks. It already exist. I should have used it. Thanks very much for the review

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396082445
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   I used another way to do this, please check if it's ok. I wasn't able to access parameter of the outer function from the inner function when I tried your suggestion, so I decided to use something a little different. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970116
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   Meybe extract common method and pass the condition as lambda ? That will be a lot less code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/fee827ed5a5d4aea2d2e60f73443357721a2c45a&el=desc) will **decrease** coverage by `0.29%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.93%   86.63%   -0.30%     
   ==========================================
     Files         924      924              
     Lines       44674    44715      +41     
   ==========================================
   - Hits        38836    38741      -95     
   - Misses       5838     5974     +136     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `89.81% <80.00%> (-1.10%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `85.66% <87.50%> (+0.20%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   | [airflow/utils/process\_utils.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9wcm9jZXNzX3V0aWxzLnB5) | `76.53% <0.00%> (-11.23%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [fee827e...d6520d3](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970571
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -277,14 +283,35 @@ def _copy_source_with_wildcard(self, hook, prefix):
                                      destination_object=destination_object)
 
     def _copy_single_object(self, hook, source_object, destination_object):
-        if self.last_modified_time is not None:
+        if self.last_modified_time and self.maximum_modified_time:
+            # check to see if object was modified between last_modified_time and
+            # maximum_modified_time
+            if hook.is_updated_between(self.source_bucket,
+                                       source_object,
+                                       self.last_modified_time,
+                                       self.maximum_modified_time
+                                       ):
+                self.log.debug("Object has been modified between %s and %s",
+                               self.last_modified_time, self.maximum_modified_time)
+            else:
+                return
+
+        elif self.last_modified_time is not None:
             # Check to see if object was modified after last_modified_time
             if hook.is_updated_after(self.source_bucket,
                                      source_object,
                                      self.last_modified_time):
                 self.log.debug("Object has been modified after %s ", self.last_modified_time)
             else:
                 return
+        elif self.maximum_modified_time is not None:
+            # Check to see if object was modified before maximum_modified_time
+            if hook.is_updated_before(self.source_bucket,
+                                      source_object,
+                                      self.maximum_modified_time):
+                self.log.debug("Object has been modified before %s ", self.maximum_modified_time)
 
 Review comment:
   Same here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r396067328
 
 

 ##########
 File path: airflow/providers/google/cloud/hooks/gcs.py
 ##########
 @@ -307,6 +307,79 @@ def is_updated_after(self, bucket_name, object_name, ts):
 
         return False
 
+    def is_updated_between(self, bucket_name, object_name, min_ts, max_ts):
+        """
+        Checks if an blob_name is updated in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+                storage bucket.
+        :type object_name: str
+        :param min_ts: The minimum timestamp to check against.
+        :type min_ts: datetime.datetime
+        :param max_ts: The maximum timestamp to check against.
+        :type max_ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
+                object_name, bucket_name))
+
+        blob_update_time = blob.updated
+
+        if blob_update_time is not None:
+            import dateutil.tz
+
+            if not min_ts.tzinfo:
+                min_ts = min_ts.replace(tzinfo=dateutil.tz.tzutc())
+            if not max_ts.tzinfo:
+                max_ts = max_ts.replace(tzinfo=dateutil.tz.tzutc())
+
+            self.log.info("Verify object date: %s is between %s and %s", blob_update_time, min_ts, max_ts)
+
+            if min_ts < blob_update_time < max_ts:
+                return True
+        return False
+
+    def is_updated_before(self, bucket_name, object_name, ts):
+        """
+        Checks if an blob_name is updated before given time in Google Cloud Storage.
+
+        :param bucket_name: The Google Cloud Storage bucket where the object is.
+        :type bucket_name: str
+        :param object_name: The name of the object to check in the Google cloud
+            storage bucket.
+        :type object_name: str
+        :param ts: The timestamp to check against.
+        :type ts: datetime.datetime
+        """
+        client = self.get_conn()
+        bucket = client.bucket(bucket_name)
+        blob = bucket.get_blob(blob_name=object_name)
+
+        if blob is None:
+            raise ValueError("Object ({}) not found in Bucket ({})".format(
 
 Review comment:
   Wow. This is super. Thanks so much!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75.00%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0.00%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0.00%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0.00%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0.00%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0.00%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395973792
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -87,6 +87,10 @@ class GCSToGCSOperator(BaseOperator):
         only if they were modified after last_modified_time.
         If tzinfo has not been set, UTC will be assumed.
     :type last_modified_time: datetime.datetime
+    :param maximum_modified_time: When specified, the objects will be copied or moved,
 
 Review comment:
   Ok. I understand it better now. Thanks for the explanations.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#issuecomment-601966502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=h1) Report
   > Merging [#7791](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9ad1844d582f15b488d261ba80159dba388d7f8b?src=pr&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `85.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7791/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7791      +/-   ##
   ==========================================
   - Coverage   86.88%   86.61%   -0.27%     
   ==========================================
     Files         924      924              
     Lines       44674    44727      +53     
   ==========================================
   - Hits        38816    38742      -74     
   - Misses       5858     5985     +127
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...low/providers/google/cloud/operators/gcs\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9nY3NfdG9fZ2NzLnB5) | `87.17% <75%> (-3.73%)` | :arrow_down: |
   | [airflow/providers/google/cloud/hooks/gcs.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2hvb2tzL2djcy5weQ==) | `86.43% <91.89%> (+0.97%)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0%> (-45.08%)` | :arrow_down: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.69% <0%> (-25.26%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7791/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=footer). Last update [9ad1844...c8d9075](https://codecov.io/gh/apache/airflow/pull/7791?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7791: Add ability to specify a maximum modified time for objects in GCSToGCSOperator
URL: https://github.com/apache/airflow/pull/7791#discussion_r395970551
 
 

 ##########
 File path: airflow/providers/google/cloud/operators/gcs_to_gcs.py
 ##########
 @@ -277,14 +283,35 @@ def _copy_source_with_wildcard(self, hook, prefix):
                                      destination_object=destination_object)
 
     def _copy_single_object(self, hook, source_object, destination_object):
-        if self.last_modified_time is not None:
+        if self.last_modified_time and self.maximum_modified_time:
+            # check to see if object was modified between last_modified_time and
+            # maximum_modified_time
+            if hook.is_updated_between(self.source_bucket,
+                                       source_object,
+                                       self.last_modified_time,
+                                       self.maximum_modified_time
+                                       ):
+                self.log.debug("Object has been modified between %s and %s",
 
 Review comment:
   I think you should rather log in debug when the object was excluded rather than included.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services