Posted to commits@airflow.apache.org by "Joel Croteau (Jira)" <ji...@apache.org> on 2019/10/07 23:31:00 UTC

[jira] [Created] (AIRFLOW-5612) Add ability to actually do things with created and modified date in GoogleCloudStorageHook

Joel Croteau created AIRFLOW-5612:
-------------------------------------

             Summary: Add ability to actually do things with created and modified date in GoogleCloudStorageHook
                 Key: AIRFLOW-5612
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5612
             Project: Apache Airflow
          Issue Type: Improvement
          Components: gcp
    Affects Versions: 1.10.5
            Reporter: Joel Croteau


{{GoogleCloudStorageHook}} seems to support only a very small subset of the actual GCS API. In particular, the only thing it lets you do with an object's dates is check whether the object was updated after a specified time, using {{is_updated_after}}. First, this only looks at the {{updated}} time (the last metadata update), which is probably not what is wanted for most purposes, since {{timeCreated}} is generally what conveys useful information. Second, it seems rather arbitrary to only let me test whether the updated time is later than some other time, rather than just giving me the time and letting me draw my own inferences. In particular, for a scheduled workflow with a potential backfill, I would like to check that the creation date falls between a minimum and a maximum value, which this doesn't allow.
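
To illustrate the bounded check described above, here is a minimal sketch that works on the plain resource dict returned for an object. It is not part of the hook: {{parse_gcs_timestamp}} and {{created_within}} are hypothetical helper names, and the sketch assumes {{timeCreated}} arrives as an RFC 3339 UTC string with a trailing "Z" and millisecond precision.

```python
from datetime import datetime, timezone


def parse_gcs_timestamp(value):
    """Parse an RFC 3339 UTC timestamp such as '2019-10-07T23:31:00.000Z'.

    Assumption: the value ends in 'Z' and has millisecond precision.
    """
    # datetime.fromisoformat() on older Python versions rejects a trailing 'Z',
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def created_within(metadata, earliest, latest):
    """Return True if the object's creation time is in [earliest, latest).

    `metadata` is the object resource dict; `earliest` and `latest` must be
    timezone-aware datetimes. The interval is half-open so that adjacent
    schedule intervals never both claim the same object.
    """
    created = parse_gcs_timestamp(metadata["timeCreated"])
    return earliest <= created < latest


if __name__ == "__main__":
    # Hypothetical metadata, shaped like a GCS object resource.
    meta = {
        "timeCreated": "2019-10-07T23:31:00.000Z",
        "updated": "2019-10-08T01:00:00.000Z",
    }
    window_start = datetime(2019, 10, 7, tzinfo=timezone.utc)
    window_end = datetime(2019, 10, 8, tzinfo=timezone.utc)
    print(created_within(meta, window_start, window_end))
```

In a backfill, the two bounds would presumably come from the start and end of the schedule interval being processed, so an object uploaded later is not picked up by an earlier run.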

 

Also, tangentially, if you want several pieces of information about an object, {{GoogleCloudStorageHook}} requires a separate call to {{objects().get()}} for each one, even though a single call returns everything. Would it not make more sense to return an object structure with all of the needed information in it?
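
As a rough sketch of what such a structure could look like, the following builds one immutable record from a single object resource dict. The names {{ObjectInfo}} and {{object_info_from_response}} are hypothetical and do not exist in Airflow. The field names ({{name}}, {{size}}, {{md5Hash}}, {{crc32c}}, {{timeCreated}}, {{updated}}) follow the GCS JSON object resource, where {{size}} is assumed to arrive as a decimal string.

```python
from datetime import datetime
from typing import NamedTuple


class ObjectInfo(NamedTuple):
    """Hypothetical container for the metadata of one GCS object."""
    name: str
    size: int
    md5_hash: str
    crc32c: str
    time_created: datetime
    updated: datetime


def _parse_timestamp(value):
    # Assumption: RFC 3339 UTC string with a trailing 'Z', e.g.
    # '2019-10-07T23:31:00.000Z'. Rewrite 'Z' so fromisoformat() accepts it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def object_info_from_response(response):
    """Build an ObjectInfo from one object resource dict.

    Everything is read from the single response, so no further API calls
    are needed to get the size, hashes, and both dates.
    """
    return ObjectInfo(
        name=response["name"],
        size=int(response["size"]),  # the JSON API encodes size as a string
        md5_hash=response["md5Hash"],
        crc32c=response["crc32c"],
        time_created=_parse_timestamp(response["timeCreated"]),
        updated=_parse_timestamp(response["updated"]),
    )


if __name__ == "__main__":
    # Hypothetical response; the hash values are placeholders, not real digests.
    response = {
        "name": "data/2019-10-07/export.csv",
        "size": "1024",
        "md5Hash": "placeholder-md5",
        "crc32c": "placeholder-crc32c",
        "timeCreated": "2019-10-07T23:31:00.000Z",
        "updated": "2019-10-08T01:00:00.000Z",
    }
    info = object_info_from_response(response)
    print(info.name, info.size, info.time_created.isoformat())
```

A caller could then compare {{info.time_created}} or {{info.updated}} against whatever bounds it likes, instead of relying on a single built-in comparison.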



--
This message was sent by Atlassian Jira
(v8.3.4#803005)