You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/05 23:33:52 UTC

[GitHub] [airflow] jarfgit commented on pull request #17145: Adds an s3 list prefixes operator

jarfgit commented on pull request #17145:
URL: https://github.com/apache/airflow/pull/17145#issuecomment-893890145


   @o-nikolas @iostreamdoth @potiuk Ok, at long last I've updated this pull request:
   
   * Refactored `S3ListOperator` to take a `recursive` parameter
   * Refactored the S3 list keys hook to return both a list of keys and list of common prefixes (I know this is out of scope for the issue, but making unnecessary API calls bothers me and it was easy enough to do)
   * If `recursive == True` then we add the prefixes to the list of keys returned by the operator
   * Ensured `recursive` defaults to `False` and _shouldn't_ present a breaking change
   * Added unit tests to both the `s3_list_keys` and `s3_list_prefixes` hooks to make sure I (and presumably future contributors) fully understood what the `delimiter` and `prefix` args were doing and how various combinations affected what was returned when `recursive == True`
   
   
   The above assumes the following:
   
   * The user wants to retrieve both keys and prefixes using the same optional params (i.e. `delimiter` and `prefix`). 
   * The user can distinguish keys from prefixes from the list returned by the operator. I would assume that keys have a file extension and prefixes would include the delimiter... but it seems possible that keys may not _always_ have a file extension and I'm basing my understanding of the prefixes containing the delimiter on the unit tests.
   
   
   So I have this question:
   
   If we want to refactor the operator to return both prefixes and keys but a user might want to use different optional params between keys and prefixes, I don't see an alternative other than requiring the user to use the operator twice with different params. With this in mind, is there a valid argument to have a dedicated `S3ListPrefixes` operator after all?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org