Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/01/30 08:36:28 UTC

[GitHub] icaroNZ opened a new pull request #4611: [AIRFLOW- boyscout] Enforce delimiter for gcs_to_gcs operator using a flag, enforce_delimeter

URL: https://github.com/apache/airflow/pull/4611
 
 
   Current problem:
   Given the files: test1.csv, test2.csv, test10.csv, test100.csv, test1.gz, test2.gz, test10.gz, test100.gz
   When matching test*.csv
   Result: all of the files above match
   Fix:
   Given the files: test1.csv, test2.csv, test10.csv, test100.csv, test1.gz, test2.gz, test10.gz, test100.gz
   When matching test*.csv
   Result: only test1.csv, test2.csv, test10.csv and test100.csv match
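
   The difference between the two results can be illustrated with a minimal Python sketch. It works on a plain list of names rather than a real bucket, and the function names match_prefix_only and match_prefix_and_suffix are hypothetical; they are not part of the operator or the hook.

```python
# Illustration only: plain-Python stand-ins, not the Airflow operator or hook.
FILES = [
    "test1.csv", "test2.csv", "test10.csv", "test100.csv",
    "test1.gz", "test2.gz", "test10.gz", "test100.gz",
]


def match_prefix_only(names, pattern):
    """Keep names that start with the text before the first '*'."""
    prefix = pattern.split("*", 1)[0]
    return [n for n in names if n.startswith(prefix)]


def match_prefix_and_suffix(names, pattern):
    """Also require the text after the '*' at the end of the name."""
    prefix, suffix = pattern.split("*", 1)
    return [
        n for n in names
        if len(n) >= len(prefix) + len(suffix)  # prefix and suffix may not overlap
        and n.startswith(prefix)
        and n.endswith(suffix)
    ]


current_result = match_prefix_only(FILES, "test*.csv")      # every file
fixed_result = match_prefix_and_suffix(FILES, "test*.csv")  # only the .csv files
```

   Under these assumptions, the first call returns all eight names and the second returns only the four .csv names.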
   
   Problem that remains in the code: when multiple wildcards are used, the 'middle part' of the pattern is not enforced:
   Given the files: testProd1.csv, test2Prod.csv, testProd10.csv, testProd100.csv, testProd1.gz, test2Prod.gz, test10Prod.gz, test100Prod.gz, in directories dir1 and dir2
   When matching /testAcceptance.csv
   Result: all of the files above match
   Expected: no files should be returned
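
   As a rough illustration of the 'middle part' issue, the sketch below compares a matcher that only looks at the text before the first '*' and after the last '*' with full glob matching from Python's standard fnmatch module. The two-wildcard pattern and the function names are assumptions made for this example only; they are not taken from the PR.

```python
from fnmatch import fnmatchcase

# Illustration only: the pattern and function names below are hypothetical.
NAMES = [
    "testProd1.csv", "test2Prod.csv", "testProd10.csv", "testProd100.csv",
    "testProd1.gz", "test2Prod.gz", "test10Prod.gz", "test100Prod.gz",
]
PATTERN = "test*Acceptance*.csv"  # assumed two-wildcard pattern


def match_first_and_last(names, pattern):
    """Check only the text before the first '*' and after the last '*'."""
    prefix = pattern.split("*", 1)[0]
    suffix = pattern.rsplit("*", 1)[1]
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


def match_full_glob(names, pattern):
    """Check the whole pattern, including the literal text between wildcards."""
    return [n for n in names if fnmatchcase(n, pattern)]


middle_ignored = match_first_and_last(NAMES, PATTERN)  # 'Acceptance' is never checked
middle_enforced = match_full_glob(NAMES, PATTERN)      # nothing contains 'Acceptance'
```

   Note that fnmatchcase also gives '?' and '[' a special meaning, so it is shown here only as a reference for the expected result, not as a proposed replacement.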
   
   The enforce_delimiter flag defaults to False; when it is False or left unset, the operator's current behaviour does not change.
   When it is set to True, the operator uses a new hook, list_with_delimiter, which enforces the value after the last wildcard '*'.
   Note that this PR only fixes enforcement of the last part of the path; the middle part stays as it is, as described above.
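
   The opt-in nature of the flag can be sketched as follows. This is a simplified stand-in that filters a list of names; select_objects is a hypothetical helper, not the operator's or the hook's actual code, and it assumes the path contains at least one '*'.

```python
def select_objects(names, wildcard_path, enforce_delimiter=False):
    """Hypothetical helper: filter names the way the description above suggests.

    Assumes wildcard_path contains at least one '*'.
    """
    prefix = wildcard_path.split("*", 1)[0]
    suffix = wildcard_path.rsplit("*", 1)[1]

    # Default behaviour: only the text before the first wildcard is checked.
    selected = [n for n in names if n.startswith(prefix)]

    if enforce_delimiter:
        # Opt-in behaviour: the text after the last wildcard must end the name.
        selected = [n for n in selected if n[len(prefix):].endswith(suffix)]
    return selected


objects = ["data/a.csv", "data/a.gz", "other/a.csv"]
unchanged = select_objects(objects, "data/*.csv")
enforced = select_objects(objects, "data/*.csv", enforce_delimiter=True)
```

   Keeping the default at False means existing DAGs that rely on the looser matching would keep selecting the same objects unless the flag is explicitly turned on.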
