Posted to commits@airflow.apache.org by "Vikram Oberoi (JIRA)" <ji...@apache.org> on 2018/08/02 23:16:00 UTC

[jira] [Created] (AIRFLOW-2842) GCS rsync operator

Vikram Oberoi created AIRFLOW-2842:
--------------------------------------

             Summary: GCS rsync operator
                 Key: AIRFLOW-2842
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2842
             Project: Apache Airflow
          Issue Type: Improvement
            Reporter: Vikram Oberoi


The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects from one bucket to another using a wildcard.

As long as you don't delete anything in the source bucket, the destination bucket will end up synchronized on every run.

However, each object gets copied over even if it exists at the destination, which makes this operation inefficient, time-consuming, and potentially costly.

I'd love an operator that behaves like `gsutil rsync` for when I need to synchronize two buckets, supporting `gsutil rsync -d` behavior (deleting destination objects that no longer exist in the source) as well.
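
To make the requested behavior concrete, here is a minimal sketch of the planning step such an operator might perform. It is not an existing Airflow operator or API: `plan_sync`, `SyncPlan`, and the `delete_extra` flag are hypothetical names, and representing each bucket listing as a dict of object name to checksum string is an assumption made for illustration. The real `gsutil rsync` has its own change-detection rules.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    """Hypothetical result type: which objects to copy and which to delete."""
    to_copy: List[str]
    to_delete: List[str]


def plan_sync(
    source: Dict[str, str],
    destination: Dict[str, str],
    delete_extra: bool = False,
) -> SyncPlan:
    """Decide what an rsync-style sync would do.

    `source` and `destination` map object names to checksum strings
    (an assumed representation of a bucket listing).
    """
    # Copy only objects that are missing at the destination or whose
    # checksum differs; unchanged objects are skipped entirely.
    to_copy = sorted(
        name
        for name, checksum in source.items()
        if destination.get(name) != checksum
    )

    # Mirrors the intent of `gsutil rsync -d`: remove destination
    # objects that are absent from the source. Off by default.
    to_delete = (
        sorted(name for name in destination if name not in source)
        if delete_extra
        else []
    )

    return SyncPlan(to_copy=to_copy, to_delete=to_delete)


if __name__ == "__main__":
    src = {"a.csv": "111", "b.csv": "222", "c.csv": "333"}
    dst = {"a.csv": "111", "b.csv": "old", "stale.csv": "999"}
    print(plan_sync(src, dst))
    print(plan_sync(src, dst, delete_extra=True))
```

With a plan like this, the operator would only issue copy calls for changed or missing objects, which is where the savings over the wildcard copy would come from.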



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)