Posted to commits@airflow.apache.org by "Daniel Lamblin (JIRA)" <ji...@apache.org> on 2019/01/15 10:20:00 UTC

[jira] [Commented] (AIRFLOW-2862) S3ToRedshiftTransfer Copy Command Flexibility

    [ https://issues.apache.org/jira/browse/AIRFLOW-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742936#comment-16742936 ] 

Daniel Lamblin commented on AIRFLOW-2862:
-----------------------------------------

This would be a breaking change, so it would need to be held back for the 2.0 release (if a breaking change is acceptable even then), made non-default behavior behind an option… or just written as a different operator with a slightly different name.

TBH, if I were doing this for my own Airflow deployments, I would do something like the following (a rough sketch follows the list):
 * Copy this operator and name it something like S3ToRedshiftTransfer2,
 * put the more sensible change (it's a good suggestion) into S3ToRedshiftTransfer2 in a way that lets the S3ToRedshiftTransfer operator subclass it, and
 * override the command template in a subclass named S3ToRedshiftTransfer to provide the existing behavior (I know that sounds backward, but...); then,
 * when 2.0 is released:
 ** rename the S3ToRedshiftTransfer2 operator back to S3ToRedshiftTransfer,
 ** rename the subclassed operator to S3ToRedshiftTransferDeprecated, and
 ** leave its implementation in documentation only, for users who are upgrading and can't update some X number of DAGs.
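A rough sketch of that layering (the copy_query attribute and the constructor shown here are illustrative, and the execute() plumbing with hooks and credentials is deliberately elided):

{code:python}
# Sketch only, not a drop-in patch; names mirror the existing operator,
# but connection and credential handling are left out.
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class S3ToRedshiftTransfer2(BaseOperator):
    """Loads from S3 to Redshift using the S3 key exactly as given."""

    template_fields = ('s3_key',)  # lets the key be rendered from macros

    # New command template: no trailing /{table} on the S3 path.
    copy_query = """
        COPY {schema}.{table}
        FROM 's3://{s3_bucket}/{s3_key}'
        with credentials
        'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
        {copy_options};
    """

    @apply_defaults
    def __init__(self, schema, table, s3_bucket, s3_key,
                 copy_options=None, *args, **kwargs):
        super(S3ToRedshiftTransfer2, self).__init__(*args, **kwargs)
        self.schema = schema
        self.table = table
        self.s3_bucket = s3_bucket
        self.s3_key = s3_key
        self.copy_options = copy_options or []

    def execute(self, context):
        # Format self.copy_query and run it against Redshift; hook setup
        # and credential lookup are omitted from this sketch.
        raise NotImplementedError


class S3ToRedshiftTransfer(S3ToRedshiftTransfer2):
    """Keeps the existing behavior by overriding only the command template."""

    copy_query = """
        COPY {schema}.{table}
        FROM 's3://{s3_bucket}/{s3_key}/{table}'
        with credentials
        'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
        {copy_options};
    """
{code}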

> S3ToRedshiftTransfer Copy Command Flexibility
> ---------------------------------------------
>
>                 Key: AIRFLOW-2862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2862
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Micheal Ascah
>            Assignee: Micheal Ascah
>            Priority: Minor
>
> Currently, the S3ToRedshiftTransfer class requires that the name of the target table be appended to the end of the S3 key provided.
> It doesn't seem justifiable for the operator to require any file-naming convention; the S3 bucket plus the S3 key should be all that is needed. This would make it possible to load any S3 key into a Redshift table, rather than only files that have the table name at the end of the key.
> The s3_key parameter should also be templateable, so that files written to S3 by other tasks in the current DAG run, named using timestamps from macros, can be identified when loading from S3 to Redshift.
> The command template should change from 
> {code:sql}
> COPY {schema}.{table}
>  FROM 's3://{s3_bucket}/{s3_key}/{table}'
>  with credentials
>  'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>  {copy_options};{code}
>  to
>  
> {code:sql}
> COPY {schema}.{table}
>  FROM 's3://{s3_bucket}/{s3_key}'
>  with credentials
>  'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>  {copy_options};
> {code}
>  
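> For example, with s3_key templated, a task could point at a run-specific file (illustrative usage only; parameter names follow this operator):
> {code:python}
> load_events = S3ToRedshiftTransfer(
>     task_id='load_events',
>     schema='analytics',
>     table='events',
>     s3_bucket='my-bucket',
>     s3_key='exports/{{ ds_nodash }}/events.csv',  # rendered by Jinja at run time
>     dag=dag,
> )
> {code}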



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)