Posted to commits@airflow.apache.org by "jack (Jira)" <ji...@apache.org> on 2019/10/28 23:09:00 UTC

[jira] [Commented] (AIRFLOW-1663) Redshift Connection, Hook, & Operator for COPY command usability

    [ https://issues.apache.org/jira/browse/AIRFLOW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961540#comment-16961540 ] 

jack commented on AIRFLOW-1663:
-------------------------------

This is possibly what was intended to be done in:

https://issues.apache.org/jira/browse/AIRFLOW-5338


> Redshift Connection, Hook, & Operator for COPY command usability
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-1663
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1663
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: hooks, operators
>            Reporter: Andy Hadjigeorgiou
>            Assignee: Andy Hadjigeorgiou
>            Priority: Minor
>
> I'm using Redshift as a data warehouse in conjunction with Airflow, and I found that it wasn't immediately apparent that Airflow had the hooks/connections to support Redshift. In practice, because Redshift is based on Postgres, a Postgres hook works for basic commands. However, running a COPY command (built into Redshift to copy data in parallel) requires more work to include AWS credentials (ideally, credentials aren't kept in version control but in a connection). Redshift's UNLOAD-to-S3 feature would also benefit from a solution where credentials could be stored in a connection.
> My proposed solution is to include a Redshift connection that will allow us to store AWS credentials alongside the Redshift database credentials (similar to an S3 connection). From there, I'll create an appropriate RedshiftHook (probably an extension of PostgresHook) and a RedshiftOperator, with means to simplify Redshift SQL queries that need AWS credentials (and perhaps using psycopg2's copy_expert method).
> It's my first time posting here, and I'm looking to contribute meaningfully; any feedback regarding this feature would be much appreciated! I read that features which contribute new hooks and operators are welcome, and that features in line with the project Roadmap are ideal ("Adding features already offered by existing workflow solutions (i.e. we need to add expected features)"). Currently, Airflow supports Redshift only because of its basis in Postgres, but more native support would be in line with the features of other workflow solutions and would attract more Redshift users.
> I've already started work on this feature; once I clean it up, I'll post it here.


