Posted to commits@airflow.apache.org by "Andy Hadjigeorgiou (JIRA)" <ji...@apache.org> on 2017/09/29 15:46:00 UTC

[jira] [Created] (AIRFLOW-1663) Redshift Connection, Hook, & Operator for COPY command usability

Andy Hadjigeorgiou created AIRFLOW-1663:
-------------------------------------------

             Summary: Redshift Connection, Hook, & Operator for COPY command usability
                 Key: AIRFLOW-1663
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1663
             Project: Apache Airflow
          Issue Type: New Feature
          Components: hooks, operators
            Reporter: Andy Hadjigeorgiou
            Assignee: Andy Hadjigeorgiou
            Priority: Minor


I'm using Redshift as a data warehouse in conjunction with Airflow, and it wasn't immediately apparent to me that Airflow had the hooks/connections to support Redshift. In practice, because Redshift is based on Postgres, a Postgres hook works for basic commands. However, running a COPY command (which Redshift provides to load data in parallel) takes more work, because the command must include AWS credentials (ideally, credentials live in a connection rather than in version control). Redshift's feature for unloading to S3 would also benefit from a solution where credentials are stored in a connection.

My proposed solution is to add a Redshift connection that lets us store AWS credentials alongside the Redshift db connection credentials (similar to an S3 connection). From there, I'll create an appropriate RedshiftHook (probably an extension of PostgresHook) and a RedshiftOperator, with the means to simplify Redshift SQL queries that need AWS credentials (& perhaps using psycopg2's copy_expert method).
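
To make the idea concrete, here is a minimal Python sketch of the statement-building step such a hook might perform. It is only an illustration under stated assumptions, not the proposed implementation: AwsCredentials and build_copy_sql are hypothetical names, the credentials are assumed to have been read from a connection elsewhere, and the table name and S3 path are assumed to come from trusted DAG code, since they are interpolated directly into the SQL.

```python
from dataclasses import dataclass


@dataclass
class AwsCredentials:
    """Hypothetical stand-in for AWS credentials read from an Airflow connection."""
    access_key_id: str
    secret_access_key: str


def build_copy_sql(table, s3_path, creds, options=""):
    """Build a Redshift COPY statement that uses the CREDENTIALS clause.

    `table`, `s3_path`, and `options` are interpolated as-is, so they are
    assumed to be trusted values, not user input.
    """
    credentials = (
        f"aws_access_key_id={creds.access_key_id};"
        f"aws_secret_access_key={creds.secret_access_key}"
    )
    sql = f"COPY {table} FROM '{s3_path}' CREDENTIALS '{credentials}'"
    if options:
        # Optional format or load parameters, e.g. "CSV".
        sql += " " + options
    return sql + ";"


if __name__ == "__main__":
    # Placeholder values only; real keys would come from the connection.
    creds = AwsCredentials("AKIAEXAMPLE", "secret")
    print(build_copy_sql("events", "s3://my-bucket/events/", creds, "CSV"))
```

A RedshiftHook that extends PostgresHook could then pass the resulting string to the hook's existing run method, so the DAG code itself never has to contain the keys.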

It's my first time posting here, and I'm looking to contribute meaningfully - any feedback regarding this feature would be much appreciated! I read that features which contribute new hooks & operators are welcome, and that features in line with the project Roadmap are ideal ("Adding features already offered by existing workflow solutions (i.e we need to add expected features)"). Currently, Airflow only supports Redshift because of its basis on Postgres, but more native support would be in line with the features of other workflow solutions, and would attract more Redshift users.

I've already started work on this feature; once I clean it up, I'll post it here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)