Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/30 17:03:32 UTC

[GitHub] [airflow] dstandish commented on pull request #18447: Add RedshiftSQLHook, RedshiftSQLOperator

dstandish commented on pull request #18447:
URL: https://github.com/apache/airflow/pull/18447#issuecomment-931503404


   @ashb I verified that IAM works with the Postgres hook, and that `super` is returned as a string.
   
   But even so (and separate from the `geometry` data type, which seems like a valid point, though I didn't verify it), I think there's a case for adding a Redshift hook using this connector.
   
   From a user experience perspective, the redshift connector appears to be easier to deal with.
   
   For example, to use IAM with PostgresHook, unless your Airflow user's AWS profile has access to generate cluster credentials, you seem to need two separate Airflow connections:
   
   ```python
   import json
   import os
   
   from airflow.models.connection import Connection
   from airflow.providers.postgres.hooks.postgres import PostgresHook
   
   os.environ['AIRFLOW_CONN_AWS_REDSHIFT_CLUSTER_CREDS_GEN'] = 'aws://aws_id:aws_key@?region=region-1'
   c = Connection(
       host='cluster-identifier.random-id.region-1.redshift.amazonaws.com',
       login='my_db_username',
       schema='my_database',
       extra=json.dumps({"iam": True, "redshift": True, "aws_conn_id": "aws_redshift_cluster_creds_gen"}),
   )
   hook = PostgresHook(connection=c)
   cnx = hook.get_conn()
   ```
   
   (Commonly, your Airflow user might not have that access itself, but you'd reference an AWS Airflow connection that does.)
   
   And you'd need to use the full cluster hostname `cluster-identifier.random-id.region-1.redshift.amazonaws.com`, whereas with the redshift connector the simple cluster identifier will do, which is just more convenient:
   
   ```python
   import redshift_connector

   conn = redshift_connector.connect(
       iam=True,
       database='dev',
       db_user='awsuser',
       cluster_identifier='examplecluster',
       profile='default',
   )
   ```
   
   Beyond that, Redshift does not behave 100% the same as Postgres in certain respects. For example, external schemas don't show up in `information_schema.schemata`. If you look at the [cursor code](https://github.com/aws/amazon-redshift-python-driver/blob/master/redshift_connector/cursor.py), which has a lot of convenience methods for listing things like tables and primary keys, you can see 9 references to Redshift-specific system views (search for `svv`).
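   To make the `information_schema.schemata` point concrete, here is a minimal sketch of what a Redshift-aware schema listing would have to do. `SVV_EXTERNAL_SCHEMAS` is Redshift's documented system view for Spectrum external schemas; the `list_all_schemas` helper is hypothetical, not something in the driver or this PR:

   ```python
   # Sketch: external (Spectrum) schemas are invisible to the standard
   # information_schema view, so a Redshift-aware listing must union two
   # queries. The helper below is illustrative only.

   STANDARD_SCHEMAS_SQL = "SELECT schema_name FROM information_schema.schemata"

   # Spectrum external schemas only appear in this Redshift-specific view:
   EXTERNAL_SCHEMAS_SQL = "SELECT schemaname FROM svv_external_schemas"


   def list_all_schemas(cursor):
       """Return regular plus external schema names via a DB-API cursor."""
       cursor.execute(STANDARD_SCHEMAS_SQL)
       regular = {row[0] for row in cursor.fetchall()}
       cursor.execute(EXTERNAL_SCHEMAS_SQL)
       external = {row[0] for row in cursor.fetchall()}
       return regular | external
   ```

   A plain Postgres hook that only runs the first query would silently miss every external schema.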
   
   These things built into the redshift connector would seem to provide meaningful user value, though I'm unsure how well tested this library is in the wild.
   

