Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/30 17:03:32 UTC
[GitHub] [airflow] dstandish commented on pull request #18447: Add RedshiftSQLHook, RedshiftSQLOperator
dstandish commented on pull request #18447:
URL: https://github.com/apache/airflow/pull/18447#issuecomment-931503404
@ashb I verified that IAM works with the Postgres hook, and that `super` is returned as a string.
But even so (and separate from the `geometry` data type, which seems like a valid point though I didn't verify), I think there's a case for adding a redshift hook using this connector.
From a user experience perspective, the redshift connector appears to be easier to deal with.
For example, to use IAM with PostgresHook, unless your Airflow user's AWS profile has access to generate cluster creds, you seem to need two separate Airflow connections:
```python
import json
import os

from airflow.models.connection import Connection
from airflow.providers.postgres.hooks.postgres import PostgresHook

# AWS connection used only to generate the cluster credentials
os.environ['AIRFLOW_CONN_AWS_REDSHIFT_CLUSTER_CREDS_GEN'] = 'aws://aws_id:aws_key@?region=region-1'

c = Connection(
    host='cluster-identifier.random-id.region-1.redshift.amazonaws.com',
    login='my_db_username',
    schema='my_database',
    extra=json.dumps({"iam": True, "redshift": True, "aws_conn_id": "aws_redshift_cluster_creds_gen"}),
)
hook = PostgresHook(connection=c)
cnx = hook.get_conn()
```
(Commonly your Airflow user's own credentials won't have that access, so you reference a separate AWS Airflow connection that does.)
And you'd need to use the full cluster hostname `cluster-identifier.random-id.region-1.redshift.amazonaws.com`, whereas with the Redshift connector the simple cluster identifier will do, which is just more convenient:
```python
conn = redshift_connector.connect(
iam=True,
database='dev',
db_user='awsuser',
cluster_identifier='examplecluster',
profile='default'
)
```
Beyond that, Redshift does not behave exactly like Postgres in certain respects. For example, external schemas don't show up in `information_schema.schemata`. If you look at the [cursor code](https://github.com/aws/amazon-redshift-python-driver/blob/master/redshift_connector/cursor.py), which has a number of convenience methods for listing things like tables and primary keys, you can see nine references to Redshift-specific system views (search for `svv`).
These conveniences built into the Redshift connector would seem to provide meaningful user value, though I'm not sure how well tested this library is in the wild.
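To make the external-schema gap concrete, here's an illustrative sketch (not from either library): since `information_schema.schemata` misses external schemas while the Redshift system view `svv_external_schemas` exposes them, a complete schema listing has to union the two. The `list_all_schemas` helper below is hypothetical and works against any DB-API cursor, e.g. one from `redshift_connector.connect(...).cursor()`:

```python
# Illustrative sketch: Redshift external schemas are absent from
# information_schema.schemata, so a complete listing must also read
# the Redshift-specific svv_external_schemas system view.
ALL_SCHEMAS_SQL = """
SELECT schema_name FROM information_schema.schemata
UNION
SELECT schemaname FROM svv_external_schemas
"""

def list_all_schemas(cursor):
    """Return local plus external schema names, sorted.

    `cursor` is any DB-API cursor (hypothetical helper, not part of
    redshift_connector or the Airflow hooks).
    """
    cursor.execute(ALL_SCHEMAS_SQL)
    return sorted(row[0] for row in cursor.fetchall())
```

With the Postgres hook you'd have to write and maintain queries like this yourself; the Redshift connector's built-in metadata methods already know about the `svv` views.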
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org