You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/29 16:37:14 UTC

[GitHub] [airflow] mebelousov opened a new issue #10641: Add new connection type - Greenplum

mebelousov opened a new issue #10641:
URL: https://github.com/apache/airflow/issues/10641


   Greenplum DB is a popular open source relational DBMS[1]. 
   
   I want to implement some database specific behaviour for a common class. It's easy to check connection type for postgres, mysql or oracle. Now I use postgres connection type for Greenplum and cannot explicitly check type. 
   
   Airflow has `PostgresHook` based on `psycopg2`. As Greenplum supports PostgreSQL libpq API [2], `PostgresHook` could be reused for Greenplum.
    
   As I see it's enough to change 2 code line in the `connectoin.py`.
   
   
   I'm not sure that it's nice to reuse `Hook`.
   Please confirm or reject my idea.
   
   
   ```diff
   Index: airflow/models/connection.py
   ===================================================================
   --- airflow/models/connection.py  (revision 6416d898060706787861ff8ecbc4363152a35f45)
   +++ airflow/models/connection.py  (date 1598717245079)
   @@ -109,6 +109,7 @@
            ('grpc', 'GRPC Connection'),
            ('yandexcloud', 'Yandex Cloud'),
            ('spark', 'Spark'),
   +        ('greenplum', 'Greenplum'),
        ]
    
        def __init__(
   @@ -243,7 +244,7 @@
            elif self.conn_type == 'google_cloud_platform':
                from airflow.contrib.hooks.bigquery_hook import BigQueryHook
                return BigQueryHook(bigquery_conn_id=self.conn_id)
   -        elif self.conn_type == 'postgres':
   +        elif self.conn_type in ('postgres', 'greenplum'):
                from airflow.hooks.postgres_hook import PostgresHook
                return PostgresHook(postgres_conn_id=self.conn_id)
            elif self.conn_type == 'pig_cli':
   ```
   
   ---
   1. https://github.com/greenplum-db/gpdb
   2. https://gpdb.docs.pivotal.io/6-10/security-guide/topics/ports_and_protocols.html


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #10641:
URL: https://github.com/apache/airflow/issues/10641#issuecomment-691970423


   Why don't you configure the connection as PostgreSQL and then add the missing information in the description? See: https://github.com/apache/airflow/pull/10873/files
   Wasn't the ability to add a description for the connection enough for you?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] RosterIn commented on issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
RosterIn commented on issue #10641:
URL: https://github.com/apache/airflow/issues/10641#issuecomment-743141687


   @mebelousov I think in that case you don't need to duplicate code. you can inherit from PostgresHook and overwrite only the methods that are incomparable. 
   
   You hook can be:
   
   
   ```
   class GreenplumHook(PostgresHook):
       def __init__(self, *args, **kwargs):
           super().__init__(*args, **kwargs)
   
       def _generate_insert_sql():
           pass
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #10641:
URL: https://github.com/apache/airflow/issues/10641#issuecomment-683313281


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mebelousov closed issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
mebelousov closed issue #10641:
URL: https://github.com/apache/airflow/issues/10641


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mebelousov commented on issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
mebelousov commented on issue #10641:
URL: https://github.com/apache/airflow/issues/10641#issuecomment-692009371


   @mik-laj good idea! Thank you! Seems it's a good work-around.
   
   I found out than method `_generate_insert_sql` in PostgresHook [1] is incompatible with Greenplum syntax [2]. Now Greenplum doesn't allow to `ON CONFLICT ({0}) DO UPDATE SET {1}`.
   I believe to add GreenplumHook for full compatibility. GreenplumHook will be copy of PostgresHook with exeption of `_generate_insert_sql`. It's enough to use `_generate_insert_sql` from DbApiHook.
   Is it good idea to add a new hook?
   
   In general we have two choices:
   
   1. Add GreenplumHook. 
   1. Use connection description.
   
   
   ---
   1. https://github.com/apache/airflow/blob/eaa49b2257913c34b15408a14e445f6106e691ee/airflow/providers/postgres/hooks/postgres.py#L194
   2. http://docs.greenplum.org/6-0/ref_guide/sql_commands/INSERT.html


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #10641: Add new connection type - Greenplum

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #10641:
URL: https://github.com/apache/airflow/issues/10641#issuecomment-691713482


   @mebelousov would you like to open a PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org