You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Bjorn Olsen <bj...@gmail.com> on 2019/12/27 12:30:26 UTC

General-purpose SQL hook/operator

Hi community



I have a question/suggestion about the different SQL Database hooks +
operators.



Examples:

- MsSqlHook (Implements DbApiHook using pymssql)

- MySqlHook (Implements DbApiHook using MySQLdb)

- PostgresHook (Implements DbApiHook using psycopg2)



These hooks tend to have convenience methods for doing more complex DB
functions (eg bulk loading), which is useful.



However there is also a downside to them.

First I need to figure out which operator and hook to use and what driver
it is using. Then I have to review the source code to understand what it is
doing and what it isn't doing. Finally (at my unique client site) I request
the driver lib to be whitelisted and installed, and I test it for my use
case.



This can be quite time consuming if working with several different DBs.



Does Airflow have a general-purpose SQL hook + operator?

I didn't see one but perhaps I'm missing it :)



I was considering implementing something based on SQL Alchemy.



Advantages:

- Range of database dialects

- Simple to use esp. for prototyping

- Consistent - no DB-specific code

- No extra dependencies - Airflow already requires SQL Alchemy

- Maybe SQL Alchemy features would be useful eg function-based querying



Disadvantages:

- Another way to do the same thing (some existing DB hooks)

- Maybe less efficient than using DB specific hooks



What are your thoughts?



Kind regards

Re: General-purpose SQL hook/operator

Posted by Anton Zayniev <an...@gmail.com>.
+1 to Jarek

On Sun, Dec 29, 2019, 21:58 Jarek Potiuk <Ja...@polidea.com> wrote:

> I think it's a rare thing for anyone to use a generic DB operations (as
> operator). You usually know which database you have as source/target.
> Even if in one company you have more than one database, then it is not a
> problem to use one hook/operator in Postgres and another in MySQL etc.
>
> IMHO it's better to have a specialized DB hook rather than an SQLAlchemy
> one as you can use specialized SQL for that database.
> Also those specialized hooks already use the DbApiHook with all the common
> code extracted so there is not a lot of duplicated code.
>
> SQLAlchemy is good for an application that has a lot of complex logic and
> OO architecture and should rely on several different DBs as backend.
> But it is mostly only limitation in case of simple ETL scripts (which is
> the case for pretty much all Airflow tasks).
>
> J.
>
> On Fri, Dec 27, 2019 at 1:31 PM Bjorn Olsen <bj...@gmail.com>
> wrote:
>
> > Hi community
> >
> >
> >
> > I have a question/suggestion about the different SQL Database hooks +
> > operators.
> >
> >
> >
> > Examples:
> >
> > - MsSqlHook (Implements DbApiHook using pymssql)
> >
> > - MySqlHook (Implements DbApiHook using MySQLdb)
> >
> > - PostgresHook (Implements DbApiHook using psycopg2)
> >
> >
> >
> > These hooks tend to have convenience methods for doing more complex DB
> > functions (eg bulk loading), which is useful.
> >
> >
> >
> > However there is also a downside to them.
> >
> > First I need to figure out which operator and hook to use and what driver
> > it is using. Then I have to review the source code to understand what it
> is
> > doing and what it isn't doing. Finally (at my unique client site) I
> request
> > the driver lib to be whitelisted and installed, and I test it for my use
> > case.
> >
> >
> >
> > This can be quite time consuming if working with several different DBs.
> >
> >
> >
> > Does Airflow have a general-purpose SQL hook + operator?
> >
> > I didn't see one but perhaps I'm missing it :)
> >
> >
> >
> > I was considering implementing something based on SQL Alchemy.
> >
> >
> >
> > Advantages:
> >
> > - Range of database dialects
> >
> > - Simple to use esp. for prototyping
> >
> > - Consistent - no DB-specific code
> >
> > - No extra dependencies - Airflow already requires SQL Alchemy
> >
> > - Maybe SQL Alchemy features would be useful eg function-based querying
> >
> >
> >
> > Disadvantages:
> >
> > - Another way to do the same thing (some existing DB hooks)
> >
> > - Maybe less efficient than using DB specific hooks
> >
> >
> >
> > What are your thoughts?
> >
> >
> >
> > Kind regards
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: General-purpose SQL hook/operator

Posted by Jarek Potiuk <Ja...@polidea.com>.
I think it's a rare thing for anyone to use a generic DB operations (as
operator). You usually know which database you have as source/target.
Even if in one company you have more than one database, then it is not a
problem to use one hook/operator in Postgres and another in MySQL etc.

IMHO it's better to have a specialized DB hook rather than an SQLAlchemy
one as you can use specialized SQL for that database.
Also those specialized hooks already use the DbApiHook with all the common
code extracted so there is not a lot of duplicated code.

SQLAlchemy is good for an application that has a lot of complex logic and
OO architecture and should rely on several different DBs as backend.
But it is mostly only limitation in case of simple ETL scripts (which is
the case for pretty much all Airflow tasks).

J.

On Fri, Dec 27, 2019 at 1:31 PM Bjorn Olsen <bj...@gmail.com> wrote:

> Hi community
>
>
>
> I have a question/suggestion about the different SQL Database hooks +
> operators.
>
>
>
> Examples:
>
> - MsSqlHook (Implements DbApiHook using pymssql)
>
> - MySqlHook (Implements DbApiHook using MySQLdb)
>
> - PostgresHook (Implements DbApiHook using psycopg2)
>
>
>
> These hooks tend to have convenience methods for doing more complex DB
> functions (eg bulk loading), which is useful.
>
>
>
> However there is also a downside to them.
>
> First I need to figure out which operator and hook to use and what driver
> it is using. Then I have to review the source code to understand what it is
> doing and what it isn't doing. Finally (at my unique client site) I request
> the driver lib to be whitelisted and installed, and I test it for my use
> case.
>
>
>
> This can be quite time consuming if working with several different DBs.
>
>
>
> Does Airflow have a general-purpose SQL hook + operator?
>
> I didn't see one but perhaps I'm missing it :)
>
>
>
> I was considering implementing something based on SQL Alchemy.
>
>
>
> Advantages:
>
> - Range of database dialects
>
> - Simple to use esp. for prototyping
>
> - Consistent - no DB-specific code
>
> - No extra dependencies - Airflow already requires SQL Alchemy
>
> - Maybe SQL Alchemy features would be useful eg function-based querying
>
>
>
> Disadvantages:
>
> - Another way to do the same thing (some existing DB hooks)
>
> - Maybe less efficient than using DB specific hooks
>
>
>
> What are your thoughts?
>
>
>
> Kind regards
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>