You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Gael Magnan <ga...@gmail.com> on 2017/01/09 10:15:10 UTC

Refactoring Connection

Hi,

following my question on gitter the other day and the response from Bolke
de Bruin, I've started working on refactoring the connections in airflow.

Before submitting a PR I wanted to share my proposal with you and get
feedbacks.

The idea is quite simple, I've divided the Connection class in two,
Connection and ConnectionType, connection has the same interface it had
before plus a few methods, but the class keeps a reference to a dictionary
of registered ConnectionType. It delegates the work of parsing from URI,
formatting to URI (added) and getting the hook to the ConnectionType
associated with the conn_type.

I've thought of two ways of registering new ConnectionTypes, the first is
making the BaseConnectionType use a metaclass that registered any new
ConnectionType with Connection when the class is declared, it would require
the less work to extend the connection module, as just importing the file
with the connection would do the trick.
The second one is juste to have a function/classmethod that you call
manually to register your connection. It would be simpler to understand but
requires more work every time you create a new ConnectionType.

Hope this proposal is clear enough, and I'm waiting for feebacks and
possible improvements.

Regards
Gael Magnan de Bornier

Re: Refactoring Connection

Posted by Gael Magnan <ga...@gmail.com>.
Typo on the airflow link: it's
https://issues.apache.org/jira/browse/AIRFLOW-746
Sorry

Le mer. 11 janv. 2017 à 15:48, Gael Magnan <ga...@gmail.com> a écrit :

> Thanks for all the information,
>
> currently what I have is working iso with what was existing, just need to
> add some doc and tests, but I have a few questions.
>
> Currently airflow/urtils/db.py initdb function creates a lot of
> connections by default.
> I understand that they are used for the tests, but why are they created
> even on production environment?
> Looking into that made me realize that we can create connections of a type
> that is not defined, (ex: beeline, aws, ...) by code just like you said but
> it's to the best of my knowledge impossible with the UI.
>
> By the way, should I create the beeline and aws connection types? So
> everyone can create them through the UI?
>
> I've created a JIRA issue:
> https://issues.apache.org/jira/browse/AIRFLOW-235, and a first draft of
> the PR: https://github.com/apache/incubator-airflow/pull/1981, more tests
> and doc coming + all the changes depending on where this discussion takes
> us.
>
>
> Le mar. 10 janv. 2017 à 21:55, Maxime Beauchemin <
> maximebeauchemin@gmail.com> a écrit :
>
> Note that the `Conection.extra` attribute was designed to store anything
> that is specific to a connection type. Here's an example of how it's used
> in the HiveHook:
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75
>
> Also note that currently there's no enforcement of connection type that
> match a hook, so you can use any connection with any hook. So you could set
> a new connection for mongo with no type at all and still use your local
> MongoHook without changing the Airflow source code.
>
> The one part that is missing is the association from the connection to the
> hook here in Connection.get_hook
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620
> This allows getting the hook from the connection, and allows for the Data
> Profiling section of the UI to know how to execute a query against that
> connection type.
>
> Max
>
> On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <ga...@gmail.com> wrote:
>
> > @George Leslie-Waksman:
> >   It could be done, but since it's marked as deprecated everywhere and
> Alex
> > Van Boxel asked me tu used entry points I'm not sure we should do both.
> >
> > @Jeremiah Lowin:
> >   I was thinking not changing much of the api, juste the implementation
> > behind it.
> >   So for example, adding the mongodb connection type would just be doing
> > the following (assuming we have a MongodbHook):
> >   ```
> >   from airflow.models import BaseConnectionType
> >
> >   class MongodbConnectionType(BaseConnectionType):
> >       name = 'mongodb'
> >       description = 'MongoDB'
> >
> >       def get_hook(conn_id):
> >           try:
> >               from airflow.hooks.mongodb_hook import MongodbHook
> >               return MongodbHook(conn_id=conn_id)
> >           except:
> >               pass
> >   ```
> >
> >   and if we want to handle the query params (and we do) we could just do
> > something like overriding the parse_from_uri method (not final code):
> >   ```
> >       @classmethod
> >       def parse_from_uri(cls, uri):
> >           temp_uri = urlparse(uri)
> >           (host, schema, login, password, port,
> >             extras) = super().parse_from_uri(cls, uri)
> >           extras = dict([param.split('=', 1) for param in
> > temp_uri.query.split('&')])
> >           return host, schema, login, password, port, extras
> >   ```
> >
> >   So if you defined a connection using an uri the way you are able now,
> and
> > this URI started with mongodb,
> >   it would be extracted and a connection created.
> >
> >
> > @Maxime Beauchemin:
> >   By can't easily I meant that to had a connection type you have to go
> > through making a pr to airflow, you can't just use a plugin.
> >   I think all you have to do to add a new one is modify the Connection
> > class in models. But it's still hard coded.
> >
> >   I agree with you, we could easily use just a data structure for most of
> > the connections, especially the ones existing now,
> >   but it would make it harder to add a connection whose uri format is not
> > consistant with the other ones (not to mention ones that don't use uri),
> >   except if we also handle a string format that would parse the uri and
> set
> > the right fields.
> >
> > Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <
> > maximebeauchemin@gmail.com>
> > a écrit :
> >
> > > About *"you can't **easily add a connection type without modifying the
> > > airflow code source*", can we start by listing out the places that need
> > to
> > > be touched up when adding a connection?
> > >
> > > Here's what I could find:
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L516
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/utils/db.py#L109
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L620
> > >
> > > I didn't dig super deep but it seems like what we need is a
> configurable
> > > connection configuration blob that informs the connection type name,
> the
> > > verbose name and the related hook location.
> > >
> > > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py
> > that
> > > could be altered in different environments. This CONNECTION_TYPE would
> > > store the name, the verbose_name and the related hook's path as a
> string
> > > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not
> > be
> > > that far from what you were proposing, and whether it should be a class
> > or
> > > a data structure is arguable, but I'd like to keep configuration
> elements
> > > easily serializable so I'd vote for a data structure using only
> > primitives.
> > >
> > > Max
> > >
> > > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <al...@vanboxel.be>
> wrote:
> > >
> > > > I was actually going to propose something different with
> entry-points,
> > > but
> > > > your requirement beat me to it (but that's ok :-). Actually I think
> > with
> > > > this mechanism people would be able to extend Airflow connection
> > > mechanism
> > > > (and later other stuff) by doing *pip install
> > > airflow-sexy-new-connection*
> > > > (for example).
> > > >
> > > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com>
> > wrote:
> > > >
> > > > > Thank you for the read, I'm gonna look at it, it's probably gonna
> be
> > > > better
> > > > > that what I have.
> > > > >
> > > > > Point taken about the URI, I'll see if i can find something generic
> > > > enough
> > > > > to handle all those cases.
> > > > >
> > > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a
> > > écrit
> > > > :
> > > > >
> > > > > > Thanks a lot, yes it clarifies a lot and I do agree you really
> need
> > > to
> > > > > hack
> > > > > > inside Airflow to add a Connection type. While you're working at
> > this
> > > > > could
> > > > > > you have a look at the standard python *entry-point mechanism*
> for
> > > > > > registering Connection types/components.
> > > > > >
> > > > > > A good read on this:
> > > > > >
> > > > > >
> > > > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > > > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > > > >
> > > > > > My first though would be that just by adding an entry to the
> > factory
> > > > > method
> > > > > > would be enough to register your Connection + ConnectionType and
> > UI.
> > > > > >
> > > > > > Also note that not everything works with a URI. The Google Cloud
> > > > > Connection
> > > > > > doesn't have one, it uses a secret key file stored on disk, so
> > don't
> > > > > force
> > > > > > every connection type to work with URI's.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Yes sure,
> > > > > > >
> > > > > > > The question was the following:
> > > > > > > "I was looking at the code of the connections, and I realized
> you
> > > > can't
> > > > > > > easily add a connection type without modifying the airflow code
> > > > > source. I
> > > > > > > wanted to create a mongodb connection type, but I think the
> best
> > > > > approche
> > > > > > > would be to refactor connections first. Thoughts anyone?"
> > > > > > >
> > > > > > > The answer of Bolke de Bruin was: "making it more generic would
> > be
> > > > > > > appreciated"
> > > > > > >
> > > > > > > So basically the way the code is set up actually every types of
> > > > > > connection
> > > > > > > existing is defined though a list in the Connection class. It
> > > > > implements
> > > > > > > exactly the same code for parsing uri to get connections info
> and
> > > > > doesn't
> > > > > > > allow for a simple way to get back the uri from the connection
> > > infos.
> > > > > > >
> > > > > > > I need to add a mongodb connection and a way to get it back as
> a
> > > uri,
> > > > > so
> > > > > > i
> > > > > > > could use an other type of connection and play around with that
> > or
> > > > > juste
> > > > > > > add one more hard coded connection type, but I though this
> might
> > be
> > > > > > > something that comes back regularly and having a simple way to
> > plug
> > > > in
> > > > > > new
> > > > > > > types of connection would make it easier for anyone to
> > contribute a
> > > > new
> > > > > > > connection type.
> > > > > > >
> > > > > > > Hope this clarifies my proposal.
> > > > > > >
> > > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be
> >
> > a
> > > > > écrit
> > > > > > :
> > > > > > >
> > > > > > > > Hey Gael,
> > > > > > > >
> > > > > > > > could you please recap the question here and provide some
> > > context.
> > > > > Not
> > > > > > > > everyone on the mailinglist is actively following Gitter,
> > > including
> > > > > me.
> > > > > > > > With some context it would be easier to give feedback.
> Thanks.
> > > > > > > >
> > > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> > > gaelmagnan@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > following my question on gitter the other day and the
> > response
> > > > from
> > > > > > > Bolke
> > > > > > > > > de Bruin, I've started working on refactoring the
> connections
> > > in
> > > > > > > airflow.
> > > > > > > > >
> > > > > > > > > Before submitting a PR I wanted to share my proposal with
> you
> > > and
> > > > > get
> > > > > > > > > feedbacks.
> > > > > > > > >
> > > > > > > > > The idea is quite simple, I've divided the Connection class
> > in
> > > > two,
> > > > > > > > > Connection and ConnectionType, connection has the same
> > > interface
> > > > it
> > > > > > had
> > > > > > > > > before plus a few methods, but the class keeps a reference
> > to a
> > > > > > > > dictionary
> > > > > > > > > of registered ConnectionType. It delegates the work of
> > parsing
> > > > from
> > > > > > > URI,
> > > > > > > > > formatting to URI (added) and getting the hook to the
> > > > > ConnectionType
> > > > > > > > > associated with the conn_type.
> > > > > > > > >
> > > > > > > > > I've thought of two ways of registering new
> ConnectionTypes,
> > > the
> > > > > > first
> > > > > > > is
> > > > > > > > > making the BaseConnectionType use a metaclass that
> registered
> > > any
> > > > > new
> > > > > > > > > ConnectionType with Connection when the class is declared,
> it
> > > > would
> > > > > > > > require
> > > > > > > > > the less work to extend the connection module, as just
> > > importing
> > > > > the
> > > > > > > file
> > > > > > > > > with the connection would do the trick.
> > > > > > > > > The second one is juste to have a function/classmethod that
> > you
> > > > > call
> > > > > > > > > manually to register your connection. It would be simpler
> to
> > > > > > understand
> > > > > > > > but
> > > > > > > > > requires more work every time you create a new
> > ConnectionType.
> > > > > > > > >
> > > > > > > > > Hope this proposal is clear enough, and I'm waiting for
> > > feebacks
> > > > > and
> > > > > > > > > possible improvements.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Gael Magnan de Bornier
> > > > > > > > >
> > > > > > > > --
> > > > > > > >   _/
> > > > > > > > _/ Alex Van Boxel
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > >   _/
> > > > > > _/ Alex Van Boxel
> > > > > >
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> >
>
>

Re: Refactoring Connection

Posted by Gael Magnan <ga...@gmail.com>.
Thanks for all the information,

currently what I have is working iso with what was existing, just need to
add some doc and tests, but I have a few questions.

Currently airflow/urtils/db.py initdb function creates a lot of connections
by default.
I understand that they are used for the tests, but why are they created
even on production environment?
Looking into that made me realize that we can create connections of a type
that is not defined, (ex: beeline, aws, ...) by code just like you said but
it's to the best of my knowledge impossible with the UI.

By the way, should I create the beeline and aws connection types? So
everyone can create them through the UI?

I've created a JIRA issue:https://issues.apache.org/jira/browse/AIRFLOW-235,
and a first draft of the PR:
https://github.com/apache/incubator-airflow/pull/1981, more tests and doc
coming + all the changes depending on where this discussion takes us.


Le mar. 10 janv. 2017 à 21:55, Maxime Beauchemin <ma...@gmail.com>
a écrit :

> Note that the `Conection.extra` attribute was designed to store anything
> that is specific to a connection type. Here's an example of how it's used
> in the HiveHook:
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75
>
> Also note that currently there's no enforcement of connection type that
> match a hook, so you can use any connection with any hook. So you could set
> a new connection for mongo with no type at all and still use your local
> MongoHook without changing the Airflow source code.
>
> The one part that is missing is the association from the connection to the
> hook here in Connection.get_hook
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620
> This allows getting the hook from the connection, and allows for the Data
> Profiling section of the UI to know how to execute a query against that
> connection type.
>
> Max
>
> On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <ga...@gmail.com> wrote:
>
> > @George Leslie-Waksman:
> >   It could be done, but since it's marked as deprecated everywhere and
> Alex
> > Van Boxel asked me tu used entry points I'm not sure we should do both.
> >
> > @Jeremiah Lowin:
> >   I was thinking not changing much of the api, juste the implementation
> > behind it.
> >   So for example, adding the mongodb connection type would just be doing
> > the following (assuming we have a MongodbHook):
> >   ```
> >   from airflow.models import BaseConnectionType
> >
> >   class MongodbConnectionType(BaseConnectionType):
> >       name = 'mongodb'
> >       description = 'MongoDB'
> >
> >       def get_hook(conn_id):
> >           try:
> >               from airflow.hooks.mongodb_hook import MongodbHook
> >               return MongodbHook(conn_id=conn_id)
> >           except:
> >               pass
> >   ```
> >
> >   and if we want to handle the query params (and we do) we could just do
> > something like overriding the parse_from_uri method (not final code):
> >   ```
> >       @classmethod
> >       def parse_from_uri(cls, uri):
> >           temp_uri = urlparse(uri)
> >           (host, schema, login, password, port,
> >             extras) = super().parse_from_uri(cls, uri)
> >           extras = dict([param.split('=', 1) for param in
> > temp_uri.query.split('&')])
> >           return host, schema, login, password, port, extras
> >   ```
> >
> >   So if you defined a connection using an uri the way you are able now,
> and
> > this URI started with mongodb,
> >   it would be extracted and a connection created.
> >
> >
> > @Maxime Beauchemin:
> >   By can't easily I meant that to had a connection type you have to go
> > through making a pr to airflow, you can't just use a plugin.
> >   I think all you have to do to add a new one is modify the Connection
> > class in models. But it's still hard coded.
> >
> >   I agree with you, we could easily use just a data structure for most of
> > the connections, especially the ones existing now,
> >   but it would make it harder to add a connection whose uri format is not
> > consistant with the other ones (not to mention ones that don't use uri),
> >   except if we also handle a string format that would parse the uri and
> set
> > the right fields.
> >
> > Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <
> > maximebeauchemin@gmail.com>
> > a écrit :
> >
> > > About *"you can't **easily add a connection type without modifying the
> > > airflow code source*", can we start by listing out the places that need
> > to
> > > be touched up when adding a connection?
> > >
> > > Here's what I could find:
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L516
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/utils/db.py#L109
> > > https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/models.py#L620
> > >
> > > I didn't dig super deep but it seems like what we need is a
> configurable
> > > connection configuration blob that informs the connection type name,
> the
> > > verbose name and the related hook location.
> > >
> > > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py
> > that
> > > could be altered in different environments. This CONNECTION_TYPE would
> > > store the name, the verbose_name and the related hook's path as a
> string
> > > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not
> > be
> > > that far from what you were proposing, and whether it should be a class
> > or
> > > a data structure is arguable, but I'd like to keep configuration
> elements
> > > easily serializable so I'd vote for a data structure using only
> > primitives.
> > >
> > > Max
> > >
> > > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <al...@vanboxel.be>
> wrote:
> > >
> > > > I was actually going to propose something different with
> entry-points,
> > > but
> > > > your requirement beat me to it (but that's ok :-). Actually I think
> > with
> > > > this mechanism people would be able to extend Airflow connection
> > > mechanism
> > > > (and later other stuff) by doing *pip install
> > > airflow-sexy-new-connection*
> > > > (for example).
> > > >
> > > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com>
> > wrote:
> > > >
> > > > > Thank you for the read, I'm gonna look at it, it's probably gonna
> be
> > > > better
> > > > > that what I have.
> > > > >
> > > > > Point taken about the URI, I'll see if i can find something generic
> > > > enough
> > > > > to handle all those cases.
> > > > >
> > > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a
> > > écrit
> > > > :
> > > > >
> > > > > > Thanks a lot, yes it clarifies a lot and I do agree you really
> need
> > > to
> > > > > hack
> > > > > > inside Airflow to add a Connection type. While you're working at
> > this
> > > > > could
> > > > > > you have a look at the standard python *entry-point mechanism*
> for
> > > > > > registering Connection types/components.
> > > > > >
> > > > > > A good read on this:
> > > > > >
> > > > > >
> > > > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > > > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > > > >
> > > > > > My first though would be that just by adding an entry to the
> > factory
> > > > > method
> > > > > > would be enough to register your Connection + ConnectionType and
> > UI.
> > > > > >
> > > > > > Also note that not everything works with a URI. The Google Cloud
> > > > > Connection
> > > > > > doesn't have one, it uses a secret key file stored on disk, so
> > don't
> > > > > force
> > > > > > every connection type to work with URI's.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmagnan@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Yes sure,
> > > > > > >
> > > > > > > The question was the following:
> > > > > > > "I was looking at the code of the connections, and I realized
> you
> > > > can't
> > > > > > > easily add a connection type without modifying the airflow code
> > > > > source. I
> > > > > > > wanted to create a mongodb connection type, but I think the
> best
> > > > > approche
> > > > > > > would be to refactor connections first. Thoughts anyone?"
> > > > > > >
> > > > > > > The answer of Bolke de Bruin was: "making it more generic would
> > be
> > > > > > > appreciated"
> > > > > > >
> > > > > > > So basically the way the code is set up actually every types of
> > > > > > connection
> > > > > > > existing is defined though a list in the Connection class. It
> > > > > implements
> > > > > > > exactly the same code for parsing uri to get connections info
> and
> > > > > doesn't
> > > > > > > allow for a simple way to get back the uri from the connection
> > > infos.
> > > > > > >
> > > > > > > I need to add a mongodb connection and a way to get it back as
> a
> > > uri,
> > > > > so
> > > > > > i
> > > > > > > could use an other type of connection and play around with that
> > or
> > > > > juste
> > > > > > > add one more hard coded connection type, but I though this
> might
> > be
> > > > > > > something that comes back regularly and having a simple way to
> > plug
> > > > in
> > > > > > new
> > > > > > > types of connection would make it easier for anyone to
> > contribute a
> > > > new
> > > > > > > connection type.
> > > > > > >
> > > > > > > Hope this clarifies my proposal.
> > > > > > >
> > > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <alex@vanboxel.be
> >
> > a
> > > > > écrit
> > > > > > :
> > > > > > >
> > > > > > > > Hey Gael,
> > > > > > > >
> > > > > > > > could you please recap the question here and provide some
> > > context.
> > > > > Not
> > > > > > > > everyone on the mailinglist is actively following Gitter,
> > > including
> > > > > me.
> > > > > > > > With some context it would be easier to give feedback.
> Thanks.
> > > > > > > >
> > > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> > > gaelmagnan@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > following my question on gitter the other day and the
> > response
> > > > from
> > > > > > > Bolke
> > > > > > > > > de Bruin, I've started working on refactoring the
> connections
> > > in
> > > > > > > airflow.
> > > > > > > > >
> > > > > > > > > Before submitting a PR I wanted to share my proposal with
> you
> > > and
> > > > > get
> > > > > > > > > feedbacks.
> > > > > > > > >
> > > > > > > > > The idea is quite simple, I've divided the Connection class
> > in
> > > > two,
> > > > > > > > > Connection and ConnectionType, connection has the same
> > > interface
> > > > it
> > > > > > had
> > > > > > > > > before plus a few methods, but the class keeps a reference
> > to a
> > > > > > > > dictionary
> > > > > > > > > of registered ConnectionType. It delegates the work of
> > parsing
> > > > from
> > > > > > > URI,
> > > > > > > > > formatting to URI (added) and getting the hook to the
> > > > > ConnectionType
> > > > > > > > > associated with the conn_type.
> > > > > > > > >
> > > > > > > > > I've thought of two ways of registering new
> ConnectionTypes,
> > > the
> > > > > > first
> > > > > > > is
> > > > > > > > > making the BaseConnectionType use a metaclass that
> registered
> > > any
> > > > > new
> > > > > > > > > ConnectionType with Connection when the class is declared,
> it
> > > > would
> > > > > > > > require
> > > > > > > > > the less work to extend the connection module, as just
> > > importing
> > > > > the
> > > > > > > file
> > > > > > > > > with the connection would do the trick.
> > > > > > > > > The second one is juste to have a function/classmethod that
> > you
> > > > > call
> > > > > > > > > manually to register your connection. It would be simpler
> to
> > > > > > understand
> > > > > > > > but
> > > > > > > > > requires more work every time you create a new
> > ConnectionType.
> > > > > > > > >
> > > > > > > > > Hope this proposal is clear enough, and I'm waiting for
> > > feebacks
> > > > > and
> > > > > > > > > possible improvements.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Gael Magnan de Bornier
> > > > > > > > >
> > > > > > > > --
> > > > > > > >   _/
> > > > > > > > _/ Alex Van Boxel
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > >   _/
> > > > > > _/ Alex Van Boxel
> > > > > >
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> >
>

Re: Refactoring Connection

Posted by Maxime Beauchemin <ma...@gmail.com>.
Note that the `Conection.extra` attribute was designed to store anything
that is specific to a connection type. Here's an example of how it's used
in the HiveHook:
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75

Also note that currently there's no enforcement of connection type that
match a hook, so you can use any connection with any hook. So you could set
a new connection for mongo with no type at all and still use your local
MongoHook without changing the Airflow source code.

The one part that is missing is the association from the connection to the
hook here in Connection.get_hook
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620
This allows getting the hook from the connection, and allows for the Data
Profiling section of the UI to know how to execute a query against that
connection type.

Max

On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <ga...@gmail.com> wrote:

> @George Leslie-Waksman:
>   It could be done, but since it's marked as deprecated everywhere and Alex
> Van Boxel asked me tu used entry points I'm not sure we should do both.
>
> @Jeremiah Lowin:
>   I was thinking not changing much of the api, juste the implementation
> behind it.
>   So for example, adding the mongodb connection type would just be doing
> the following (assuming we have a MongodbHook):
>   ```
>   from airflow.models import BaseConnectionType
>
>   class MongodbConnectionType(BaseConnectionType):
>       name = 'mongodb'
>       description = 'MongoDB'
>
>       def get_hook(conn_id):
>           try:
>               from airflow.hooks.mongodb_hook import MongodbHook
>               return MongodbHook(conn_id=conn_id)
>           except:
>               pass
>   ```
>
>   and if we want to handle the query params (and we do) we could just do
> something like overriding the parse_from_uri method (not final code):
>   ```
>       @classmethod
>       def parse_from_uri(cls, uri):
>           temp_uri = urlparse(uri)
>           (host, schema, login, password, port,
>             extras) = super().parse_from_uri(cls, uri)
>           extras = dict([param.split('=', 1) for param in
> temp_uri.query.split('&')])
>           return host, schema, login, password, port, extras
>   ```
>
>   So if you defined a connection using an uri the way you are able now, and
> this URI started with mongodb,
>   it would be extracted and a connection created.
>
>
> @Maxime Beauchemin:
>   By can't easily I meant that to had a connection type you have to go
> through making a pr to airflow, you can't just use a plugin.
>   I think all you have to do to add a new one is modify the Connection
> class in models. But it's still hard coded.
>
>   I agree with you, we could easily use just a data structure for most of
> the connections, especially the ones existing now,
>   but it would make it harder to add a connection whose uri format is not
> consistant with the other ones (not to mention ones that don't use uri),
>   except if we also handle a string format that would parse the uri and set
> the right fields.
>
> Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <
> maximebeauchemin@gmail.com>
> a écrit :
>
> > About *"you can't **easily add a connection type without modifying the
> > airflow code source*", can we start by listing out the places that need
> to
> > be touched up when adding a connection?
> >
> > Here's what I could find:
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/models.py#L516
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/utils/db.py#L109
> > https://github.com/apache/incubator-airflow/blob/master/
> > airflow/models.py#L620
> >
> > I didn't dig super deep but it seems like what we need is a configurable
> > connection configuration blob that informs the connection type name, the
> > verbose name and the related hook location.
> >
> > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py
> that
> > could be altered in different environments. This CONNECTION_TYPE would
> > store the name, the verbose_name and the related hook's path as a string
> > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not
> be
> > that far from what you were proposing, and whether it should be a class
> or
> > a data structure is arguable, but I'd like to keep configuration elements
> > easily serializable so I'd vote for a data structure using only
> primitives.
> >
> > Max
> >
> > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <al...@vanboxel.be> wrote:
> >
> > > I was actually going to propose something different with entry-points,
> > but
> > > your requirement beat me to it (but that's ok :-). Actually I think
> with
> > > this mechanism people would be able to extend Airflow connection
> > mechanism
> > > (and later other stuff) by doing *pip install
> > airflow-sexy-new-connection*
> > > (for example).
> > >
> > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com>
> wrote:
> > >
> > > > Thank you for the read, I'm gonna look at it, it's probably gonna be
> > > better
> > > > that what I have.
> > > >
> > > > Point taken about the URI, I'll see if i can find something generic
> > > enough
> > > > to handle all those cases.
> > > >
> > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a
> > écrit
> > > :
> > > >
> > > > > Thanks a lot, yes it clarifies a lot and I do agree you really need
> > to
> > > > hack
> > > > > inside Airflow to add a Connection type. While you're working at
> this
> > > > could
> > > > > you have a look at the standard python *entry-point mechanism* for
> > > > > registering Connection types/components.
> > > > >
> > > > > A good read on this:
> > > > >
> > > > >
> > > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > > >
> > > > > My first though would be that just by adding an entry to the
> factory
> > > > method
> > > > > would be enough to register your Connection + ConnectionType and
> UI.
> > > > >
> > > > > Also note that not everything works with a URI. The Google Cloud
> > > > Connection
> > > > > doesn't have one, it uses a secret key file stored on disk, so
> don't
> > > > force
> > > > > every connection type to work with URI's.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Yes sure,
> > > > > >
> > > > > > The question was the following:
> > > > > > "I was looking at the code of the connections, and I realized you
> > > can't
> > > > > > easily add a connection type without modifying the airflow code
> > > > source. I
> > > > > > wanted to create a mongodb connection type, but I think the best
> > > > approche
> > > > > > would be to refactor connections first. Thoughts anyone?"
> > > > > >
> > > > > > The answer of Bolke de Bruin was: "making it more generic would
> be
> > > > > > appreciated"
> > > > > >
> > > > > > So basically the way the code is set up actually every types of
> > > > > connection
> > > > > > existing is defined though a list in the Connection class. It
> > > > implements
> > > > > > exactly the same code for parsing uri to get connections info and
> > > > doesn't
> > > > > > allow for a simple way to get back the uri from the connection
> > infos.
> > > > > >
> > > > > > I need to add a mongodb connection and a way to get it back as a
> > uri,
> > > > so
> > > > > i
> > > > > > could use an other type of connection and play around with that
> or
> > > > juste
> > > > > > add one more hard coded connection type, but I though this might
> be
> > > > > > something that comes back regularly and having a simple way to
> plug
> > > in
> > > > > new
> > > > > > types of connection would make it easier for anyone to
> contribute a
> > > new
> > > > > > connection type.
> > > > > >
> > > > > > Hope this clarifies my proposal.
> > > > > >
> > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be>
> a
> > > > écrit
> > > > > :
> > > > > >
> > > > > > > Hey Gael,
> > > > > > >
> > > > > > > could you please recap the question here and provide some
> > context.
> > > > Not
> > > > > > > everyone on the mailinglist is actively following Gitter,
> > including
> > > > me.
> > > > > > > With some context it would be easier to give feedback. Thanks.
> > > > > > >
> > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> > gaelmagnan@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > following my question on gitter the other day and the
> response
> > > from
> > > > > > Bolke
> > > > > > > > de Bruin, I've started working on refactoring the connections
> > in
> > > > > > airflow.
> > > > > > > >
> > > > > > > > Before submitting a PR I wanted to share my proposal with you
> > and
> > > > get
> > > > > > > > feedbacks.
> > > > > > > >
> > > > > > > > The idea is quite simple, I've divided the Connection class
> in
> > > two,
> > > > > > > > Connection and ConnectionType, connection has the same
> > interface
> > > it
> > > > > had
> > > > > > > > before plus a few methods, but the class keeps a reference
> to a
> > > > > > > dictionary
> > > > > > > > of registered ConnectionType. It delegates the work of
> parsing
> > > from
> > > > > > URI,
> > > > > > > > formatting to URI (added) and getting the hook to the
> > > > ConnectionType
> > > > > > > > associated with the conn_type.
> > > > > > > >
> > > > > > > > I've thought of two ways of registering new ConnectionTypes,
> > the
> > > > > first
> > > > > > is
> > > > > > > > making the BaseConnectionType use a metaclass that registered
> > any
> > > > new
> > > > > > > > ConnectionType with Connection when the class is declared, it
> > > would
> > > > > > > require
> > > > > > > > the less work to extend the connection module, as just
> > importing
> > > > the
> > > > > > file
> > > > > > > > with the connection would do the trick.
> > > > > > > > The second one is juste to have a function/classmethod that
> you
> > > > call
> > > > > > > > manually to register your connection. It would be simpler to
> > > > > understand
> > > > > > > but
> > > > > > > > requires more work every time you create a new
> ConnectionType.
> > > > > > > >
> > > > > > > > Hope this proposal is clear enough, and I'm waiting for
> > feebacks
> > > > and
> > > > > > > > possible improvements.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Gael Magnan de Bornier
> > > > > > > >
> > > > > > > --
> > > > > > >   _/
> > > > > > > _/ Alex Van Boxel
> > > > > > >
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
>

Re: Refactoring Connection

Posted by Gael Magnan <ga...@gmail.com>.
@George Leslie-Waksman:
  It could be done, but since it's marked as deprecated everywhere and Alex
Van Boxel asked me tu used entry points I'm not sure we should do both.

@Jeremiah Lowin:
  I was thinking not changing much of the api, juste the implementation
behind it.
  So for example, adding the mongodb connection type would just be doing
the following (assuming we have a MongodbHook):
  ```
  from airflow.models import BaseConnectionType

  class MongodbConnectionType(BaseConnectionType):
      name = 'mongodb'
      description = 'MongoDB'

      def get_hook(conn_id):
          try:
              from airflow.hooks.mongodb_hook import MongodbHook
              return MongodbHook(conn_id=conn_id)
          except:
              pass
  ```

  and if we want to handle the query params (and we do) we could just do
something like overriding the parse_from_uri method (not final code):
  ```
      @classmethod
      def parse_from_uri(cls, uri):
          temp_uri = urlparse(uri)
          (host, schema, login, password, port,
            extras) = super().parse_from_uri(cls, uri)
          extras = dict([param.split('=', 1) for param in
temp_uri.query.split('&')])
          return host, schema, login, password, port, extras
  ```

  So if you defined a connection using an uri the way you are able now, and
this URI started with mongodb,
  it would be extracted and a connection created.


@Maxime Beauchemin:
  By can't easily I meant that to had a connection type you have to go
through making a pr to airflow, you can't just use a plugin.
  I think all you have to do to add a new one is modify the Connection
class in models. But it's still hard coded.

  I agree with you, we could easily use just a data structure for most of
the connections, especially the ones existing now,
  but it would make it harder to add a connection whose uri format is not
consistant with the other ones (not to mention ones that don't use uri),
  except if we also handle a string format that would parse the uri and set
the right fields.

Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin <ma...@gmail.com>
a écrit :

> About *"you can't **easily add a connection type without modifying the
> airflow code source*", can we start by listing out the places that need to
> be touched up when adding a connection?
>
> Here's what I could find:
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/models.py#L516
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/utils/db.py#L109
> https://github.com/apache/incubator-airflow/blob/master/
> airflow/models.py#L620
>
> I didn't dig super deep but it seems like what we need is a configurable
> connection configuration blob that informs the connection type name, the
> verbose name and the related hook location.
>
> Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py that
> could be altered in different environments. This CONNECTION_TYPE would
> store the name, the verbose_name and the related hook's path as a string
> `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not be
> that far from what you were proposing, and whether it should be a class or
> a data structure is arguable, but I'd like to keep configuration elements
> easily serializable so I'd vote for a data structure using only primitives.
>
> Max
>
> On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <al...@vanboxel.be> wrote:
>
> > I was actually going to propose something different with entry-points,
> but
> > your requirement beat me to it (but that's ok :-). Actually I think with
> > this mechanism people would be able to extend Airflow connection
> mechanism
> > (and later other stuff) by doing *pip install
> airflow-sexy-new-connection*
> > (for example).
> >
> > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com> wrote:
> >
> > > Thank you for the read, I'm gonna look at it, it's probably gonna be
> > better
> > > that what I have.
> > >
> > > Point taken about the URI, I'll see if i can find something generic
> > enough
> > > to handle all those cases.
> > >
> > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a
> écrit
> > :
> > >
> > > > Thanks a lot, yes it clarifies a lot and I do agree you really need
> to
> > > hack
> > > > inside Airflow to add a Connection type. While you're working at this
> > > could
> > > > you have a look at the standard python *entry-point mechanism* for
> > > > registering Connection types/components.
> > > >
> > > > A good read on this:
> > > >
> > > >
> > > http://docs.pylonsproject.org/projects/pylons-webframework/
> > en/latest/advanced_pylons/entry_points_and_plugins.html
> > > >
> > > > My first though would be that just by adding an entry to the factory
> > > method
> > > > would be enough to register your Connection + ConnectionType and UI.
> > > >
> > > > Also note that not everything works with a URI. The Google Cloud
> > > Connection
> > > > doesn't have one, it uses a secret key file stored on disk, so don't
> > > force
> > > > every connection type to work with URI's.
> > > >
> > > >
> > > >
> > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com>
> > wrote:
> > > >
> > > > > Yes sure,
> > > > >
> > > > > The question was the following:
> > > > > "I was looking at the code of the connections, and I realized you
> > can't
> > > > > easily add a connection type without modifying the airflow code
> > > source. I
> > > > > wanted to create a mongodb connection type, but I think the best
> > > approche
> > > > > would be to refactor connections first. Thoughts anyone?"
> > > > >
> > > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > > appreciated"
> > > > >
> > > > > So basically the way the code is set up actually every types of
> > > > connection
> > > > > existing is defined though a list in the Connection class. It
> > > implements
> > > > > exactly the same code for parsing uri to get connections info and
> > > doesn't
> > > > > allow for a simple way to get back the uri from the connection
> infos.
> > > > >
> > > > > I need to add a mongodb connection and a way to get it back as a
> uri,
> > > so
> > > > i
> > > > > could use an other type of connection and play around with that or
> > > juste
> > > > > add one more hard coded connection type, but I though this might be
> > > > > something that comes back regularly and having a simple way to plug
> > in
> > > > new
> > > > > types of connection would make it easier for anyone to contribute a
> > new
> > > > > connection type.
> > > > >
> > > > > Hope this clarifies my proposal.
> > > > >
> > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a
> > > écrit
> > > > :
> > > > >
> > > > > > Hey Gael,
> > > > > >
> > > > > > could you please recap the question here and provide some
> context.
> > > Not
> > > > > > everyone on the mailinglist is actively following Gitter,
> including
> > > me.
> > > > > > With some context it would be easier to give feedback. Thanks.
> > > > > >
> > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <
> gaelmagnan@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > following my question on gitter the other day and the response
> > from
> > > > > Bolke
> > > > > > > de Bruin, I've started working on refactoring the connections
> in
> > > > > airflow.
> > > > > > >
> > > > > > > Before submitting a PR I wanted to share my proposal with you
> and
> > > get
> > > > > > > feedbacks.
> > > > > > >
> > > > > > > The idea is quite simple, I've divided the Connection class in
> > two,
> > > > > > > Connection and ConnectionType, connection has the same
> interface
> > it
> > > > had
> > > > > > > before plus a few methods, but the class keeps a reference to a
> > > > > > dictionary
> > > > > > > of registered ConnectionType. It delegates the work of parsing
> > from
> > > > > URI,
> > > > > > > formatting to URI (added) and getting the hook to the
> > > ConnectionType
> > > > > > > associated with the conn_type.
> > > > > > >
> > > > > > > I've thought of two ways of registering new ConnectionTypes,
> the
> > > > first
> > > > > is
> > > > > > > making the BaseConnectionType use a metaclass that registered
> any
> > > new
> > > > > > > ConnectionType with Connection when the class is declared, it
> > would
> > > > > > require
> > > > > > > the less work to extend the connection module, as just
> importing
> > > the
> > > > > file
> > > > > > > with the connection would do the trick.
> > > > > > > The second one is juste to have a function/classmethod that you
> > > call
> > > > > > > manually to register your connection. It would be simpler to
> > > > understand
> > > > > > but
> > > > > > > requires more work every time you create a new ConnectionType.
> > > > > > >
> > > > > > > Hope this proposal is clear enough, and I'm waiting for
> feebacks
> > > and
> > > > > > > possible improvements.
> > > > > > >
> > > > > > > Regards
> > > > > > > Gael Magnan de Bornier
> > > > > > >
> > > > > > --
> > > > > >   _/
> > > > > > _/ Alex Van Boxel
> > > > > >
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>

Re: Refactoring Connection

Posted by Maxime Beauchemin <ma...@gmail.com>.
About *"you can't **easily add a connection type without modifying the
airflow code source*", can we start by listing out the places that need to
be touched up when adding a connection?

Here's what I could find:
https://github.com/apache/incubator-airflow/blob/master/
airflow/models.py#L516
https://github.com/apache/incubator-airflow/blob/master/
airflow/utils/db.py#L109
https://github.com/apache/incubator-airflow/blob/master/
airflow/models.py#L620

I didn't dig super deep but it seems like what we need is a configurable
connection configuration blob that informs the connection type name, the
verbose name and the related hook location.

Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py that
could be altered in different environments. This CONNECTION_TYPE would
store the name, the verbose_name and the related hook's path as a string
`airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not be
that far from what you were proposing, and whether it should be a class or
a data structure is arguable, but I'd like to keep configuration elements
easily serializable so I'd vote for a data structure using only primitives.

Max

On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <al...@vanboxel.be> wrote:

> I was actually going to propose something different with entry-points, but
> your requirement beat me to it (but that's ok :-). Actually I think with
> this mechanism people would be able to extend Airflow connection mechanism
> (and later other stuff) by doing *pip install airflow-sexy-new-connection*
> (for example).
>
> On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com> wrote:
>
> > Thank you for the read, I'm gonna look at it, it's probably gonna be
> better
> > that what I have.
> >
> > Point taken about the URI, I'll see if i can find something generic
> enough
> > to handle all those cases.
> >
> > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a écrit
> :
> >
> > > Thanks a lot, yes it clarifies a lot and I do agree you really need to
> > hack
> > > inside Airflow to add a Connection type. While you're working at this
> > could
> > > you have a look at the standard python *entry-point mechanism* for
> > > registering Connection types/components.
> > >
> > > A good read on this:
> > >
> > >
> > http://docs.pylonsproject.org/projects/pylons-webframework/
> en/latest/advanced_pylons/entry_points_and_plugins.html
> > >
> > > My first though would be that just by adding an entry to the factory
> > method
> > > would be enough to register your Connection + ConnectionType and UI.
> > >
> > > Also note that not everything works with a URI. The Google Cloud
> > Connection
> > > doesn't have one, it uses a secret key file stored on disk, so don't
> > force
> > > every connection type to work with URI's.
> > >
> > >
> > >
> > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com>
> wrote:
> > >
> > > > Yes sure,
> > > >
> > > > The question was the following:
> > > > "I was looking at the code of the connections, and I realized you
> can't
> > > > easily add a connection type without modifying the airflow code
> > source. I
> > > > wanted to create a mongodb connection type, but I think the best
> > approche
> > > > would be to refactor connections first. Thoughts anyone?"
> > > >
> > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > appreciated"
> > > >
> > > > So basically the way the code is set up actually every types of
> > > connection
> > > > existing is defined though a list in the Connection class. It
> > implements
> > > > exactly the same code for parsing uri to get connections info and
> > doesn't
> > > > allow for a simple way to get back the uri from the connection infos.
> > > >
> > > > I need to add a mongodb connection and a way to get it back as a uri,
> > so
> > > i
> > > > could use an other type of connection and play around with that or
> > juste
> > > > add one more hard coded connection type, but I though this might be
> > > > something that comes back regularly and having a simple way to plug
> in
> > > new
> > > > types of connection would make it easier for anyone to contribute a
> new
> > > > connection type.
> > > >
> > > > Hope this clarifies my proposal.
> > > >
> > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a
> > écrit
> > > :
> > > >
> > > > > Hey Gael,
> > > > >
> > > > > could you please recap the question here and provide some context.
> > Not
> > > > > everyone on the mailinglist is actively following Gitter, including
> > me.
> > > > > With some context it would be easier to give feedback. Thanks.
> > > > >
> > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > following my question on gitter the other day and the response
> from
> > > > Bolke
> > > > > > de Bruin, I've started working on refactoring the connections in
> > > > airflow.
> > > > > >
> > > > > > Before submitting a PR I wanted to share my proposal with you and
> > get
> > > > > > feedbacks.
> > > > > >
> > > > > > The idea is quite simple, I've divided the Connection class in
> two,
> > > > > > Connection and ConnectionType, connection has the same interface
> it
> > > had
> > > > > > before plus a few methods, but the class keeps a reference to a
> > > > > dictionary
> > > > > > of registered ConnectionType. It delegates the work of parsing
> from
> > > > URI,
> > > > > > formatting to URI (added) and getting the hook to the
> > ConnectionType
> > > > > > associated with the conn_type.
> > > > > >
> > > > > > I've thought of two ways of registering new ConnectionTypes, the
> > > first
> > > > is
> > > > > > making the BaseConnectionType use a metaclass that registered any
> > new
> > > > > > ConnectionType with Connection when the class is declared, it
> would
> > > > > require
> > > > > > the less work to extend the connection module, as just importing
> > the
> > > > file
> > > > > > with the connection would do the trick.
> > > > > > The second one is juste to have a function/classmethod that you
> > call
> > > > > > manually to register your connection. It would be simpler to
> > > understand
> > > > > but
> > > > > > requires more work every time you create a new ConnectionType.
> > > > > >
> > > > > > Hope this proposal is clear enough, and I'm waiting for feebacks
> > and
> > > > > > possible improvements.
> > > > > >
> > > > > > Regards
> > > > > > Gael Magnan de Bornier
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>

Re: Refactoring Connection

Posted by Jeremiah Lowin <jl...@apache.org>.
I think this sounds like a great idea, though having not had to set up
connections in a while it's a little hard for me to picture (though I do
remember the pain of figuring out the very first Google Cloud connectors).
Could provide a little example?

For example, would I just define a connection as a string, like
"mongo://user@password/h.o.s.t:port/database" and trust a plugin to
register as a "mongohandler" and turn that URI into a connection? To Alex's
point, perhaps more complex needs could be handled as query parameters
(today they are just handwritten JSON in "extras", so that's not much of a
change): "gcp://project?keyfile=path/to/keyfile" Just shooting out some
ideas...

Thanks!

On Mon, Jan 9, 2017 at 3:44 PM George Leslie-Waksman
<ge...@cloverhealth.com.invalid> wrote:

Could registering new types be handled through the plugin infrastructure?

On Mon, Jan 9, 2017 at 5:14 AM Alex Van Boxel <al...@vanboxel.be> wrote:

> I was actually going to propose something different with entry-points, but
> your requirement beat me to it (but that's ok :-). Actually I think with
> this mechanism people would be able to extend Airflow connection mechanism
> (and later other stuff) by doing *pip install airflow-sexy-new-connection*
> (for example).
>
> On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com> wrote:
>
> > Thank you for the read, I'm gonna look at it, it's probably gonna be
> better
> > that what I have.
> >
> > Point taken about the URI, I'll see if i can find something generic
> enough
> > to handle all those cases.
> >
> > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a écrit
> :
> >
> > > Thanks a lot, yes it clarifies a lot and I do agree you really need to
> > hack
> > > inside Airflow to add a Connection type. While you're working at this
> > could
> > > you have a look at the standard python *entry-point mechanism* for
> > > registering Connection types/components.
> > >
> > > A good read on this:
> > >
> > >
> >
>
http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html
> > >
> > > My first though would be that just by adding an entry to the factory
> > method
> > > would be enough to register your Connection + ConnectionType and UI.
> > >
> > > Also note that not everything works with a URI. The Google Cloud
> > Connection
> > > doesn't have one, it uses a secret key file stored on disk, so don't
> > force
> > > every connection type to work with URI's.
> > >
> > >
> > >
> > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com>
> wrote:
> > >
> > > > Yes sure,
> > > >
> > > > The question was the following:
> > > > "I was looking at the code of the connections, and I realized you
> can't
> > > > easily add a connection type without modifying the airflow code
> > source. I
> > > > wanted to create a mongodb connection type, but I think the best
> > approche
> > > > would be to refactor connections first. Thoughts anyone?"
> > > >
> > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > appreciated"
> > > >
> > > > So basically the way the code is set up actually every types of
> > > connection
> > > > existing is defined though a list in the Connection class. It
> > implements
> > > > exactly the same code for parsing uri to get connections info and
> > doesn't
> > > > allow for a simple way to get back the uri from the connection
infos.
> > > >
> > > > I need to add a mongodb connection and a way to get it back as a
uri,
> > so
> > > i
> > > > could use an other type of connection and play around with that or
> > juste
> > > > add one more hard coded connection type, but I though this might be
> > > > something that comes back regularly and having a simple way to plug
> in
> > > new
> > > > types of connection would make it easier for anyone to contribute a
> new
> > > > connection type.
> > > >
> > > > Hope this clarifies my proposal.
> > > >
> > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a
> > écrit
> > > :
> > > >
> > > > > Hey Gael,
> > > > >
> > > > > could you please recap the question here and provide some context.
> > Not
> > > > > everyone on the mailinglist is actively following Gitter,
including
> > me.
> > > > > With some context it would be easier to give feedback. Thanks.
> > > > >
> > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > following my question on gitter the other day and the response
> from
> > > > Bolke
> > > > > > de Bruin, I've started working on refactoring the connections in
> > > > airflow.
> > > > > >
> > > > > > Before submitting a PR I wanted to share my proposal with you
and
> > get
> > > > > > feedbacks.
> > > > > >
> > > > > > The idea is quite simple, I've divided the Connection class in
> two,
> > > > > > Connection and ConnectionType, connection has the same interface
> it
> > > had
> > > > > > before plus a few methods, but the class keeps a reference to a
> > > > > dictionary
> > > > > > of registered ConnectionType. It delegates the work of parsing
> from
> > > > URI,
> > > > > > formatting to URI (added) and getting the hook to the
> > ConnectionType
> > > > > > associated with the conn_type.
> > > > > >
> > > > > > I've thought of two ways of registering new ConnectionTypes, the
> > > first
> > > > is
> > > > > > making the BaseConnectionType use a metaclass that registered
any
> > new
> > > > > > ConnectionType with Connection when the class is declared, it
> would
> > > > > require
> > > > > > the less work to extend the connection module, as just importing
> > the
> > > > file
> > > > > > with the connection would do the trick.
> > > > > > The second one is juste to have a function/classmethod that you
> > call
> > > > > > manually to register your connection. It would be simpler to
> > > understand
> > > > > but
> > > > > > requires more work every time you create a new ConnectionType.
> > > > > >
> > > > > > Hope this proposal is clear enough, and I'm waiting for feebacks
> > and
> > > > > > possible improvements.
> > > > > >
> > > > > > Regards
> > > > > > Gael Magnan de Bornier
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>

Re: Refactoring Connection

Posted by George Leslie-Waksman <ge...@cloverhealth.com.INVALID>.
Could registering new types be handled through the plugin infrastructure?

On Mon, Jan 9, 2017 at 5:14 AM Alex Van Boxel <al...@vanboxel.be> wrote:

> I was actually going to propose something different with entry-points, but
> your requirement beat me to it (but that's ok :-). Actually I think with
> this mechanism people would be able to extend Airflow connection mechanism
> (and later other stuff) by doing *pip install airflow-sexy-new-connection*
> (for example).
>
> On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com> wrote:
>
> > Thank you for the read, I'm gonna look at it, it's probably gonna be
> better
> > that what I have.
> >
> > Point taken about the URI, I'll see if i can find something generic
> enough
> > to handle all those cases.
> >
> > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a écrit
> :
> >
> > > Thanks a lot, yes it clarifies a lot and I do agree you really need to
> > hack
> > > inside Airflow to add a Connection type. While you're working at this
> > could
> > > you have a look at the standard python *entry-point mechanism* for
> > > registering Connection types/components.
> > >
> > > A good read on this:
> > >
> > >
> >
> http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html
> > >
> > > My first though would be that just by adding an entry to the factory
> > method
> > > would be enough to register your Connection + ConnectionType and UI.
> > >
> > > Also note that not everything works with a URI. The Google Cloud
> > Connection
> > > doesn't have one, it uses a secret key file stored on disk, so don't
> > force
> > > every connection type to work with URI's.
> > >
> > >
> > >
> > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com>
> wrote:
> > >
> > > > Yes sure,
> > > >
> > > > The question was the following:
> > > > "I was looking at the code of the connections, and I realized you
> can't
> > > > easily add a connection type without modifying the airflow code
> > source. I
> > > > wanted to create a mongodb connection type, but I think the best
> > approche
> > > > would be to refactor connections first. Thoughts anyone?"
> > > >
> > > > The answer of Bolke de Bruin was: "making it more generic would be
> > > > appreciated"
> > > >
> > > > So basically the way the code is set up actually every types of
> > > connection
> > > > existing is defined though a list in the Connection class. It
> > implements
> > > > exactly the same code for parsing uri to get connections info and
> > doesn't
> > > > allow for a simple way to get back the uri from the connection infos.
> > > >
> > > > I need to add a mongodb connection and a way to get it back as a uri,
> > so
> > > i
> > > > could use an other type of connection and play around with that or
> > juste
> > > > add one more hard coded connection type, but I though this might be
> > > > something that comes back regularly and having a simple way to plug
> in
> > > new
> > > > types of connection would make it easier for anyone to contribute a
> new
> > > > connection type.
> > > >
> > > > Hope this clarifies my proposal.
> > > >
> > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a
> > écrit
> > > :
> > > >
> > > > > Hey Gael,
> > > > >
> > > > > could you please recap the question here and provide some context.
> > Not
> > > > > everyone on the mailinglist is actively following Gitter, including
> > me.
> > > > > With some context it would be easier to give feedback. Thanks.
> > > > >
> > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > following my question on gitter the other day and the response
> from
> > > > Bolke
> > > > > > de Bruin, I've started working on refactoring the connections in
> > > > airflow.
> > > > > >
> > > > > > Before submitting a PR I wanted to share my proposal with you and
> > get
> > > > > > feedbacks.
> > > > > >
> > > > > > The idea is quite simple, I've divided the Connection class in
> two,
> > > > > > Connection and ConnectionType, connection has the same interface
> it
> > > had
> > > > > > before plus a few methods, but the class keeps a reference to a
> > > > > dictionary
> > > > > > of registered ConnectionType. It delegates the work of parsing
> from
> > > > URI,
> > > > > > formatting to URI (added) and getting the hook to the
> > ConnectionType
> > > > > > associated with the conn_type.
> > > > > >
> > > > > > I've thought of two ways of registering new ConnectionTypes, the
> > > first
> > > > is
> > > > > > making the BaseConnectionType use a metaclass that registered any
> > new
> > > > > > ConnectionType with Connection when the class is declared, it
> would
> > > > > require
> > > > > > the less work to extend the connection module, as just importing
> > the
> > > > file
> > > > > > with the connection would do the trick.
> > > > > > The second one is juste to have a function/classmethod that you
> > call
> > > > > > manually to register your connection. It would be simpler to
> > > understand
> > > > > but
> > > > > > requires more work every time you create a new ConnectionType.
> > > > > >
> > > > > > Hope this proposal is clear enough, and I'm waiting for feebacks
> > and
> > > > > > possible improvements.
> > > > > >
> > > > > > Regards
> > > > > > Gael Magnan de Bornier
> > > > > >
> > > > > --
> > > > >   _/
> > > > > _/ Alex Van Boxel
> > > > >
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>

Re: Refactoring Connection

Posted by Alex Van Boxel <al...@vanboxel.be>.
I was actually going to propose something different with entry-points, but
your requirement beat me to it (but that's ok :-). Actually I think with
this mechanism people would be able to extend Airflow connection mechanism
(and later other stuff) by doing *pip install airflow-sexy-new-connection*
(for example).

On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <ga...@gmail.com> wrote:

> Thank you for the read, I'm gonna look at it, it's probably gonna be better
> that what I have.
>
> Point taken about the URI, I'll see if i can find something generic enough
> to handle all those cases.
>
> Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a écrit :
>
> > Thanks a lot, yes it clarifies a lot and I do agree you really need to
> hack
> > inside Airflow to add a Connection type. While you're working at this
> could
> > you have a look at the standard python *entry-point mechanism* for
> > registering Connection types/components.
> >
> > A good read on this:
> >
> >
> http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html
> >
> > My first though would be that just by adding an entry to the factory
> method
> > would be enough to register your Connection + ConnectionType and UI.
> >
> > Also note that not everything works with a URI. The Google Cloud
> Connection
> > doesn't have one, it uses a secret key file stored on disk, so don't
> force
> > every connection type to work with URI's.
> >
> >
> >
> > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com> wrote:
> >
> > > Yes sure,
> > >
> > > The question was the following:
> > > "I was looking at the code of the connections, and I realized you can't
> > > easily add a connection type without modifying the airflow code
> source. I
> > > wanted to create a mongodb connection type, but I think the best
> approche
> > > would be to refactor connections first. Thoughts anyone?"
> > >
> > > The answer of Bolke de Bruin was: "making it more generic would be
> > > appreciated"
> > >
> > > So basically the way the code is set up actually every types of
> > connection
> > > existing is defined though a list in the Connection class. It
> implements
> > > exactly the same code for parsing uri to get connections info and
> doesn't
> > > allow for a simple way to get back the uri from the connection infos.
> > >
> > > I need to add a mongodb connection and a way to get it back as a uri,
> so
> > i
> > > could use an other type of connection and play around with that or
> juste
> > > add one more hard coded connection type, but I though this might be
> > > something that comes back regularly and having a simple way to plug in
> > new
> > > types of connection would make it easier for anyone to contribute a new
> > > connection type.
> > >
> > > Hope this clarifies my proposal.
> > >
> > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a
> écrit
> > :
> > >
> > > > Hey Gael,
> > > >
> > > > could you please recap the question here and provide some context.
> Not
> > > > everyone on the mailinglist is actively following Gitter, including
> me.
> > > > With some context it would be easier to give feedback. Thanks.
> > > >
> > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > following my question on gitter the other day and the response from
> > > Bolke
> > > > > de Bruin, I've started working on refactoring the connections in
> > > airflow.
> > > > >
> > > > > Before submitting a PR I wanted to share my proposal with you and
> get
> > > > > feedbacks.
> > > > >
> > > > > The idea is quite simple, I've divided the Connection class in two,
> > > > > Connection and ConnectionType, connection has the same interface it
> > had
> > > > > before plus a few methods, but the class keeps a reference to a
> > > > dictionary
> > > > > of registered ConnectionType. It delegates the work of parsing from
> > > URI,
> > > > > formatting to URI (added) and getting the hook to the
> ConnectionType
> > > > > associated with the conn_type.
> > > > >
> > > > > I've thought of two ways of registering new ConnectionTypes, the
> > first
> > > is
> > > > > making the BaseConnectionType use a metaclass that registered any
> new
> > > > > ConnectionType with Connection when the class is declared, it would
> > > > require
> > > > > the less work to extend the connection module, as just importing
> the
> > > file
> > > > > with the connection would do the trick.
> > > > > The second one is juste to have a function/classmethod that you
> call
> > > > > manually to register your connection. It would be simpler to
> > understand
> > > > but
> > > > > requires more work every time you create a new ConnectionType.
> > > > >
> > > > > Hope this proposal is clear enough, and I'm waiting for feebacks
> and
> > > > > possible improvements.
> > > > >
> > > > > Regards
> > > > > Gael Magnan de Bornier
> > > > >
> > > > --
> > > >   _/
> > > > _/ Alex Van Boxel
> > > >
> > >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>
-- 
  _/
_/ Alex Van Boxel

Re: Refactoring Connection

Posted by Gael Magnan <ga...@gmail.com>.
Thank you for the read, I'm gonna look at it, it's probably gonna be better
that what I have.

Point taken about the URI, I'll see if i can find something generic enough
to handle all those cases.

Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <al...@vanboxel.be> a écrit :

> Thanks a lot, yes it clarifies a lot and I do agree you really need to hack
> inside Airflow to add a Connection type. While you're working at this could
> you have a look at the standard python *entry-point mechanism* for
> registering Connection types/components.
>
> A good read on this:
>
> http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html
>
> My first though would be that just by adding an entry to the factory method
> would be enough to register your Connection + ConnectionType and UI.
>
> Also note that not everything works with a URI. The Google Cloud Connection
> doesn't have one, it uses a secret key file stored on disk, so don't force
> every connection type to work with URI's.
>
>
>
> On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com> wrote:
>
> > Yes sure,
> >
> > The question was the following:
> > "I was looking at the code of the connections, and I realized you can't
> > easily add a connection type without modifying the airflow code source. I
> > wanted to create a mongodb connection type, but I think the best approche
> > would be to refactor connections first. Thoughts anyone?"
> >
> > The answer of Bolke de Bruin was: "making it more generic would be
> > appreciated"
> >
> > So basically the way the code is set up actually every types of
> connection
> > existing is defined though a list in the Connection class. It implements
> > exactly the same code for parsing uri to get connections info and doesn't
> > allow for a simple way to get back the uri from the connection infos.
> >
> > I need to add a mongodb connection and a way to get it back as a uri, so
> i
> > could use an other type of connection and play around with that or juste
> > add one more hard coded connection type, but I though this might be
> > something that comes back regularly and having a simple way to plug in
> new
> > types of connection would make it easier for anyone to contribute a new
> > connection type.
> >
> > Hope this clarifies my proposal.
> >
> > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a écrit
> :
> >
> > > Hey Gael,
> > >
> > > could you please recap the question here and provide some context. Not
> > > everyone on the mailinglist is actively following Gitter, including me.
> > > With some context it would be easier to give feedback. Thanks.
> > >
> > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > following my question on gitter the other day and the response from
> > Bolke
> > > > de Bruin, I've started working on refactoring the connections in
> > airflow.
> > > >
> > > > Before submitting a PR I wanted to share my proposal with you and get
> > > > feedbacks.
> > > >
> > > > The idea is quite simple, I've divided the Connection class in two,
> > > > Connection and ConnectionType, connection has the same interface it
> had
> > > > before plus a few methods, but the class keeps a reference to a
> > > dictionary
> > > > of registered ConnectionType. It delegates the work of parsing from
> > URI,
> > > > formatting to URI (added) and getting the hook to the ConnectionType
> > > > associated with the conn_type.
> > > >
> > > > I've thought of two ways of registering new ConnectionTypes, the
> first
> > is
> > > > making the BaseConnectionType use a metaclass that registered any new
> > > > ConnectionType with Connection when the class is declared, it would
> > > require
> > > > the less work to extend the connection module, as just importing the
> > file
> > > > with the connection would do the trick.
> > > > The second one is juste to have a function/classmethod that you call
> > > > manually to register your connection. It would be simpler to
> understand
> > > but
> > > > requires more work every time you create a new ConnectionType.
> > > >
> > > > Hope this proposal is clear enough, and I'm waiting for feebacks and
> > > > possible improvements.
> > > >
> > > > Regards
> > > > Gael Magnan de Bornier
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>

Re: Refactoring Connection

Posted by Alex Van Boxel <al...@vanboxel.be>.
Thanks a lot, yes it clarifies a lot and I do agree you really need to hack
inside Airflow to add a Connection type. While you're working at this could
you have a look at the standard python *entry-point mechanism* for
registering Connection types/components.

A good read on this:
http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/advanced_pylons/entry_points_and_plugins.html

My first though would be that just by adding an entry to the factory method
would be enough to register your Connection + ConnectionType and UI.

Also note that not everything works with a URI. The Google Cloud Connection
doesn't have one, it uses a secret key file stored on disk, so don't force
every connection type to work with URI's.



On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <ga...@gmail.com> wrote:

> Yes sure,
>
> The question was the following:
> "I was looking at the code of the connections, and I realized you can't
> easily add a connection type without modifying the airflow code source. I
> wanted to create a mongodb connection type, but I think the best approche
> would be to refactor connections first. Thoughts anyone?"
>
> The answer of Bolke de Bruin was: "making it more generic would be
> appreciated"
>
> So basically the way the code is set up actually every types of connection
> existing is defined though a list in the Connection class. It implements
> exactly the same code for parsing uri to get connections info and doesn't
> allow for a simple way to get back the uri from the connection infos.
>
> I need to add a mongodb connection and a way to get it back as a uri, so i
> could use an other type of connection and play around with that or juste
> add one more hard coded connection type, but I though this might be
> something that comes back regularly and having a simple way to plug in new
> types of connection would make it easier for anyone to contribute a new
> connection type.
>
> Hope this clarifies my proposal.
>
> Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a écrit :
>
> > Hey Gael,
> >
> > could you please recap the question here and provide some context. Not
> > everyone on the mailinglist is actively following Gitter, including me.
> > With some context it would be easier to give feedback. Thanks.
> >
> > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > following my question on gitter the other day and the response from
> Bolke
> > > de Bruin, I've started working on refactoring the connections in
> airflow.
> > >
> > > Before submitting a PR I wanted to share my proposal with you and get
> > > feedbacks.
> > >
> > > The idea is quite simple, I've divided the Connection class in two,
> > > Connection and ConnectionType, connection has the same interface it had
> > > before plus a few methods, but the class keeps a reference to a
> > dictionary
> > > of registered ConnectionType. It delegates the work of parsing from
> URI,
> > > formatting to URI (added) and getting the hook to the ConnectionType
> > > associated with the conn_type.
> > >
> > > I've thought of two ways of registering new ConnectionTypes, the first
> is
> > > making the BaseConnectionType use a metaclass that registered any new
> > > ConnectionType with Connection when the class is declared, it would
> > require
> > > the less work to extend the connection module, as just importing the
> file
> > > with the connection would do the trick.
> > > The second one is juste to have a function/classmethod that you call
> > > manually to register your connection. It would be simpler to understand
> > but
> > > requires more work every time you create a new ConnectionType.
> > >
> > > Hope this proposal is clear enough, and I'm waiting for feebacks and
> > > possible improvements.
> > >
> > > Regards
> > > Gael Magnan de Bornier
> > >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>
-- 
  _/
_/ Alex Van Boxel

Re: Refactoring Connection

Posted by Gael Magnan <ga...@gmail.com>.
Yes sure,

The question was the following:
"I was looking at the code of the connections, and I realized you can't
easily add a connection type without modifying the airflow code source. I
wanted to create a mongodb connection type, but I think the best approche
would be to refactor connections first. Thoughts anyone?"

The answer of Bolke de Bruin was: "making it more generic would be
appreciated"

So basically the way the code is set up actually every types of connection
existing is defined though a list in the Connection class. It implements
exactly the same code for parsing uri to get connections info and doesn't
allow for a simple way to get back the uri from the connection infos.

I need to add a mongodb connection and a way to get it back as a uri, so i
could use an other type of connection and play around with that or juste
add one more hard coded connection type, but I though this might be
something that comes back regularly and having a simple way to plug in new
types of connection would make it easier for anyone to contribute a new
connection type.

Hope this clarifies my proposal.

Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <al...@vanboxel.be> a écrit :

> Hey Gael,
>
> could you please recap the question here and provide some context. Not
> everyone on the mailinglist is actively following Gitter, including me.
> With some context it would be easier to give feedback. Thanks.
>
> On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com> wrote:
>
> > Hi,
> >
> > following my question on gitter the other day and the response from Bolke
> > de Bruin, I've started working on refactoring the connections in airflow.
> >
> > Before submitting a PR I wanted to share my proposal with you and get
> > feedbacks.
> >
> > The idea is quite simple, I've divided the Connection class in two,
> > Connection and ConnectionType, connection has the same interface it had
> > before plus a few methods, but the class keeps a reference to a
> dictionary
> > of registered ConnectionType. It delegates the work of parsing from URI,
> > formatting to URI (added) and getting the hook to the ConnectionType
> > associated with the conn_type.
> >
> > I've thought of two ways of registering new ConnectionTypes, the first is
> > making the BaseConnectionType use a metaclass that registered any new
> > ConnectionType with Connection when the class is declared, it would
> require
> > the less work to extend the connection module, as just importing the file
> > with the connection would do the trick.
> > The second one is juste to have a function/classmethod that you call
> > manually to register your connection. It would be simpler to understand
> but
> > requires more work every time you create a new ConnectionType.
> >
> > Hope this proposal is clear enough, and I'm waiting for feebacks and
> > possible improvements.
> >
> > Regards
> > Gael Magnan de Bornier
> >
> --
>   _/
> _/ Alex Van Boxel
>

Re: Refactoring Connection

Posted by Alex Van Boxel <al...@vanboxel.be>.
Hey Gael,

could you please recap the question here and provide some context. Not
everyone on the mailinglist is actively following Gitter, including me.
With some context it would be easier to give feedback. Thanks.

On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan <ga...@gmail.com> wrote:

> Hi,
>
> following my question on gitter the other day and the response from Bolke
> de Bruin, I've started working on refactoring the connections in airflow.
>
> Before submitting a PR I wanted to share my proposal with you and get
> feedbacks.
>
> The idea is quite simple, I've divided the Connection class in two,
> Connection and ConnectionType, connection has the same interface it had
> before plus a few methods, but the class keeps a reference to a dictionary
> of registered ConnectionType. It delegates the work of parsing from URI,
> formatting to URI (added) and getting the hook to the ConnectionType
> associated with the conn_type.
>
> I've thought of two ways of registering new ConnectionTypes, the first is
> making the BaseConnectionType use a metaclass that registered any new
> ConnectionType with Connection when the class is declared, it would require
> the less work to extend the connection module, as just importing the file
> with the connection would do the trick.
> The second one is juste to have a function/classmethod that you call
> manually to register your connection. It would be simpler to understand but
> requires more work every time you create a new ConnectionType.
>
> Hope this proposal is clear enough, and I'm waiting for feebacks and
> possible improvements.
>
> Regards
> Gael Magnan de Bornier
>
-- 
  _/
_/ Alex Van Boxel