Posted to dev@airflow.apache.org by Song Liu <so...@outlook.com> on 2018/04/24 08:48:28 UTC

About the project support in Airflow

Hi,

DAGs are generally created for a specific project. If I have many different projects, will Airflow support a Project concept so that DAGs can be organized separately?

Is this a known requirement, or is there already a plan for it?

Thanks,
Song

Re: About the project support in Airflow

Posted by "刘松 (Cycle++开发组)" <li...@megvii.com>.
Hi James,

Yes, the “multi-user” feature is indeed one way to exhaust Airflow resources, but it is important for making Airflow work more like a service.
More work will be needed to make Airflow more scalable, but the direction looks promising.

Thanks,
Song

On 26/04/2018, 3:05 AM, "James Meickle" <jm...@quantopian.com> wrote:

    Another reason you would want separated infrastructure is that there are a
    lot of ways to exhaust Airflow resources or otherwise cause contention -
    like having too many sensors or sub-DAGs using up all available tasks.
    
    Doesn't seem like a great idea to push for having different teams with
    co-tenancy until there is also per-team control over resource use...
    
    On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <li...@megvii.com>
    wrote:
    
    > It seems that all the current approach is pointing to multiple instance of
    > airflow, but project concept is very nature since one user might to handle
    > different type of tasks.
    >
    > Another thing about the multiple user support, one way is also to deploy
    > multiple instance, but it seems that airflow is providing multiple user
    > function builtin.
    >
    > So I can not be convinced that using multiple instance for multiple
    > project purpose.
    >
    > Thanks,
    > Song
    >
    >
    >
    >
    > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@gmail.com
    > <ma...@gmail.com>> wrote:
    >
    >
    > Looks neat Taylor!
    >
    > And regarding the original question, going off of what Maxime and Bolke
    > said, at Pandora, it made more sense for us to have an instance per team
    > since each team has its own system user for prod and the instance can run
    > all processes as that user. Alternatively you could have a super user that
    > can sudo as those other system users, and have many teams on a single
    > instance but that is a security concern (what if one team sudo's as the
    > other team and accidentally overwrites data - there is nothing stopping
    > them from doing it). It depends what your org set up is, but let me know if
    > there are any questions I can help with.
    >
    > Ace
    >
    >
    > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
    > >
    > > We use a similar approach like Bolke mentioned with running multiple
    > > Airflow instances.
    > >
    > > I haven't read the Pandora article yet, but we have an Astronomer Open
    > > Edition (fully open source) that bundles similar tools like Prometheus,
    > > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
    > > looking to get a setup like that up and running quickly.
    > >
    > > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
    > > https://github.com/astronomerio/astronomer
    > >
    > > *Taylor Edmiston*
    > > Blog  | Stack Overflow CV
    > >  | LinkedIn
    > >  | AngelList
    > >
    > >
    > >
    > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
    > > maximebeauchemin@gmail.com> wrote:
    > >
    > >> Related blog post about multi-tenant Airflow deployment out of Pandora:
    > >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
    > >>
    > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
    > >> wrote:
    > >>
    > >>> My suggestion would be to deploy airflow per project. You could even use
    > >>> airflow to manage your ci/cd pipeline.
    > >>>
    > >>> B.
    > >>>
    > >>> Sent from my iPhone
    > >>>
    > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
    > >> maximebeauchemin@gmail.com>
    > >>> wrote:
    > >>>>
    > >>>> People have been talking about namespacing DAGs in the past. I'd recommend
    > >>>> using tags (many to many) instead of categories/projects (one to many).
    > >>>>
    > >>>> It should be fairly easy to add this feature. One question is whether tags
    > >>>> are defined as code or in the UI/db only.
    > >>>>
    > >>>> Max
    > >>>>
    > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
    > >> wrote:
    > >>>>>
    > >>>>> Hi,
    > >>>>>
    > >>>>> Basically the DAGs are created for a project purpose, so if I have many
    > >>>>> different projects, will the Airflow support the Project concept and
    > >>>>> organize them separately ?
    > >>>>>
    > >>>>> Is this a known requirement or any plan for this already ?
    > >>>>>
    > >>>>> Thanks,
    > >>>>> Song
    > >>>>>
    > >>>
    > >>
    >
    >
    >
    


Re: About the project support in Airflow

Posted by Song Liu <so...@outlook.com>.
Hi Taylor,

This would be one solution for the "project/folder" concept: if a user could create a DAG from the UI (in the future?), the user could name the DAG "<project_name>/<dag_name>", and the UI would show it in a tree view. Yes, that would go some way toward solving my requirement. Whether the backend parses the project from the DAG path or stores the project relationship in the database is your implementation decision.

Thanks, but I suggest running a survey to find out what other users think about it.

At least for me, the project concept is a real requirement.

Thanks,
Song
________________________________
From: Taylor Edmiston <te...@gmail.com>
Sent: April 26, 2018 15:49
To: dev@airflow.incubator.apache.org
Subject: Re: About the project support in Airflow

We've discussed internally something like having groups or "folders" for
DAGs in the UI.  Nothing functional on the backend, purely a front end
aesthetic.  Something like having DAGs named "foo/bar" and "foo/baz" would
be grouped like a tree visually in the UI:

- Group foo
  - DAG bar
  - DAG baz

Is that what you're looking for?

Best,
Taylor


Re: About the project support in Airflow

Posted by Taylor Edmiston <te...@gmail.com>.
We've discussed internally something like having groups or "folders" for
DAGs in the UI.  Nothing functional on the backend, purely a front end
aesthetic.  Something like having DAGs named "foo/bar" and "foo/baz" would
be grouped like a tree visually in the UI:

- Group foo
  - DAG bar
  - DAG baz

Is that what you're looking for?

Best,
Taylor
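
A minimal sketch of the front-end grouping described above, assuming DAG ids are available as plain strings and that "/" is the separator. The function names `group_dag_ids` and `render_tree` are hypothetical and are not part of Airflow; the thread does not say whether Airflow itself would accept "/" in a DAG id, so this only illustrates the naming convention.

```python
from collections import defaultdict


def group_dag_ids(dag_ids, sep="/"):
    """Group DAG ids by the prefix before the first separator.

    Ids without a separator are collected under the empty-string group.
    """
    groups = defaultdict(list)
    for dag_id in dag_ids:
        group, found, name = dag_id.partition(sep)
        if found:
            groups[group].append(name)
        else:
            groups[""].append(dag_id)  # no prefix: treat as ungrouped
    # Sort groups and names so the rendering is stable.
    return {group: sorted(names) for group, names in sorted(groups.items())}


def render_tree(groups):
    """Render the grouped ids as an indented text tree."""
    lines = []
    for group, names in groups.items():
        if group:
            lines.append(f"- Group {group}")
            lines.extend(f"  - DAG {name}" for name in names)
        else:
            lines.extend(f"- DAG {name}" for name in names)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(group_dag_ids(["foo/bar", "foo/baz"])))
```

Because the grouping is derived purely from the id string, nothing needs to be stored on the backend, which matches the "purely a front end aesthetic" framing; the trade-off is that a DAG can belong to only one group.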


Re: About the project support in Airflow

Posted by "刘松 (Cycle++开发组)" <li...@megvii.com>.
Hi Feng,

Thanks for the information; I had indeed noticed this work as well.

But if I understand correctly, it focuses on permissions (edit/read, etc.) on the DAG itself.

The “project concept” is a kind of “Group”, but it is more meaningful than a “Tag”. So if we don’t want to support the “project concept”, is there any other solution for this requirement, or any reasoning behind that decision?

Many thanks for your help.

Thanks,
Song

On 26/04/2018, 12:28 PM, "Tao Feng" <fe...@gmail.com> wrote:

    Hi Song,
    
    Just noted that we are also working on dag-level access on top of
    RBAC(AIRFLOW-2267) which should provide dag-level acl functionality. The
    WIP pr could be found at
    https://github.com/apache/incubator-airflow/pull/3197
    
    On Wed, Apr 25, 2018 at 7:42 PM, 刘松(Cycle++开发组) <li...@megvii.com>
    wrote:
    
    > Hi Taylor,
    >
    > Yes, I know that this RBAC feature would be released within the 1.10
    > release.
    >
    > # About multi-user support
    >
    > But Why not deploy one instance of Airflow per user ? (
    > With this feature, don’t you think that the Airflow is to be more likely
    > as a platform to serve more different users.
    > Also multi-user case would exhaust the Airflow resource more easily if we
    > are talking the scalability capability of Airflow.
    >
    > # About multi-project support
    >
    > You could see the “project” concept is some kind of logical group of the
    > DAGs to let the DAGs be organized more structural.
    > I can’t see it will beat the “scalability” of Airflow somehow, it just let
    > the user experience be more friendly I see.
    >
    > So that is why I want to use the “multi-user support” case to argue why
    > suggest using multi-instance for “multi-project”,
    > since that I think the “multi-user” support is kindly of pushing the
    > Airflow in the way of “be more scalable”, but “multi-project” just be more
    > intuitive and more user-experience friendly.
    >
    > Thanks,
    > Song
    >
    > On 26/04/2018, 4:50 AM, "Taylor Edmiston" <te...@gmail.com> wrote:
    >
    >     Something else that might be relevant for your multi-user use case is
    > the
    >     new RBAC support that Joy Gao added.
    >
    >     https://github.com/apache/incubator-airflow/pull/3015
    >
    >     *Taylor Edmiston*
    >     Blog <http://blog.tedmiston.com> | Stack Overflow CV
    >     <https://stackoverflow.com/story/taylor> | LinkedIn
    >     <https://www.linkedin.com/in/tedmiston/> | AngelList
    >     <https://angel.co/taylor>
    >
    >
    >     On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <
    > jmeickle@quantopian.com>
    >     wrote:
    >
    >     > Another reason you would want separated infrastructure is that there
    > are a
    >     > lot of ways to exhaust Airflow resources or otherwise cause
    > contention -
    >     > like having too many sensors or sub-DAGs using up all available
    > tasks.
    >     >
    >     > Doesn't seem like a great idea to push for having different teams
    > with
    >     > co-tenancy until there is also per-team control over resource use...
    >     >
    >     > On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <
    > liusong02@megvii.com>
    >     > wrote:
    >     >
    >     > > It seems that all the current approach is pointing to multiple
    > instance
    >     > of
    >     > > airflow, but project concept is very nature since one user might to
    >     > handle
    >     > > different type of tasks.
    >     > >
    >     > > Another thing about the multiple user support, one way is also to
    > deploy
    >     > > multiple instance, but it seems that airflow is providing multiple
    > user
    >     > > function builtin.
    >     > >
    >     > > So I can not be convinced that using multiple instance for multiple
    >     > > project purpose.
    >     > >
    >     > > Thanks,
    >     > > Song
    >     > >
    >     > >
    >     > >
    >     > >
    >     > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <
    >     > acehaidrey@gmail.com
    >     > > <ma...@gmail.com>> wrote:
    >     > >
    >     > >
    >     > > Looks neat Taylor!
    >     > >
    >     > > And regarding the original question, going off of what Maxime and
    > Bolke
    >     > > said, at Pandora, it made more sense for us to have an instance
    > per team
    >     > > since each team has its own system user for prod and the instance
    > can run
    >     > > all processes as that user. Alternatively you could have a super
    > user
    >     > that
    >     > > can sudo as those other system users, and have many teams on a
    > single
    >     > > instance but that is a security concern (what if one team sudo's
    > as the
    >     > > other team and accidentally overwrites data - there is nothing
    > stopping
    >     > > them from doing it). It depends what your org set up is, but let
    > me know
    >     > if
    >     > > there are any questions I can help with.
    >     > >
    >     > > Ace
    >     > >
    >     > >
    >     > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
    >     > > >
    >     > > > We use a similar approach like Bolke mentioned with running
    > multiple
    >     > > > Airflow instances.
    >     > > >
    >     > > > I haven't read the Pandora article yet, but we have an
    > Astronomer Open
    >     > > > Edition (fully open source) that bundles similar tools like
    > Prometheus,
    >     > > > Grafana, Celery, etc with Airflow and a Docker Compose file if
    > you're
    >     > > > looking to get a setup like that up and running quickly.
    >     > > >
    >     > > > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
    >     > > > https://github.com/astronomerio/astronomer
    >     > > >
    >     > > > *Taylor Edmiston*
    >     > > > Blog  | Stack Overflow CV
    >     > > >  | LinkedIn
    >     > > >  | AngelList
    >     > > >
    >     > > >
    >     > > >
    >     > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
    >     > > > maximebeauchemin@gmail.com> wrote:
    >     > > >
    >     > > >> Related blog post about multi-tenant Airflow deployment out of
    >     > Pandora:
    >     > > >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
    >     > > >>
    >     > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
    >     > > >> wrote:
    >     > > >>
    >     > > >>> My suggestion would be to deploy airflow per project. You
    > could even
    >     > > use
    >     > > >>> airflow to manage your ci/cd pipeline.
    >     > > >>>
    >     > > >>> B.
    >     > > >>>
    >     > > >>> Sent from my iPhone
    >     > > >>>
    >     > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
    >     > > >> maximebeauchemin@gmail.com>
    >     > > >>> wrote:
    >     > > >>>>
    >     > > >>>> People have been talking about namespacing DAGs in the past.
    > I'd
    >     > > >>> recommend
    >     > > >>>> using tags (many to many) instead of categories/projects (one
    > to
    >     > > many).
    >     > > >>>>
    >     > > >>>> It should be fairly easy to add this feature. One question is
    >     > whether
    >     > > >>> tags
    >     > > >>>> are defined as code or in the UI/db only.
    >     > > >>>>
    >     > > >>>> Max
    >     > > >>>>
    >     > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
    >     > > >> wrote:
    >     > > >>>>>
    >     > > >>>>> Hi,
    >     > > >>>>>
    >     > > >>>>> Basically the DAGs are created for a project purpose, so if
    > I have
    >     > > >> many
    >     > > >>>>> different projects, will the Airflow support the Project
    > concept
    >     > and
    >     > > >>>>> organize them separately ?
    >     > > >>>>>
    >     > > >>>>> Is this a known requirement or any plan for this already ?
    >     > > >>>>>
    >     > > >>>>> Thanks,
    >     > > >>>>> Song
    >     > > >>>>>
    >     > > >>>
    >     > > >>
    >     > >
    >     > >
    >     > >
    >     >
    >
    >
    >
    


Re: About the project support in Airflow

Posted by Tao Feng <fe...@gmail.com>.
Hi Song,

Just a note that we are also working on DAG-level access control on top of
RBAC (AIRFLOW-2267), which should provide DAG-level ACL functionality. The
WIP PR can be found at
https://github.com/apache/incubator-airflow/pull/3197
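
To make the idea of a DAG-level ACL concrete, here is a minimal sketch in plain Python. It is not the design in the PR above and does not use any Airflow API; the role names, DAG ids, and the `ROLE_DAG_PERMS` layout are all hypothetical and only illustrate checking a user's roles against per-DAG permissions.

```python
# Hypothetical mapping: role -> {dag_id: set of allowed actions}.
# This is an illustration only, not Airflow's RBAC schema.
ROLE_DAG_PERMS = {
    "project_a_dev": {"project_a_etl": {"read", "edit"}},
    "project_b_viewer": {"project_b_report": {"read"}},
}


def can_access(roles, dag_id, action):
    """Return True if any of the user's roles grants `action` on `dag_id`."""
    return any(
        action in ROLE_DAG_PERMS.get(role, {}).get(dag_id, set())
        for role in roles
    )


def visible_dags(roles, all_dag_ids):
    """Return the DAG ids the user may at least read, in the given order."""
    return [dag_id for dag_id in all_dag_ids if can_access(roles, dag_id, "read")]
```

With a check like this, a user holding only the hypothetical `project_b_viewer` role would see `project_b_report` and nothing from project A, which is roughly the per-project separation asked about in this thread.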

On Wed, Apr 25, 2018 at 7:42 PM, 刘松(Cycle++开发组) <li...@megvii.com>
wrote:

> Hi Taylor,
>
> Yes, I know that this RBAC feature would be released within the 1.10
> release.
>
> # About multi-user support
>
> But Why not deploy one instance of Airflow per user ? (
> With this feature, don’t you think that the Airflow is to be more likely
> as a platform to serve more different users.
> Also multi-user case would exhaust the Airflow resource more easily if we
> are talking the scalability capability of Airflow.
>
> # About multi-project support
>
> You could see the “project” concept is some kind of logical group of the
> DAGs to let the DAGs be organized more structural.
> I can’t see it will beat the “scalability” of Airflow somehow, it just let
> the user experience be more friendly I see.
>
> So that is why I want to use the “multi-user support” case to argue why
> suggest using multi-instance for “multi-project”,
> since that I think the “multi-user” support is kindly of pushing the
> Airflow in the way of “be more scalable”, but “multi-project” just be more
> intuitive and more user-experience friendly.
>
> Thanks,
> Song
>
> On 26/04/2018, 4:50 AM, "Taylor Edmiston" <te...@gmail.com> wrote:
>
>     Something else that might be relevant for your multi-user use case is
> the
>     new RBAC support that Joy Gao added.
>
>     https://github.com/apache/incubator-airflow/pull/3015
>
>     *Taylor Edmiston*
>     Blog <http://blog.tedmiston.com> | Stack Overflow CV
>     <https://stackoverflow.com/story/taylor> | LinkedIn
>     <https://www.linkedin.com/in/tedmiston/> | AngelList
>     <https://angel.co/taylor>
>
>
>     On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <
> jmeickle@quantopian.com>
>     wrote:
>
>     > Another reason you would want separated infrastructure is that there
> are a
>     > lot of ways to exhaust Airflow resources or otherwise cause
> contention -
>     > like having too many sensors or sub-DAGs using up all available
> tasks.
>     >
>     > Doesn't seem like a great idea to push for having different teams
> with
>     > co-tenancy until there is also per-team control over resource use...
>     >
>     > On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <
> liusong02@megvii.com>
>     > wrote:
>     >
>     > > It seems that all the current approach is pointing to multiple
> instance
>     > of
>     > > airflow, but project concept is very nature since one user might to
>     > handle
>     > > different type of tasks.
>     > >
>     > > Another thing about the multiple user support, one way is also to
> deploy
>     > > multiple instance, but it seems that airflow is providing multiple
> user
>     > > function builtin.
>     > >
>     > > So I can not be convinced that using multiple instance for multiple
>     > > project purpose.
>     > >
>     > > Thanks,
>     > > Song
>     > >
>     > >
>     > >
>     > >
>     > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <
>     > acehaidrey@gmail.com
>     > > <ma...@gmail.com>> wrote:
>     > >
>     > >
>     > > Looks neat Taylor!
>     > >
>     > > And regarding the original question, going off of what Maxime and
> Bolke
>     > > said, at Pandora, it made more sense for us to have an instance
> per team
>     > > since each team has its own system user for prod and the instance
> can run
>     > > all processes as that user. Alternatively you could have a super
> user
>     > that
>     > > can sudo as those other system users, and have many teams on a
> single
>     > > instance but that is a security concern (what if one team sudo's
> as the
>     > > other team and accidentally overwrites data - there is nothing
> stopping
>     > > them from doing it). It depends what your org set up is, but let
> me know
>     > if
>     > > there are any questions I can help with.
>     > >
>     > > Ace
>     > >
>     > >
>     > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>     > > >
>     > > > We use a similar approach like Bolke mentioned with running
> multiple
>     > > > Airflow instances.
>     > > >
>     > > > I haven't read the Pandora article yet, but we have an
> Astronomer Open
>     > > > Edition (fully open source) that bundles similar tools like
> Prometheus,
>     > > > Grafana, Celery, etc with Airflow and a Docker Compose file if
> you're
>     > > > looking to get a setup like that up and running quickly.
>     > > >
>     > > > https://github.com/astronomerio/astronomer/blob/
>     > master/examples/airflow-
>     > > enterprise/docker-compose.yml
>     > > > https://github.com/astronomerio/astronomer
>     > > >
>     > > > *Taylor Edmiston*
>     > > > Blog  | Stack Overflow CV
>     > > >  | LinkedIn
>     > > >  | AngelList
>     > > >
>     > > >
>     > > >
>     > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
>     > > > maximebeauchemin@gmail.com> wrote:
>     > > >
>     > > >> Related blog post about multi-tenant Airflow deployment out of
>     > Pandora:
>     > > >> https://engineering.pandora.com/apache-airflow-at-pandora-
>     > 1d7a844d68ee
>     > > >>
>     > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>     > > >> wrote:
>     > > >>
>     > > >>> My suggestion would be to deploy airflow per project. You
> could even
>     > > use
>     > > >>> airflow to manage your ci/cd pipeline.
>     > > >>>
>     > > >>> B.
>     > > >>>
>     > > >>> Sent from my iPhone
>     > > >>>
>     > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
>     > > >> maximebeauchemin@gmail.com>
>     > > >>> wrote:
>     > > >>>>
>     > > >>>> People have been talking about namespacing DAGs in the past.
> I'd
>     > > >>> recommend
>     > > >>>> using tags (many to many) instead of categories/projects (one
> to
>     > > many).
>     > > >>>>
>     > > >>>> It should be fairly easy to add this feature. One question is
>     > whether
>     > > >>> tags
>     > > >>>> are defined as code or in the UI/db only.
>     > > >>>>
>     > > >>>> Max
>     > > >>>>
>     > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
>     > > >> wrote:
>     > > >>>>>
>     > > >>>>> Hi,
>     > > >>>>>
>     > > >>>>> Basically the DAGs are created for a project purpose, so if
> I have
>     > > >> many
>     > > >>>>> different projects, will the Airflow support the Project
> concept
>     > and
>     > > >>>>> organize them separately ?
>     > > >>>>>
>     > > >>>>> Is this a known requirement or any plan for this already ?
>     > > >>>>>
>     > > >>>>> Thanks,
>     > > >>>>> Song
>     > > >>>>>
>     > > >>>
>     > > >>
>     > >
>     > >
>     > >
>     >
>
>
>

Re: About the project support in Airflow

Posted by "刘松 (Cycle++开发组)" <li...@megvii.com>.
Hi Taylor,

Yes, I know that this RBAC feature will be released in the 1.10 release.

# About multi-user support

But then why not deploy one instance of Airflow per user?
With this feature, don’t you think Airflow becomes more of a platform that serves many different users?
Also, the multi-user case would exhaust Airflow resources more easily, if we are talking about Airflow's scalability.

# About multi-project support

You could see the “project” concept as a kind of logical grouping of DAGs that lets them be organized in a more structured way.
I can’t see how it would hurt the “scalability” of Airflow; it just makes the user experience friendlier.

So that is why I use the “multi-user support” case to question the suggestion of using multiple instances for “multi-project”:
I think “multi-user” support pushes Airflow toward having to “be more scalable”, whereas “multi-project” support just makes it more intuitive and user-friendly.
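
As a rough illustration of what such a logical grouping could look like, here is a small plain-Python sketch comparing the two models raised in this thread: one project per DAG (one to many) and tags (many to many). The DAG ids, project names, and tags are made up, and nothing here is an existing Airflow feature.

```python
from collections import defaultdict

# Hypothetical DAG ids with a single project each (one project -> many DAGs).
dag_project = {
    "ingest_logs": "search",
    "train_ranker": "search",
    "daily_billing": "finance",
}

# The same hypothetical DAGs with tags (a DAG can carry several tags).
dag_tags = {
    "ingest_logs": {"search", "etl"},
    "train_ranker": {"search", "ml"},
    "daily_billing": {"finance", "etl"},
}


def group_by_project(mapping):
    """Group DAG ids under their single project."""
    groups = defaultdict(list)
    for dag_id, project in mapping.items():
        groups[project].append(dag_id)
    return {name: sorted(ids) for name, ids in groups.items()}


def group_by_tag(mapping):
    """Group DAG ids under every tag they carry; a DAG may appear in several groups."""
    groups = defaultdict(list)
    for dag_id, tags in mapping.items():
        for tag in tags:
            groups[tag].append(dag_id)
    return {name: sorted(ids) for name, ids in groups.items()}
```

Either way the grouping is only a view over the same set of DAGs, so it changes how they are listed, not how they are scheduled.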

Thanks,
Song

On 26/04/2018, 4:50 AM, "Taylor Edmiston" <te...@gmail.com> wrote:

    Something else that might be relevant for your multi-user use case is the
    new RBAC support that Joy Gao added.
    
    https://github.com/apache/incubator-airflow/pull/3015
    
    *Taylor Edmiston*
    Blog <http://blog.tedmiston.com> | Stack Overflow CV
    <https://stackoverflow.com/story/taylor> | LinkedIn
    <https://www.linkedin.com/in/tedmiston/> | AngelList
    <https://angel.co/taylor>
    
    
    On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <jm...@quantopian.com>
    wrote:
    
    > Another reason you would want separated infrastructure is that there are a
    > lot of ways to exhaust Airflow resources or otherwise cause contention -
    > like having too many sensors or sub-DAGs using up all available tasks.
    >
    > Doesn't seem like a great idea to push for having different teams with
    > co-tenancy until there is also per-team control over resource use...
    >
    > On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <li...@megvii.com>
    > wrote:
    >
    > > It seems that all the current approach is pointing to multiple instance
    > of
    > > airflow, but project concept is very nature since one user might to
    > handle
    > > different type of tasks.
    > >
    > > Another thing about the multiple user support, one way is also to deploy
    > > multiple instance, but it seems that airflow is providing multiple user
    > > function builtin.
    > >
    > > So I can not be convinced that using multiple instance for multiple
    > > project purpose.
    > >
    > > Thanks,
    > > Song
    > >
    > >
    > >
    > >
    > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <
    > acehaidrey@gmail.com
    > > <ma...@gmail.com>> wrote:
    > >
    > >
    > > Looks neat Taylor!
    > >
    > > And regarding the original question, going off of what Maxime and Bolke
    > > said, at Pandora, it made more sense for us to have an instance per team
    > > since each team has its own system user for prod and the instance can run
    > > all processes as that user. Alternatively you could have a super user
    > that
    > > can sudo as those other system users, and have many teams on a single
    > > instance but that is a security concern (what if one team sudo's as the
    > > other team and accidentally overwrites data - there is nothing stopping
    > > them from doing it). It depends what your org set up is, but let me know
    > if
    > > there are any questions I can help with.
    > >
    > > Ace
    > >
    > >
    > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
    > > >
    > > > We use a similar approach like Bolke mentioned with running multiple
    > > > Airflow instances.
    > > >
    > > > I haven't read the Pandora article yet, but we have an Astronomer Open
    > > > Edition (fully open source) that bundles similar tools like Prometheus,
    > > > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
    > > > looking to get a setup like that up and running quickly.
    > > >
    > > > https://github.com/astronomerio/astronomer/blob/
    > master/examples/airflow-
    > > enterprise/docker-compose.yml
    > > > https://github.com/astronomerio/astronomer
    > > >
    > > > *Taylor Edmiston*
    > > > Blog  | Stack Overflow CV
    > > >  | LinkedIn
    > > >  | AngelList
    > > >
    > > >
    > > >
    > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
    > > > maximebeauchemin@gmail.com> wrote:
    > > >
    > > >> Related blog post about multi-tenant Airflow deployment out of
    > Pandora:
    > > >> https://engineering.pandora.com/apache-airflow-at-pandora-
    > 1d7a844d68ee
    > > >>
    > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
    > > >> wrote:
    > > >>
    > > >>> My suggestion would be to deploy airflow per project. You could even
    > > use
    > > >>> airflow to manage your ci/cd pipeline.
    > > >>>
    > > >>> B.
    > > >>>
    > > >>> Sent from my iPhone
    > > >>>
    > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
    > > >> maximebeauchemin@gmail.com>
    > > >>> wrote:
    > > >>>>
    > > >>>> People have been talking about namespacing DAGs in the past. I'd
    > > >>> recommend
    > > >>>> using tags (many to many) instead of categories/projects (one to
    > > many).
    > > >>>>
    > > >>>> It should be fairly easy to add this feature. One question is
    > whether
    > > >>> tags
    > > >>>> are defined as code or in the UI/db only.
    > > >>>>
    > > >>>> Max
    > > >>>>
    > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
    > > >> wrote:
    > > >>>>>
    > > >>>>> Hi,
    > > >>>>>
    > > >>>>> Basically the DAGs are created for a project purpose, so if I have
    > > >> many
    > > >>>>> different projects, will the Airflow support the Project concept
    > and
    > > >>>>> organize them separately ?
    > > >>>>>
    > > >>>>> Is this a known requirement or any plan for this already ?
    > > >>>>>
    > > >>>>> Thanks,
    > > >>>>> Song
    > > >>>>>
    > > >>>
    > > >>
    > >
    > >
    > >
    >
    


Re: About the project support in Airflow

Posted by Taylor Edmiston <te...@gmail.com>.
Something else that might be relevant for your multi-user use case is the
new RBAC support that Joy Gao added.

https://github.com/apache/incubator-airflow/pull/3015

*Taylor Edmiston*
Blog <http://blog.tedmiston.com> | Stack Overflow CV
<https://stackoverflow.com/story/taylor> | LinkedIn
<https://www.linkedin.com/in/tedmiston/> | AngelList
<https://angel.co/taylor>


On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <jm...@quantopian.com>
wrote:

> Another reason you would want separated infrastructure is that there are a
> lot of ways to exhaust Airflow resources or otherwise cause contention -
> like having too many sensors or sub-DAGs using up all available tasks.
>
> Doesn't seem like a great idea to push for having different teams with
> co-tenancy until there is also per-team control over resource use...
>
> On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <li...@megvii.com>
> wrote:
>
> > It seems that all the current approach is pointing to multiple instance
> of
> > airflow, but project concept is very nature since one user might to
> handle
> > different type of tasks.
> >
> > Another thing about the multiple user support, one way is also to deploy
> > multiple instance, but it seems that airflow is providing multiple user
> > function builtin.
> >
> > So I can not be convinced that using multiple instance for multiple
> > project purpose.
> >
> > Thanks,
> > Song
> >
> >
> >
> >
> > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <
> acehaidrey@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >
> > Looks neat Taylor!
> >
> > And regarding the original question, going off of what Maxime and Bolke
> > said, at Pandora, it made more sense for us to have an instance per team
> > since each team has its own system user for prod and the instance can run
> > all processes as that user. Alternatively you could have a super user
> that
> > can sudo as those other system users, and have many teams on a single
> > instance but that is a security concern (what if one team sudo's as the
> > other team and accidentally overwrites data - there is nothing stopping
> > them from doing it). It depends what your org set up is, but let me know
> if
> > there are any questions I can help with.
> >
> > Ace
> >
> >
> > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
> > >
> > > We use a similar approach like Bolke mentioned with running multiple
> > > Airflow instances.
> > >
> > > I haven't read the Pandora article yet, but we have an Astronomer Open
> > > Edition (fully open source) that bundles similar tools like Prometheus,
> > > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> > > looking to get a setup like that up and running quickly.
> > >
> > > https://github.com/astronomerio/astronomer/blob/
> master/examples/airflow-
> > enterprise/docker-compose.yml
> > > https://github.com/astronomerio/astronomer
> > >
> > > *Taylor Edmiston*
> > > Blog  | Stack Overflow CV
> > >  | LinkedIn
> > >  | AngelList
> > >
> > >
> > >
> > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> > > maximebeauchemin@gmail.com> wrote:
> > >
> > >> Related blog post about multi-tenant Airflow deployment out of
> Pandora:
> > >> https://engineering.pandora.com/apache-airflow-at-pandora-
> 1d7a844d68ee
> > >>
> > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
> > >> wrote:
> > >>
> > >>> My suggestion would be to deploy airflow per project. You could even
> > use
> > >>> airflow to manage your ci/cd pipeline.
> > >>>
> > >>> B.
> > >>>
> > >>> Sent from my iPhone
> > >>>
> > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
> > >> maximebeauchemin@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>> People have been talking about namespacing DAGs in the past. I'd
> > >>> recommend
> > >>>> using tags (many to many) instead of categories/projects (one to
> > many).
> > >>>>
> > >>>> It should be fairly easy to add this feature. One question is
> whether
> > >>> tags
> > >>>> are defined as code or in the UI/db only.
> > >>>>
> > >>>> Max
> > >>>>
> > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
> > >> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Basically the DAGs are created for a project purpose, so if I have
> > >> many
> > >>>>> different projects, will the Airflow support the Project concept
> and
> > >>>>> organize them separately ?
> > >>>>>
> > >>>>> Is this a known requirement or any plan for this already ?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Song
> > >>>>>
> > >>>
> > >>
> >
> >
> >
>

Re: About the project support in Airflow

Posted by Brian Greene <br...@heisenbergwoodworking.com>.
+1

Sent from a device with less than stellar autocorrect

> On Apr 25, 2018, at 12:04 PM, James Meickle <jm...@quantopian.com> wrote:
> 
> Another reason you would want separated infrastructure is that there are a
> lot of ways to exhaust Airflow resources or otherwise cause contention -
> like having too many sensors or sub-DAGs using up all available tasks.
> 
> Doesn't seem like a great idea to push for having different teams with
> co-tenancy until there is also per-team control over resource use...
> 
> On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <li...@megvii.com>
> wrote:
> 
>> It seems that all the current approach is pointing to multiple instance of
>> airflow, but project concept is very nature since one user might to handle
>> different type of tasks.
>> 
>> Another thing about the multiple user support, one way is also to deploy
>> multiple instance, but it seems that airflow is providing multiple user
>> function builtin.
>> 
>> So I can not be convinced that using multiple instance for multiple
>> project purpose.
>> 
>> Thanks,
>> Song
>> 
>> 
>> 
>> 
>> On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@gmail.com
>> <ma...@gmail.com>> wrote:
>> 
>> 
>> Looks neat Taylor!
>> 
>> And regarding the original question, going off of what Maxime and Bolke
>> said, at Pandora, it made more sense for us to have an instance per team
>> since each team has its own system user for prod and the instance can run
>> all processes as that user. Alternatively you could have a super user that
>> can sudo as those other system users, and have many teams on a single
>> instance but that is a security concern (what if one team sudo's as the
>> other team and accidentally overwrites data - there is nothing stopping
>> them from doing it). It depends what your org set up is, but let me know if
>> there are any questions I can help with.
>> 
>> Ace
>> 
>> 
>>> On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>>> 
>>> We use a similar approach like Bolke mentioned with running multiple
>>> Airflow instances.
>>> 
>>> I haven't read the Pandora article yet, but we have an Astronomer Open
>>> Edition (fully open source) that bundles similar tools like Prometheus,
>>> Grafana, Celery, etc with Airflow and a Docker Compose file if you're
>>> looking to get a setup like that up and running quickly.
>>> 
>>> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-
>> enterprise/docker-compose.yml
>>> https://github.com/astronomerio/astronomer
>>> 
>>> *Taylor Edmiston*
>>> Blog  | Stack Overflow CV
>>> | LinkedIn
>>> | AngelList
>>> 
>>> 
>>> 
>>> On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
>>> maximebeauchemin@gmail.com> wrote:
>>> 
>>>> Related blog post about multi-tenant Airflow deployment out of Pandora:
>>>> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>>>> 
>>>> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>>>> wrote:
>>>> 
>>>>> My suggestion would be to deploy airflow per project. You could even
>> use
>>>>> airflow to manage your ci/cd pipeline.
>>>>> 
>>>>> B.
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
>>>> maximebeauchemin@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> People have been talking about namespacing DAGs in the past. I'd
>>>>> recommend
>>>>>> using tags (many to many) instead of categories/projects (one to
>> many).
>>>>>> 
>>>>>> It should be fairly easy to add this feature. One question is whether
>>>>> tags
>>>>>> are defined as code or in the UI/db only.
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
>>>> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Basically the DAGs are created for a project purpose, so if I have
>>>> many
>>>>>>> different projects, will the Airflow support the Project concept and
>>>>>>> organize them separately ?
>>>>>>> 
>>>>>>> Is this a known requirement or any plan for this already ?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Song
>>>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> 

Re: About the project support in Airflow

Posted by James Meickle <jm...@quantopian.com>.
Another reason you would want separated infrastructure is that there are a
lot of ways to exhaust Airflow resources or otherwise cause contention -
like having too many sensors or sub-DAGs using up all available tasks.

Doesn't seem like a great idea to push for having different teams with
co-tenancy until there is also per-team control over resource use...
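
A toy model of the contention described above, in plain Python. This is not Airflow's scheduler; the `schedule` function, the team names, and the slot counts are hypothetical and only show how one team's sensors can take every slot in a shared pool, and how a per-team limit would change that.

```python
from collections import Counter


def schedule(queued, total_slots, team_limits=None):
    """Greedily give slots to queued (team, task) pairs in queue order.

    team_limits maps team -> max slots for that team; None means a single
    shared pool with no per-team control. Returns the pairs that got a slot.
    """
    running = []
    used = Counter()
    for team, task in queued:
        if len(running) >= total_slots:
            break  # every slot is taken
        if team_limits is not None and used[team] >= team_limits.get(team, 0):
            continue  # this team is at its limit; leave the slot for others
        running.append((team, task))
        used[team] += 1
    return running


# Team A queues eight long-running sensors ahead of team B's single task.
queued = [("team_a", f"sensor_{i}") for i in range(8)] + [("team_b", "etl_load")]

shared = schedule(queued, total_slots=4)
limited = schedule(queued, total_slots=4, team_limits={"team_a": 3, "team_b": 1})
```

In the shared case all four slots go to team A's sensors and team B's task never starts; with the per-team limits team B's task gets the fourth slot.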

On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <li...@megvii.com>
wrote:

> It seems that all the current approach is pointing to multiple instance of
> airflow, but project concept is very nature since one user might to handle
> different type of tasks.
>
> Another thing about the multiple user support, one way is also to deploy
> multiple instance, but it seems that airflow is providing multiple user
> function builtin.
>
> So I can not be convinced that using multiple instance for multiple
> project purpose.
>
> Thanks,
> Song
>
>
>
>
> On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@gmail.com
> <ma...@gmail.com>> wrote:
>
>
> Looks neat Taylor!
>
> And regarding the original question, going off of what Maxime and Bolke
> said, at Pandora, it made more sense for us to have an instance per team
> since each team has its own system user for prod and the instance can run
> all processes as that user. Alternatively you could have a super user that
> can sudo as those other system users, and have many teams on a single
> instance but that is a security concern (what if one team sudo's as the
> other team and accidentally overwrites data - there is nothing stopping
> them from doing it). It depends what your org set up is, but let me know if
> there are any questions I can help with.
>
> Ace
>
>
> > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
> >
> > We use a similar approach like Bolke mentioned with running multiple
> > Airflow instances.
> >
> > I haven't read the Pandora article yet, but we have an Astronomer Open
> > Edition (fully open source) that bundles similar tools like Prometheus,
> > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> > looking to get a setup like that up and running quickly.
> >
> > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-
> enterprise/docker-compose.yml
> > https://github.com/astronomerio/astronomer
> >
> > *Taylor Edmiston*
> > Blog  | Stack Overflow CV
> >  | LinkedIn
> >  | AngelList
> >
> >
> >
> > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> > maximebeauchemin@gmail.com> wrote:
> >
> >> Related blog post about multi-tenant Airflow deployment out of Pandora:
> >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
> >>
> >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
> >> wrote:
> >>
> >>> My suggestion would be to deploy airflow per project. You could even
> use
> >>> airflow to manage your ci/cd pipeline.
> >>>
> >>> B.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
> >> maximebeauchemin@gmail.com>
> >>> wrote:
> >>>>
> >>>> People have been talking about namespacing DAGs in the past. I'd
> >>> recommend
> >>>> using tags (many to many) instead of categories/projects (one to
> many).
> >>>>
> >>>> It should be fairly easy to add this feature. One question is whether
> >>> tags
> >>>> are defined as code or in the UI/db only.
> >>>>
> >>>> Max
> >>>>
> >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
> >> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Basically the DAGs are created for a project purpose, so if I have
> >> many
> >>>>> different projects, will the Airflow support the Project concept and
> >>>>> organize them separately ?
> >>>>>
> >>>>> Is this a known requirement or any plan for this already ?
> >>>>>
> >>>>> Thanks,
> >>>>> Song
> >>>>>
> >>>
> >>
>
>
>

Re: About the project support in Airflow

Posted by "刘松 (Cycle++开发组)" <li...@megvii.com>.
It seems that all the current approaches point to running multiple instances of Airflow, but the project concept is very natural, since one user might handle different types of tasks.

Another point is multi-user support: one way to get it is also to deploy multiple instances, yet Airflow seems to provide multi-user functionality built in.

So I am not convinced that multiple instances should be used for multiple projects.

Thanks,
Song




On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <ac...@gmail.com> wrote:


Looks neat Taylor!

And regarding the original question, going off of what Maxime and Bolke said, at Pandora, it made more sense for us to have an instance per team since each team has its own system user for prod and the instance can run all processes as that user. Alternatively you could have a super user that can sudo as those other system users, and have many teams on a single instance but that is a security concern (what if one team sudo's as the other team and accidentally overwrites data - there is nothing stopping them from doing it). It depends what your org set up is, but let me know if there are any questions I can help with.

Ace


> On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>
> We use a similar approach like Bolke mentioned with running multiple
> Airflow instances.
>
> I haven't read the Pandora article yet, but we have an Astronomer Open
> Edition (fully open source) that bundles similar tools like Prometheus,
> Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> looking to get a setup like that up and running quickly.
>
> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
> https://github.com/astronomerio/astronomer
>
> *Taylor Edmiston*
> Blog  | Stack Overflow CV
>  | LinkedIn
>  | AngelList
>
>
>
> On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> maximebeauchemin@gmail.com> wrote:
>
>> Related blog post about multi-tenant Airflow deployment out of Pandora:
>> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>>
>> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>> wrote:
>>
>>> My suggestion would be to deploy airflow per project. You could even use
>>> airflow to manage your ci/cd pipeline.
>>>
>>> B.
>>>
>>> Sent from my iPhone
>>>
>>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
>> maximebeauchemin@gmail.com>
>>> wrote:
>>>>
>>>> People have been talking about namespacing DAGs in the past. I'd
>>> recommend
>>>> using tags (many to many) instead of categories/projects (one to many).
>>>>
>>>> It should be fairly easy to add this feature. One question is whether
>>> tags
>>>> are defined as code or in the UI/db only.
>>>>
>>>> Max
>>>>
>>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu
>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Basically the DAGs are created for a project purpose, so if I have
>> many
>>>>> different projects, will the Airflow support the Project concept and
>>>>> organize them separately ?
>>>>>
>>>>> Is this a known requirement or any plan for this already ?
>>>>>
>>>>> Thanks,
>>>>> Song
>>>>>
>>>
>>



Re: About the project support in Airflow

Posted by Ace Haidrey <ac...@gmail.com>.
Looks neat Taylor!

And regarding the original question, going off of what Maxime and Bolke said: at Pandora it made more sense for us to have an instance per team, since each team has its own system user for prod and the instance can run all processes as that user. Alternatively, you could have a superuser that can sudo as those other system users and host many teams on a single instance, but that is a security concern (what if one team sudos as the other team and accidentally overwrites data? There is nothing stopping them from doing it). It depends on what your org setup is, but let me know if there are any questions I can help with.

Ace


> On Apr 24, 2018, at 1:16 PM, Taylor Edmiston <te...@gmail.com> wrote:
> 
> We use a similar approach like Bolke mentioned with running multiple
> Airflow instances.
> 
> I haven't read the Pandora article yet, but we have an Astronomer Open
> Edition (fully open source) that bundles similar tools like Prometheus,
> Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> looking to get a setup like that up and running quickly.
> 
> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
> https://github.com/astronomerio/astronomer
> 
> *Taylor Edmiston*
> Blog <http://blog.tedmiston.com> | Stack Overflow CV
> <https://stackoverflow.com/story/taylor> | LinkedIn
> <https://www.linkedin.com/in/tedmiston/> | AngelList
> <https://angel.co/taylor>
> 
> 
> On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> maximebeauchemin@gmail.com> wrote:
> 
>> Related blog post about multi-tenant Airflow deployment out of Pandora:
>> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>> 
>> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin <bd...@gmail.com>
>> wrote:
>> 
>>> My suggestion would be to deploy airflow per project. You could even use
>>> airflow to manage your ci/cd pipeline.
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <
>> maximebeauchemin@gmail.com>
>>> wrote:
>>>> 
>>>> People have been talking about namespacing DAGs in the past. I'd
>>> recommend
>>>> using tags (many to many) instead of categories/projects (one to many).
>>>> 
>>>> It should be fairly easy to add this feature. One question is whether
>>> tags
>>>> are defined as code or in the UI/db only.
>>>> 
>>>> Max
>>>> 
>>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu <so...@outlook.com>
>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Basically the DAGs are created for a project purpose, so if I have
>> many
>>>>> different projects, will the Airflow support the Project concept and
>>>>> organize them separately ?
>>>>> 
>>>>> Is this a known requirement or any plan for this already ?
>>>>> 
>>>>> Thanks,
>>>>> Song
>>>>> 
>>> 
>> 


Re: About the project support in Airflow

Posted by Taylor Edmiston <te...@gmail.com>.
We use an approach similar to the one Bolke mentioned, running multiple
Airflow instances.

I haven't read the Pandora article yet, but we have an Astronomer Open
Edition (fully open source) that bundles similar tools such as Prometheus,
Grafana, Celery, etc. with Airflow, plus a Docker Compose file if you're
looking to get a setup like that up and running quickly.

https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
https://github.com/astronomerio/astronomer

*Taylor Edmiston*
Blog <http://blog.tedmiston.com> | Stack Overflow CV
<https://stackoverflow.com/story/taylor> | LinkedIn
<https://www.linkedin.com/in/tedmiston/> | AngelList
<https://angel.co/taylor>


On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> Related blog post about multi-tenant Airflow deployment out of Pandora:
> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>
> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin <bd...@gmail.com>
> wrote:
>
> > My suggestion would be to deploy airflow per project. You could even use
> > airflow to manage your ci/cd pipeline.
> >
> > B.
> >
> > Sent from my iPhone
> >
> > > On 24 Apr 2018, at 18:33, Maxime Beauchemin <
> maximebeauchemin@gmail.com>
> > wrote:
> > >
> > > People have been talking about namespacing DAGs in the past. I'd
> > recommend
> > > using tags (many to many) instead of categories/projects (one to many).
> > >
> > > It should be fairly easy to add this feature. One question is whether
> > tags
> > > are defined as code or in the UI/db only.
> > >
> > > Max
> > >
> > >> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu <so...@outlook.com>
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Basically the DAGs are created for a project purpose, so if I have
> many
> > >> different projects, will the Airflow support the Project concept and
> > >> organize them separately ?
> > >>
> > >> Is this a known requirement or any plan for this already ?
> > >>
> > >> Thanks,
> > >> Song
> > >>
> >
>

Re: About the project support in Airflow

Posted by Maxime Beauchemin <ma...@gmail.com>.
Related blog post about multi-tenant Airflow deployment out of Pandora:
https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee

On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin <bd...@gmail.com> wrote:

> My suggestion would be to deploy airflow per project. You could even use
> airflow to manage your ci/cd pipeline.
>
> B.
>
> Sent from my iPhone
>
> > On 24 Apr 2018, at 18:33, Maxime Beauchemin <ma...@gmail.com>
> wrote:
> >
> > People have been talking about namespacing DAGs in the past. I'd
> recommend
> > using tags (many to many) instead of categories/projects (one to many).
> >
> > It should be fairly easy to add this feature. One question is whether
> tags
> > are defined as code or in the UI/db only.
> >
> > Max
> >
> >> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu <so...@outlook.com> wrote:
> >>
> >> Hi,
> >>
> >> Basically the DAGs are created for a project purpose, so if I have many
> >> different projects, will the Airflow support the Project concept and
> >> organize them separately ?
> >>
> >> Is this a known requirement or any plan for this already ?
> >>
> >> Thanks,
> >> Song
> >>
>

Re: About the project support in Airflow

Posted by Bolke de Bruin <bd...@gmail.com>.
My suggestion would be to deploy Airflow per project. You could even use Airflow to manage your CI/CD pipeline.
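
As a minimal sketch of what "one instance per project" could look like at the configuration level: each project gets its own `AIRFLOW_HOME`, DAGs folder, and webserver port through environment variables. The base directory, port numbering, project names, and the `instance_env` helper are assumptions for illustration, not a recommended layout.

```python
import posixpath

BASE_DIR = "/opt/airflow"  # hypothetical install root
BASE_PORT = 8080           # first webserver port; later projects count up


def instance_env(project, index):
    """Return environment variables for one project's Airflow instance."""
    home = posixpath.join(BASE_DIR, project)
    return {
        "AIRFLOW_HOME": home,
        "AIRFLOW__CORE__DAGS_FOLDER": posixpath.join(home, "dags"),
        "AIRFLOW__WEBSERVER__WEB_SERVER_PORT": str(BASE_PORT + index),
    }


projects = ["finance", "ml"]  # hypothetical project names
envs = {name: instance_env(name, i) for i, name in enumerate(projects)}

for name, env in envs.items():
    print(name, env)
```

Each instance would also need its own metadata database and its own scheduler and webserver processes started with the matching environment; that part is left out here.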

B.

> On 24 Apr 2018, at 18:33, Maxime Beauchemin <ma...@gmail.com> wrote:
> 
> People have been talking about namespacing DAGs in the past. I'd recommend
> using tags (many to many) instead of categories/projects (one to many).
> 
> It should be fairly easy to add this feature. One question is whether tags
> are defined as code or in the UI/db only.
> 
> Max
> 
>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu <so...@outlook.com> wrote:
>> 
>> Hi,
>> 
>> Basically the DAGs are created for a project purpose, so if I have many
>> different projects, will the Airflow support the Project concept and
>> organize them separately ?
>> 
>> Is this a known requirement or any plan for this already ?
>> 
>> Thanks,
>> Song
>> 

Re: About the project support in Airflow

Posted by Maxime Beauchemin <ma...@gmail.com>.
People have talked about namespacing DAGs in the past. I'd recommend
using tags (many-to-many) instead of categories/projects (one-to-many).

It should be fairly easy to add this feature. One open question is whether tags
are defined as code or only in the UI/DB.
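
To make the distinction concrete, here is a small plain-Python sketch (not Airflow code; the DAG ids, tag names, and the `dags_with_tag` helper are hypothetical). With projects, each DAG maps to exactly one project; with tags, a DAG can appear under several labels at once.

```python
# One-to-many: a project has many DAGs, but each DAG has exactly one project.
dag_project = {
    "daily_sales_report": "finance",
    "user_churn_model": "ml",
}

# Many-to-many: a DAG can carry several tags, and a tag can cover many DAGs.
dag_tags = {
    "daily_sales_report": {"finance", "reporting"},
    "user_churn_model": {"ml", "finance"},
}


def dags_with_tag(tag_map, tag):
    """Return the sorted ids of all DAGs that carry the given tag."""
    return sorted(dag_id for dag_id, tags in tag_map.items() if tag in tags)


# The churn model is invisible under the "finance" project,
# but shows up when filtering by the "finance" tag.
by_project = sorted(d for d, p in dag_project.items() if p == "finance")
by_tag = dags_with_tag(dag_tags, "finance")
print(by_project)
print(by_tag)
```

A project view can still be recovered from tags by convention (for example, one tag per project), whereas the reverse is not possible.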

Max

On Tue, Apr 24, 2018 at 1:48 AM, Song Liu <so...@outlook.com> wrote:

> Hi,
>
> Basically the DAGs are created for a project purpose, so if I have many
> different projects, will the Airflow support the Project concept and
> organize them separately ?
>
> Is this a known requirement or any plan for this already ?
>
> Thanks,
> Song
>