Posted to dev@airflow.apache.org by Song Liu <so...@outlook.com> on 2018/04/26 17:04:38 UTC

Re: About the project support in Airflow

Hi Taylor,

This would be one solution for the "project/folder" concept. If users could create a DAG from the UI (in the future?), they could name it "<project_name>/<dag_name>", and the UI would show it as a tree view. Yes, that would more or less meet my requirement. Whether the backend parses the DAG path or stores the project relationship in the database is your implementation decision.

Thanks, but I suggest running a survey to find out what other users think about it.

At least for me, the project concept is a real requirement.

Thanks,
Song
________________________________
From: Taylor Edmiston <te...@gmail.com>
Sent: April 26, 2018 15:49
To: dev@airflow.incubator.apache.org
Subject: Re: About the project support in Airflow

We've discussed internally something like having groups or "folders" for
DAGs in the UI.  Nothing functional on the backend, purely a front end
aesthetic.  Something like having DAGs named "foo/bar" and "foo/baz" would
be grouped like a tree visually in the UI:

- Group foo
  - DAG bar
  - DAG baz
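
A minimal sketch of the purely visual grouping described above, assuming DAG ids are plain strings and "/" is the separator. The function names group_dag_ids and render_tree are hypothetical and are not part of Airflow; whether Airflow accepts "/" in a DAG id is not checked here, which is why the separator is a parameter.

```python
from collections import defaultdict


def group_dag_ids(dag_ids, sep="/"):
    """Group DAG ids by the prefix before the first separator.

    Hypothetical helper, not an Airflow API. Ids that contain no
    separator are collected under the key None.
    """
    groups = defaultdict(list)
    for dag_id in dag_ids:
        prefix, found, name = dag_id.partition(sep)
        if found:
            groups[prefix].append(name)
        else:
            groups[None].append(dag_id)
    return dict(groups)


def render_tree(groups):
    """Render the groups as an indented text tree, groups first."""
    lines = []
    for group in sorted(key for key in groups if key is not None):
        lines.append(f"- Group {group}")
        for name in sorted(groups[group]):
            lines.append(f"  - DAG {name}")
    # Ungrouped DAGs are listed at the top level, after the groups.
    for name in sorted(groups.get(None, [])):
        lines.append(f"- DAG {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(group_dag_ids(["foo/bar", "foo/baz"])))
```

Nothing is stored on the backend in this sketch: the grouping is derived from the id each time the list is rendered, which matches the "front end only" idea.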

Is that what you're looking for?

Best,
Taylor

On Thu, Apr 26, 2018 at 1:51 AM 刘松(Cycle++开发组) <li...@megvii.com> wrote:

> Hi Feng,
>
> Thanks for the information; I had indeed noticed this work as well.
>
> But if I understand correctly, it focuses on permissions
> (edit/read etc.) on the DAG itself.
>
> The “project concept” is a kind of “Group”, but it is more meaningful than
> a “Tag”. If we don’t want to support the “project concept”, is there any
> other solution for this requirement, or some consideration behind that?
>
> Many thanks for the help.
>
> Thanks,
> Song
>
> On 26/04/2018, 12:28 PM, "Tao Feng" <fe...@gmail.com> wrote:
>
>     Hi Song,
>
>     Just a note that we are also working on dag-level access on top of
>     RBAC (AIRFLOW-2267), which should provide dag-level ACL functionality.
>     The WIP PR can be found at
>     https://github.com/apache/incubator-airflow/pull/3197
>
>     On Wed, Apr 25, 2018 at 7:42 PM, 刘松(Cycle++开发组) <li...@megvii.com> wrote:
>
>     > Hi Taylor,
>     >
>     > Yes, I know that the RBAC feature will be released with the 1.10
>     > release.
>     >
>     > # About multi-user support
>     >
>     > But why not deploy one instance of Airflow per user?
>     > With this feature, don’t you think Airflow becomes more of a
>     > platform that serves many different users?
>     > Also, the multi-user case would exhaust Airflow's resources more
>     > easily, if we are talking about the scalability of Airflow.
>     >
>     > # About multi-project support
>     >
>     > The “project” concept is a logical grouping of DAGs that lets them
>     > be organized in a more structured way.
>     > I can’t see how it would hurt the “scalability” of Airflow; it just
>     > makes the user experience friendlier.
>     >
>     > So that is why I use the “multi-user support” case to question the
>     > suggestion of multiple instances for “multi-project”: I think
>     > “multi-user” support pushes Airflow toward having to “be more
>     > scalable”, whereas “multi-project” just makes it more intuitive and
>     > user-friendly.
>     >
>     > Thanks,
>     > Song
>     >
>     > On 26/04/2018, 4:50 AM, "Taylor Edmiston" <te...@gmail.com> wrote:
>     >
>     >     Something else that might be relevant for your multi-user use
>     >     case is the new RBAC support that Joy Gao added.
>     >
>     >     https://github.com/apache/incubator-airflow/pull/3015
>     >
>     >     *Taylor Edmiston*
>     >     Blog <http://blog.tedmiston.com> | Stack Overflow CV
>     >     <https://stackoverflow.com/story/taylor> | LinkedIn
>     >     <https://www.linkedin.com/in/tedmiston/> | AngelList
>     >     <https://angel.co/taylor>
>     >
>     >
>     >     On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <jmeickle@quantopian.com> wrote:
>     >
>     >     > Another reason you would want separated infrastructure is that
>     >     > there are a lot of ways to exhaust Airflow resources or otherwise
>     >     > cause contention - like having too many sensors or sub-DAGs using
>     >     > up all available tasks.
>     >     >
>     >     > Doesn't seem like a great idea to push for having different
>     >     > teams with co-tenancy until there is also per-team control over
>     >     > resource use...
>     >     >
>     >     > On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <liusong02@megvii.com> wrote:
>     >     >
>     >     > > It seems that all the current approaches point to multiple
>     >     > > instances of Airflow, but the project concept is very natural,
>     >     > > since one user might handle different types of tasks.
>     >     > >
>     >     > > As for multi-user support, one way is also to deploy multiple
>     >     > > instances, but it seems that Airflow provides multi-user
>     >     > > functionality built in.
>     >     > >
>     >     > > So I am not convinced that multiple instances should be used
>     >     > > for the multi-project purpose.
>     >     > >
>     >     > > Thanks,
>     >     > > Song
>     >     > >
>     >     > >
>     >     > >
>     >     > >
>     >     > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@gmail.com> wrote:
>     >     > >
>     >     > >
>     >     > > Looks neat Taylor!
>     >     > >
>     >     > > And regarding the original question, going off of what Maxime
>     >     > > and Bolke said, at Pandora, it made more sense for us to have an
>     >     > > instance per team since each team has its own system user for
>     >     > > prod and the instance can run all processes as that user.
>     >     > > Alternatively you could have a super user that can sudo as those
>     >     > > other system users, and have many teams on a single instance but
>     >     > > that is a security concern (what if one team sudo's as the other
>     >     > > team and accidentally overwrites data - there is nothing stopping
>     >     > > them from doing it). It depends what your org set up is, but let
>     >     > > me know if there are any questions I can help with.
>     >     > >
>     >     > > Ace
>     >     > >
>     >     > >
>     >     > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>     >     > > >
>     >     > > > We use a similar approach like Bolke mentioned with running
>     >     > > > multiple Airflow instances.
>     >     > > >
>     >     > > > I haven't read the Pandora article yet, but we have an
>     >     > > > Astronomer Open Edition (fully open source) that bundles similar
>     >     > > > tools like Prometheus, Grafana, Celery, etc with Airflow and a
>     >     > > > Docker Compose file if you're looking to get a setup like that
>     >     > > > up and running quickly.
>     >     > > >
>     >     > > > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
>     >     > > > https://github.com/astronomerio/astronomer
>     >     > > >
>     >     > > > *Taylor Edmiston*
>     >     > > > Blog | Stack Overflow CV | LinkedIn | AngelList
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>     >     > > >
>     >     > > >> Related blog post about multi-tenant Airflow deployment out of
>     >     > > >> Pandora:
>     >     > > >>
>     >     > > >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>     >     > > >>
>     >     > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin wrote:
>     >     > > >>
>     >     > > >>> My suggestion would be to deploy airflow per project. You
>     >     > > >>> could even use airflow to manage your ci/cd pipeline.
>     >     > > >>>
>     >     > > >>> B.
>     >     > > >>>
>     >     > > >>> Sent from my iPhone
>     >     > > >>>
>     >     > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>     >     > > >>>>
>     >     > > >>>> People have been talking about namespacing DAGs in the past.
>     >     > > >>>> I'd recommend using tags (many to many) instead of
>     >     > > >>>> categories/projects (one to many).
>     >     > > >>>>
>     >     > > >>>> It should be fairly easy to add this feature. One question is
>     >     > > >>>> whether tags are defined as code or in the UI/db only.
>     >     > > >>>>
>     >     > > >>>> Max
>     >     > > >>>>
>     >     > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
>     >     > > >>>>>
>     >     > > >>>>> Hi,
>     >     > > >>>>>
>     >     > > >>>>> Basically, DAGs are created for a project, so if I have
>     >     > > >>>>> many different projects, will Airflow support a Project
>     >     > > >>>>> concept and organize them separately?
>     >     > > >>>>>
>     >     > > >>>>> Is this a known requirement, or is there already a plan
>     >     > > >>>>> for this?
>     >     > > >>>>>
>     >     > > >>>>> Thanks,
>     >     > > >>>>> Song
>     >     > > >>>>>
>     >     > > >>>
>     >     > > >>
>     >     > >
>     >     > >
>     >     > >
>     >     >
>     >
>     >
>     >
>
>
>