Posted to dev@airflow.apache.org by "Ryabchuk, Pavlo" <ex...@here.com> on 2016/05/30 12:03:45 UTC

Airflow scheduler/worker inefficient time

Hi all,

Maybe I am misusing Airflow a bit, because I am using it as an on-demand (triggered) complex data processing system, but still, the question is: what are the actual parameters I should play around with in order to speed up execution?
I have around 250 Dummy tasks (which do nothing) in my DAG, and running it locally with the Celery executor takes around 1000 sec, which is pretty strange, since a single Dummy task takes only a few milliseconds. I've tried playing around with Celery concurrency, Airflow executor parallelism and heartbeat, but with almost no result... it's really strange, what am I doing wrong :)

Best,
Pavlo



Re: Airflow scheduler/worker inefficient time

Posted by Maxime Beauchemin <ma...@gmail.com>.
Note that in general, Airflow isn't designed to run thousands of small
tasks per minute. Celery on its own does that well without any
oversight from Airflow, though you then miss out on what Airflow has to
provide (complex dependency management, state handling, logging, retries,
...).

Airflow typically assumes long-running batch processes, in the minutes to
hours range. If you need sub-second or even sub-minute latency between your
tasks, Airflow probably isn't the right choice.
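To make the point about per-task overhead concrete, here is a toy model of a loop-based scheduler. It is an illustration only, not Airflow's actual scheduler: each loop is assumed to take a fixed `loop_interval_s` and to start up to `parallelism` tasks whose upstream tasks are already done, and the tasks themselves are assumed to take no time, like the Dummy tasks in the original post. The function name, its parameters and the 5-second loop used below are all hypothetical.

```python
def simulate_makespan(upstream, loop_interval_s, parallelism):
    """Toy model of a loop-based scheduler (not Airflow's real implementation).

    `upstream` maps each task id to the set of task ids it depends on.
    Every loop, up to `parallelism` tasks whose upstream tasks are all done
    are started; tasks are assumed to finish before the next loop, so the
    wall-clock time is simply (number of loops) * loop_interval_s.
    """
    done = set()
    remaining = set(upstream)
    loops = 0
    while remaining:
        # Tasks whose upstream set is a subset of the finished tasks.
        ready = sorted(t for t in remaining if upstream[t] <= done)
        if not ready:
            raise ValueError("cycle or unknown upstream task in the DAG")
        batch = ready[:parallelism]
        done.update(batch)
        remaining.difference_update(batch)
        loops += 1
    return loops * loop_interval_s


# Hypothetical DAG shapes with 250 tasks each.
chain = {f"t{i}": ({f"t{i - 1}"} if i else set()) for i in range(250)}
independent = {f"t{i}": set() for i in range(250)}

for parallelism in (32, 256):
    print(parallelism,
          simulate_makespan(chain, 5.0, parallelism),
          simulate_makespan(independent, 5.0, parallelism))
```

Under these assumptions the 250-task chain takes 1250 s at any parallelism, while 250 independent tasks drop from 40 s (8 loops at parallelism 32) to 5 s (one loop at 256). With near-zero task runtimes, the number of scheduler round trips sets the wall-clock time, so raising parallelism only helps when the DAG is wide. The post doesn't say what shape the 250-task DAG has, so this is not offered as the explanation for the 1000 sec figure.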

One goal we have for the project is to allow the scheduler to trigger
jobs roughly every minute and maintain that at scale.

Max

On Fri, Jun 3, 2016 at 10:07 AM, Ryabchuk, Pavlo <
ext-pavlo.ryabchuk@here.com> wrote:

> Hey,
> Had a look at this celery config option, but no luck. Also tried setting
> the executor to LocalExecutor: same result.
> Each task takes no more than 0.1 sec, but the overall time is huge.
> Thought that it could be due to disabled pickling, so I enabled it: almost
> no change :(

RE: Airflow scheduler/worker inefficient time

Posted by "Ryabchuk, Pavlo" <ex...@here.com>.
Hey,
Had a look at this celery config option, but no luck. Also tried setting the executor to LocalExecutor: same result.
Each task takes no more than 0.1 sec, but the overall time is huge.
Thought that it could be due to disabled pickling, so I enabled it: almost no change :(
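One way to check where the time goes is to split a run's wall-clock time into stretches where some task was running and stretches where nothing was. The sketch below assumes you have each task's (start, end) timestamps, for example the start and end dates Airflow records per task instance; the helper name `busy_vs_idle` and the example timings are hypothetical.

```python
from datetime import datetime, timedelta


def busy_vs_idle(intervals):
    """Split a run's wall-clock time into busy and idle time.

    `intervals` is a list of (start, end) datetime pairs, one per task.
    Busy time is the union of all task intervals (overlapping tasks are
    counted once); idle time is the rest of the span from the first start
    to the last end, i.e. time when no task was running at all.
    """
    if not intervals:
        return timedelta(0), timedelta(0)
    ordered = sorted(intervals)
    busy = timedelta(0)
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start <= cur_end:
            # Overlaps the current busy stretch: extend it.
            cur_end = max(cur_end, end)
        else:
            # Gap with nothing running: close the stretch, open a new one.
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
    busy += cur_end - cur_start
    wall_clock = max(end for _, end in ordered) - ordered[0][0]
    return busy, wall_clock - busy


# Hypothetical example: three 0.1 s tasks, each started 4 s after the previous one.
t0 = datetime(2016, 5, 30, 12, 0, 0)
runs = [
    (t0 + timedelta(seconds=4 * i), t0 + timedelta(seconds=4 * i, milliseconds=100))
    for i in range(3)
]
busy, idle = busy_vs_idle(runs)
print(f"busy: {busy.total_seconds():.1f} s, idle: {idle.total_seconds():.1f} s")
```

For these made-up timings it reports 0.3 s busy and 7.8 s idle: almost all of the wall-clock time is spent between tasks rather than in them, which is the pattern to look for when each task is fast but the overall run is slow.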

-----Original Message-----
From: Bolke de Bruin [mailto:bdbruin@gmail.com] 
Sent: Monday, May 30, 2016 3:09 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Airflow scheduler/worker inefficient time

Have a look at this: https://github.com/apache/incubator-airflow/pull/1509


Sent from my iPhone


Re: Airflow scheduler/worker inefficient time

Posted by Bolke de Bruin <bd...@gmail.com>.
Have a look at this: https://github.com/apache/incubator-airflow/pull/1509


Sent from my iPhone
