Posted to dev@airflow.apache.org by Nadeem Ahmed Nazeer <na...@neon-lab.com> on 2016/08/08 22:20:40 UTC

Re: airflow scheduler error

Hi Bolke,

I have created AIRFLOW-401
<https://issues.apache.org/jira/browse/AIRFLOW-401> for this issue and
attached the required information. I would be happy to provide any other
details if needed, and would really appreciate it if someone could help me
with this.

Thanks,
Nadeem

On Mon, Jul 25, 2016 at 12:23 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> Please include logging, dag structure, anything else relevant. Preferably
> add them in a jira.  This is really little to go on. Sorry!
>
> > On 25 Jul 2016, at 20:23, Nadeem Ahmed Nazeer <na...@neon-lab.com> wrote:
> >
> > Hello,
> >
> > Would really appreciate any response on this.
> >
> > I've also observed that the scheduler gets stuck when running big jobs
> > that run for hours. We have a MapReduce job that runs for 5-6 hours. After
> > this job completes, the scheduler doesn't seem to run the downstream
> > tasks, and it stays stuck until it's manually restarted. This time I don't
> > even see the HTTP errors that I mentioned earlier. Not sure why it would
> > get stuck.
> >
> > Airflow version 1.7.1.2
> >
> > Thanks,
> > Nadeem
> >
> > On Wed, Jul 20, 2016 at 5:18 PM, Nadeem Ahmed Nazeer
> > <nazeer@neon-lab.com> wrote:
> >
> >> Hello,
> >>
> >> My airflow scheduler seems to be getting stuck due to an error.
> >>
> >> From scheduler logs,
> >>
> >> HTTPError: HTTP 502: socket error
> >> Logged from file jobs.py, line 574
> >>
> >> Looks like it happens when the scheduler is trying to get the list of
> >> queued tasks from the metadata database. There are no errors being
> >> reported on the DB side though. The metadata database is a MySQL RDS
> >> instance running on AWS.
> >>
> >> I have to restart the scheduler service manually multiple times to get
> >> it going before it gets stuck again. It appears that the scheduler
> >> occasionally has some trouble polling the DB. But this is the only error
> >> I see in the logs.
> >>
> >> Below is my config,
> >>
> >> sql_alchemy_pool_recycle = 3600
> >> parallelism = 32
> >> celeryd_concurrency = 4
> >> scheduler_heartbeat_sec = 120
> >>
> >> Has anyone faced a similar error with the scheduler or metadata DB?
> >> Please share any input that could help me resolve this issue.
> >>
> >> Is there an optimal configuration for the scheduler that I can put in
> >> airflow.cfg to make the scheduler run smoothly and quickly? Please share
> >> your scheduler-related configs if you have a setup that is running
> >> without problems.
> >>
> >> Thanks,
> >> Nadeem
> >>
>
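
For reference, the four settings quoted above can be read and sanity-checked
with a few lines of Python. This is only a sketch under stated assumptions:
the section names ([core], [celery], [scheduler]) are assumed, since the
original message lists the keys without their sections; the helper names are
hypothetical; and the 28800-second idle timeout is MySQL's documented default
for wait_timeout, which an RDS parameter group may override. It does not
diagnose the HTTP 502 error itself.

```python
import configparser

# Settings as quoted in the thread. The section headers are an assumption:
# the original message shows only the key/value pairs.
SAMPLE_CFG = """
[core]
sql_alchemy_pool_recycle = 3600
parallelism = 32

[celery]
celeryd_concurrency = 4

[scheduler]
scheduler_heartbeat_sec = 120
"""

# (section, key) pairs to read; hypothetical layout for this sketch.
KEYS = [
    ("core", "sql_alchemy_pool_recycle"),
    ("core", "parallelism"),
    ("celery", "celeryd_concurrency"),
    ("scheduler", "scheduler_heartbeat_sec"),
]


def load_settings(text):
    """Parse airflow.cfg-style text and return the four settings as ints."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: parser.getint(section, key) for section, key in KEYS}


def recycle_is_below_timeout(settings, server_idle_timeout_sec):
    """Return True if pooled connections are recycled before the database
    server would drop them as idle."""
    return settings["sql_alchemy_pool_recycle"] < server_idle_timeout_sec


settings = load_settings(SAMPLE_CFG)
for name in sorted(settings):
    print(name, "=", settings[name])

# 28800 s is MySQL's default wait_timeout; the real value on a given
# RDS instance is an assumption and should be checked on the server.
print("recycle below idle timeout:", recycle_is_below_timeout(settings, 28800))
```

If the recycle interval were equal to or larger than the server's idle
timeout, the pool could hand out connections the server has already closed,
so the actual wait_timeout on the instance is worth checking alongside these
values.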