Posted to dev@airflow.apache.org by Zsolt Tóth <to...@gmail.com> on 2017/06/01 08:31:17 UTC

Subdag failure

Hi devs,

I've run into an issue with Airflow subdags that I can't debug or trace.
A specific subdag fails nondeterministically (it usually fails) when it's
scheduled by Airflow, but runs fine when the task is re-executed manually.

Airflow can't find the logs for the first subtask in the dag, but that may
be a separate issue. If the subtask succeeds, Airflow does find the logs.

*** Log file isn't local.…*** Fetching here:
http://None:8793/log/<parent_dag_id.subdag_id>/<task_id>/2017-05-28T00:45:00…***
Failed to fetch log file from worker.
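
I assume the `None` host in that URL is the task instance's recorded
hostname, which was never set because the task never actually started. The
log view seems to build the fetch URL from it, roughly like this
(illustrative Python, not Airflow's actual code; only the
worker_log_server_port setting is real):

WORKER_LOG_SERVER_PORT = 8793  # airflow.cfg: worker_log_server_port

def worker_log_url(hostname, dag_id, task_id, execution_date):
    # A None hostname string-formats straight into the URL, producing
    # the "http://None:8793/..." seen above.
    return "http://{host}:{port}/log/{dag}/{task}/{date}".format(
        host=hostname, port=WORKER_LOG_SERVER_PORT,
        dag=dag_id, task=task_id, date=execution_date)

print(worker_log_url(None, "<parent_dag_id.subdag_id>", "<task_id>",
                     "2017-05-28T00:45:00"))
# -> http://None:8793/log/<parent_dag_id.subdag_id>/<task_id>/2017-05-28T00:45:00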

Looking into the logs, I can't find any specific details. The
SubDagOperator's logs:

[2017-05-30 14:06:06,117] {models.py:1219} INFO - Executing
<Task(SubDagOperator): <subdag_id>> on 2017-05-18 00:45:00
[2017-05-30 14:06:06,247] {base_executor.py:36} INFO - Adding to queue:
airflow run <parent_dag_id.subdag_id> <task_id> 2017-05-18T00:45:00 --local
-sd DAGS_FOLDER/parent_dag_file.py
[2017-05-30 14:06:11,151] {sequential_executor.py:26} INFO - Executing
command: airflow run <parent_dag_id.subdag_id> <task_id>
2017-05-18T00:45:00 --local -sd DAGS_FOLDER/parent_dag_file.py
[2017-05-30 14:06:12,119] {sequential_executor.py:33} ERROR - Failed to
execute task Command 'airflow run <parent_dag_id.subdag_id> <task_id>
2017-05-18T00:45:00 --local -sd DAGS_FOLDER/parent_dag_file.py ' returned
non-zero exit status 1:
[2017-05-30 14:06:12,125] {jobs.py:924} ERROR - Task instance
('<parent_dag_id.subdag_id>', '<task_id>', datetime.datetime(2017, 5, 18,
0, 45)) failed
[2017-05-30 14:06:12,125] {jobs.py:997} INFO - [backfill progress] |
waiting: 2 | succeeded: 0 | kicked_off: 1 | failed: 1 | skipped: 0 |
deadlocked: 0
[2017-05-30 14:06:16,150] {jobs.py:997} INFO - [backfill progress] |
waiting: 0 | succeeded: 0 | kicked_off: 1 | failed: 1 | skipped: 0 |
deadlocked: 2
[2017-05-30 14:06:16,182] {models.py:1286} ERROR -
---------------------------------------------------
Some task instances failed:
set([('<parent_dag_id.subdag_id>', '<task_id>', datetime.datetime(2017, 5,
18, 0, 45))])
...

As for the subdag's tasks, only the first one has any logs, and it contains
only this single line:
[2017-05-30 14:06:11,980] {models.py:154} INFO - Filling up the DagBag from
<parent_dag_file>

I suspected the issue might be that Airflow is configured to use the
SequentialExecutor, but the airflow run command in the logs has the --local
switch, which I thought forces the LocalExecutor. I tried to change this to
the SequentialExecutor by setting the SubDagOperator's executor field, but
the --local switch is still there (roughly what I tried is sketched below).
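
For reference, here is roughly how I set it up. This is a sketch from
memory, assuming the 1.7.x module paths, so the helper and the ids are
placeholders rather than my real code:

from datetime import datetime

from airflow import DAG
from airflow.executors.sequential_executor import SequentialExecutor
from airflow.operators.subdag_operator import SubDagOperator


def make_subdag(parent_dag_id, child_id, default_args):
    # The child DAG's dag_id must be "<parent_dag_id>.<child_id>", which
    # is why the logs show <parent_dag_id.subdag_id> as the dag id.
    return DAG(dag_id="{}.{}".format(parent_dag_id, child_id),
               default_args=default_args,
               schedule_interval=None)


default_args = {"start_date": datetime(2017, 5, 18)}

parent_dag = DAG("parent_dag_id",
                 default_args=default_args,
                 schedule_interval="45 0 * * *")

subdag_task = SubDagOperator(
    task_id="subdag_id",
    subdag=make_subdag("parent_dag_id", "subdag_id", default_args),
    executor=SequentialExecutor(),  # the executor field mentioned above
    dag=parent_dag)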

I'm using Airflow 1.7.1.3.

Thanks,
Zsolt

Re: Subdag failure

Posted by Zsolt Tóth <to...@gmail.com>.
Hi Max,

Thanks for the clarification.
I believe this means the --local switch has nothing to do with the issue.
Can you suggest some way to track down the subdag error?

Thanks,
Zsolt

Re: Subdag failure

Posted by Maxime Beauchemin <ma...@gmail.com>.
For the record:
https://github.com/apache/incubator-airflow/pull/2336/files

Re: Subdag failure

Posted by Maxime Beauchemin <ma...@gmail.com>.
From memory, I'm pretty sure `--local` simply means "run locally without
going through an executor" (as opposed to "run through the LocalExecutor",
as you assumed). For the record, there are 3 levels of `airflow run`; let's
go deeper:

1. `airflow run`, which is typically only invoked by a user through the
CLI, and talks to the executor
2. `airflow run --local`, which is executed locally, "on the other side" of
the executor, and wraps level 3 (explained below) while managing heartbeats
and looking for external kill signals. If you use the cgroup or
impersonation feature, I believe this has to run with elevated privileges.
Workers typically receive and execute these commands.
3. `airflow run --raw`, which actually runs the operator's execute method;
impersonation and cgroup restrictions apply here (the nesting is sketched
below)
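
Conceptually, each level just wraps the next in a subprocess. A minimal
sketch of the nesting (made-up names, not Airflow's actual internals):

import subprocess

executor_queue = []  # stands in for the configured executor's queue

def airflow_run(args):
    # Level 1: plain `airflow run`, typed by a user. It hands the task to
    # the executor, which later emits the level-2 command (on a worker).
    executor_queue.append("airflow run {} --local".format(args))

def airflow_run_local(args):
    # Level 2: `airflow run --local`, run on the worker. It supervises the
    # task (heartbeats, external kill signals) and wraps level 3 in a
    # child process.
    return subprocess.call("airflow run {} --raw".format(args), shell=True)

def airflow_run_raw(task, context):
    # Level 3: `airflow run --raw`, which finally calls the operator's
    # execute method; impersonation/cgroup restrictions apply here.
    return task.execute(context)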

This is a bit confusing and it could help to make `--local` and `--raw`
more "private", as in for system/internal use only.

I also just noticed that the help text for `airflow run`'s `--local` flag
is wrong in the docs, as it re-uses the backfill help text; I'll send a PR
to fix the docs.

Max
