You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Joe Schmid (JIRA)" <ji...@apache.org> on 2017/03/20 14:13:41 UTC

[jira] [Comment Edited] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks

    [ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932685#comment-15932685 ] 

Joe Schmid edited comment on AIRFLOW-1011 at 3/20/17 2:13 PM:
--------------------------------------------------------------

Vijay, what you are describing is NOT what I'm observing. I just confirmed by doing:

1. airflow resetdb
2. Confirm there are no rows in task_instances (SELECT * FROM task_instances)
3. Trigger Test_SubDAG
4. Confirm the DAG finished executing successfully
5. SELECT * FROM task_instances only returns 3 rows and no rows for any of the SubDAG task instances

||task_id|dag_id|execution_date|start_date|end_date|duration|state|try_number|hostname|unixname|job_id|pool|queue|priority_weight|operator|queued_dttm|pid||
|DAG_Task1|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:01:49.452930|2017-03-20 14:01:49.508669|0.055739|success|1|5afc1ff6f68b|root|3||default|3|DummyOperator|2017-03-20 14:01:39.841692|19188|
|DAG_Task2|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:03:49.826690|2017-03-20 14:03:49.968047|0.141357|success|1|5afc1ff6f68b|root|6||default|1|DummyOperator|2017-03-20 14:03:41.486849|19606|
|SubDagOp|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:02:48.214548|2017-03-20 14:02:48.399452|0.184904|success|1|5afc1ff6f68b|root|4||default|2|SubDagOperator|2017-03-20 14:02:40.413805|19392|


In fact, it looks like the backfill for the subdag is not actually running. Log of the SubDagOperator task instance show that it immediately logs "Backfill done. Exiting." right after if starts executing:

{code:none}
[2017-03-20 14:02:43,919] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:44,245] {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'airflow run Test_SubDAG SubDagOp 2017-03-20T14:01:22.193546 --job_id 4 --raw -sd DAGS_FOLDER/common/util/test_subdag.py']
[2017-03-20 14:02:46,661] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:46,656] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-03-20 14:02:47,842] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:47,841] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:48,214] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,214] {models.py:1126} INFO - Dependencies all met for <TaskInstance: Test_SubDAG.SubDagOp 2017-03-20 14:01:22.193546 [queued]>
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,229] {models.py:1126} INFO - Dependencies all met for <TaskInstance: Test_SubDAG.SubDagOp 2017-03-20 14:01:22.193546 [queued]>
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,230] {models.py:1318} INFO - 
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: 
[2017-03-20 14:02:48,265] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,265] {models.py:1342} INFO - Executing <Task(SubDagOperator): SubDagOp> on 2017-03-20 14:01:22.193546
[2017-03-20 14:02:48,371] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,370] {jobs.py:2023} INFO - Backfill done. Exiting.
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: /usr/local/lib/python3.5/site-packages/airflow/ti_deps/deps/base_ti_dep.py:94: PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask:   for dep_status in self._get_dep_statuses(ti, session, dep_context):
[2017-03-20 14:02:49,275] {jobs.py:2083} INFO - Task exited with return code 0
{code}





was (Author: jschmid):
Vijay, what you are describing is NOT what I'm observing. I just confirmed by doing:

1. airflow resetdb
2. Confirm there are no rows in task_instances (SELECT * FROM task_instances)
3. Trigger Test_SubDAG
4. Confirm the DAG finished executing successfully
5. SELECT * FROM task_instances only returns 3 rows and no rows for any of the SubDAG task instances

```
task_id,dag_id,execution_date,start_date,end_date,duration,state,try_number,hostname,unixname,job_id,pool,queue,priority_weight,operator,queued_dttm,pid
DAG_Task1,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:01:49.452930,2017-03-20 14:01:49.508669,0.055739,success,1,5afc1ff6f68b,root,3,,default,3,DummyOperator,2017-03-20 14:01:39.841692,19188
DAG_Task2,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:03:49.826690,2017-03-20 14:03:49.968047,0.141357,success,1,5afc1ff6f68b,root,6,,default,1,DummyOperator,2017-03-20 14:03:41.486849,19606
SubDagOp,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:02:48.214548,2017-03-20 14:02:48.399452,0.184904,success,1,5afc1ff6f68b,root,4,,default,2,SubDagOperator,2017-03-20 14:02:40.413805,19392
```

In fact, it looks like the backfill for the subdag is not actually running. Log of the SubDagOperator task instance:

```
[2017-03-20 14:02:43,919] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:44,245] {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'airflow run Test_SubDAG SubDagOp 2017-03-20T14:01:22.193546 --job_id 4 --raw -sd DAGS_FOLDER/common/util/test_subdag.py']
[2017-03-20 14:02:46,661] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:46,656] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-03-20 14:02:47,842] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:47,841] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:48,214] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,214] {models.py:1126} INFO - Dependencies all met for <TaskInstance: Test_SubDAG.SubDagOp 2017-03-20 14:01:22.193546 [queued]>
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,229] {models.py:1126} INFO - Dependencies all met for <TaskInstance: Test_SubDAG.SubDagOp 2017-03-20 14:01:22.193546 [queued]>
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,230] {models.py:1318} INFO - 
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: 
[2017-03-20 14:02:48,265] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,265] {models.py:1342} INFO - Executing <Task(SubDagOperator): SubDagOp> on 2017-03-20 14:01:22.193546
[2017-03-20 14:02:48,371] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,370] {jobs.py:2023} INFO - Backfill done. Exiting.
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: /usr/local/lib/python3.5/site-packages/airflow/ti_deps/deps/base_ti_dep.py:94: PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask:   for dep_status in self._get_dep_statuses(ti, session, dep_context):
[2017-03-20 14:02:49,275] {jobs.py:2083} INFO - Task exited with return code 0
```





> Task Instance Results not stored for SubDAG Tasks
> -------------------------------------------------
>
>                 Key: AIRFLOW-1011
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1011
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill, subdag
>    Affects Versions: Airflow 1.8
>            Reporter: Joe Schmid
>            Priority: Critical
>         Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, 2-ZoomedSubDAG-NoTaskInstances-v1.8.png, 3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png, test_subdag.py
>
>
> In previous Airflow versions, results for tasks executed as a subdag were written as rows to task_instances. In Airflow 1.8 only rows for tasks inside the top-level DAG (non-subdag tasks) seem to get written to the database.
> This results in being unable to check the status of task instances inside the subdag from the UI, check the logs for those task instances from the UI, etc.
> Attached is a simple test DAG that exhibits the issue along with screenshots showing the UI differences between v1.8 and v1.7.1.3.
> Note that if the DAG is run via backfill from command line (e.g. "airflow backfill Test_SubDAG -s 2017-03-18 -e 2017-03-18") the task instances show up successfully.
> Also, we're using CeleryExecutor and not specifying a different executor for our subdags.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)