Posted to commits@airflow.apache.org by "jack (JIRA)" <ji...@apache.org> on 2019/08/04 11:30:00 UTC

[jira] [Commented] (AIRFLOW-1113) spark_submit_operator run again after an hour and log is not realtime

    [ https://issues.apache.org/jira/browse/AIRFLOW-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899617#comment-16899617 ] 

jack commented on AIRFLOW-1113:
-------------------------------

Can you please check this with a newer version? 1.8.0 is very old.

> spark_submit_operator run again after an hour and log is not realtime
> ---------------------------------------------------------------------
>
>                 Key: AIRFLOW-1113
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1113
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib
>    Affects Versions: 1.8.0
>            Reporter: chasuner
>            Priority: Major
>
> I have a DAG that contains a spark_submit_operator task. What is really weird and confusing is that the spark_submit task runs again an hour after it starts, even though the Spark job actually finishes successfully in about 47 minutes; meanwhile the subprocess of the running spark task is in the Sl state on the worker machine. Another DAG that contains a spark_submit task running for about 5 minutes works normally. (A minimal sketch of this setup follows the quoted log below.)
> The log of the abnormal spark task:
> [2017-04-15 07:01:06,048] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
> [2017-04-15 07:01:06,742] {base_task_runner.py:112} INFO - Running: ['bash', '-c', u'airflow run dag_etl spark_etl 2017-04-14T07:00:00 --job_id 1239 --raw -sd DAGS_FOLDER/dag_etl.py']
> [2017-04-15 07:01:07,844] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:07,843] {__init__.py:57} INFO - Using executor CeleryExecutor
> [2017-04-15 07:01:08,504] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:08,504] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
> [2017-04-15 07:01:09,423] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,423] {models.py:1126} INFO - Dependencies all met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [queued]>
> [2017-04-15 07:01:09,440] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,440] {models.py:1126} INFO - Dependencies all met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [queued]>
> [2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,440] {models.py:1318} INFO - 
> [2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
> [2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 2
> [2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
> [2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-15 07:01:09,453] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,453] {models.py:1342} INFO - Executing <Task(SparkSubmitOperator): spark_etl> on 2017-04-14 07:00:00
> [2017-04-15 07:01:09,501] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,500] {base_hook.py:67} INFO - Using connection to: yarn
> [2017-04-15 08:01:54,030] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
> [2017-04-15 08:01:54,713] {base_task_runner.py:112} INFO - Running: ['bash', '-c', u'airflow run dag_etl spark_etl 2017-04-14T07:00:00 --job_id 1243 --raw -sd DAGS_FOLDER/dag_etl.py']
> [2017-04-15 08:01:56,131] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:56,130] {__init__.py:57} INFO - Using executor CeleryExecutor
> [2017-04-15 08:01:56,867] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:56,867] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
> [2017-04-15 08:01:57,503] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:57,502] {models.py:1120} INFO - Dependencies not met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2017-04-15 07:01:09.
> [2017-04-15 08:01:57,512] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:57,512] {models.py:1120} INFO - Dependencies not met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
> [2017-04-15 08:01:59,104] {jobs.py:2148} WARNING - Recorded pid 26249 is not a descendant of the current pid 9046
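
For context, here is a minimal sketch of the kind of DAG described above. This is an assumed reconstruction, not the reporter's actual code: one SparkSubmitOperator task with retries=1 (which matches the "Starting attempt 1 of 2" line in the log), submitting against a Spark connection configured for YARN. The dag_id and task_id are taken from the log; the application path, schedule_interval, retry_delay and conn_id are placeholders.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2017, 4, 14, 7, 0),  # execution date seen in the log
        'retries': 1,                               # "Starting attempt 1 of 2"
        'retry_delay': timedelta(minutes=5),        # placeholder retry delay
    }

    dag = DAG(
        dag_id='dag_etl',                # dag id from the log
        default_args=default_args,
        schedule_interval='@daily',      # assumed schedule
    )

    spark_etl = SparkSubmitOperator(
        task_id='spark_etl',                 # task id from the log
        application='/path/to/etl_job.py',   # placeholder application
        conn_id='spark_default',             # Spark connection whose master points at YARN
        dag=dag,
    )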



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)