Posted to commits@airflow.apache.org by "kasim (JIRA)" <ji...@apache.org> on 2019/06/18 08:10:00 UTC

[jira] [Created] (AIRFLOW-4806) Raise `Cannot execute` error even though the DAG succeeded

kasim created AIRFLOW-4806:
------------------------------

             Summary: Raise `Cannot execute` error even though the DAG succeeded
                 Key: AIRFLOW-4806
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4806
             Project: Apache Airflow
          Issue Type: Bug
          Components: DagRun
    Affects Versions: 1.10.3
            Reporter: kasim


Log:

{code:java}
*** Log file does not exist: /opt/airflow/logs/test_wordcount/test_wordcount/2019-05-31T16:00:00+00:00/7.log
*** Fetching from: http://dc07:8793/log/test_wordcount/test_wordcount/2019-05-31T16:00:00+00:00/7.log

[2019-06-18 15:58:33,562] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [queued]>
[2019-06-18 15:58:33,585] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [queued]>
[2019-06-18 15:58:33,585] {__init__.py:1353} INFO - 
--------------------------------------------------------------------------------
[2019-06-18 15:58:33,585] {__init__.py:1354} INFO - Starting attempt 7 of 7
[2019-06-18 15:58:33,585] {__init__.py:1355} INFO - 
--------------------------------------------------------------------------------
[2019-06-18 15:58:33,594] {__init__.py:1374} INFO - Executing <Task(SparkSubmitOperator): test_wordcount> on 2019-05-31T16:00:00+00:00
[2019-06-18 15:58:33,594] {base_task_runner.py:119} INFO - Running: ['airflow', 'run', 'test_wordcount', 'test_wordcount', '2019-05-31T16:00:00+00:00', '--job_id', '413', '--raw', '-sd', 'DAGS_FOLDER/wordcount.py', '--cfg_path', '/tmp/tmpkqb2n943']
[2019-06-18 15:58:34,094] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount /opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/configuration.py:590: DeprecationWarning: You have two airflow.cfg files: /opt/airflow/airflow/airflow.cfg and /opt/airflow/airflow.cfg. Airflow used to look at ~/airflow/airflow.cfg, even when AIRFLOW_HOME was set to a different value. Airflow will now only read /opt/airflow/airflow.cfg, and you should remove the other file
[2019-06-18 15:58:34,095] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   category=DeprecationWarning,
[2019-06-18 15:58:34,191] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,191] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=30605
[2019-06-18 15:58:34,429] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,429] {default_celery.py:90} WARNING - You have configured a result_backend of redis://192.168.20.17/1, it is highly recommended to use an alternative result_backend (i.e. a database).
[2019-06-18 15:58:34,430] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,430] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-06-18 15:58:34,704] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,704] {__init__.py:305} INFO - Filling up the DagBag from /opt/airflow/dags/wordcount.py
[2019-06-18 15:58:34,754] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,754] {cli.py:517} INFO - Running <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [running]> on host dc07
[2019-06-18 15:58:34,875] {logging_mixin.py:95} INFO - [2019-06-18 15:58:34,875] {base_hook.py:83} INFO - Using connection to: id: spark_default. Host: yarn, Port: None, Schema: None, Login: None, Password: None, extra: {'master': 'yarn', 'deploy-mode': 'cluster', 'queue': 'data', 'env_vars': {'HADOOP_USER_NAME': 'hdfs'}, 'spark_home': '/opt/cloudera/parcels/CDH/lib/spark/'}
[2019-06-18 15:58:34,876] {logging_mixin.py:95} INFO - [2019-06-18 15:58:34,876] {spark_submit_hook.py:295} INFO - Spark-Submit cmd: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']
[2019-06-18 15:58:41,557] {logging_mixin.py:95} INFO - [2019-06-18 15:58:41,557] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
[... the same "Identified spark driver id: application_1560762064551_0032" line repeated roughly once per second from 15:58:44 to 15:59:25; duplicates elided ...]
[2019-06-18 15:59:25,342] {logging_mixin.py:95} INFO - [2019-06-18 15:59:25,342] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
############# note #############
The Spark application had already succeeded on YARN at this point.
################################
[2019-06-18 15:59:25,430] {__init__.py:1580} ERROR - Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
Traceback (most recent call last):
  File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 176, in execute
    self._hook.submit(self._application)
  File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 352, in submit
    spark_submit_cmd, returncode
airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
[2019-06-18 15:59:25,433] {__init__.py:1611} INFO - Marking task as FAILED.
[2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount Traceback (most recent call last):
[2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/bin/airflow", line 32, in <module>
[2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     args.func(args)
[2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     return f(*args, **kwargs)
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, in run
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     _run(args, dag, ti)
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 442, in _run
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     pool=args.pool,
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     return func(*args, **kwargs)
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     result = task_copy.execute(context=context)
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 176, in execute
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     self._hook.submit(self._application)
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 352, in submit
[2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     spark_submit_cmd, returncode
[2019-06-18 15:59:25,467] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
[2019-06-18 15:59:28,811] {logging_mixin.py:95} INFO - [2019-06-18 15:59:28,809] {jobs.py:2562} INFO - Task exited with return code 1

{code}
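
The exception at the end of this log comes from the exit-code check in `SparkSubmitHook.submit()` (`spark_submit_hook.py`, line 352 in the traceback). A simplified sketch of that check, paraphrased from the Airflow 1.10.3 sources rather than quoted verbatim, shows why the task can fail even though YARN reports the application as SUCCEEDED: the hook raises purely on the local `spark-submit` process exit code and never consults the final YARN application status.

{code:python}
# Paraphrased sketch of the relevant part of SparkSubmitHook.submit()
# in Airflow 1.10.3 (not a verbatim quote of the source).
returncode = self._submit_sp.wait()  # exit code of the local spark-submit process
if returncode:
    # Raised regardless of what YARN reports for the application itself.
    raise AirflowException(
        "Cannot execute: {}. Error code is: {}.".format(
            spark_submit_cmd, returncode
        )
    )
{code}

So the open question in this issue is why the local `spark-submit` process exited with code 1 here even though the YARN application (application_1560762064551_0032) finished successfully.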
 

Running the same `spark-submit` command directly also succeeds:
{code:java}
$ sudo -u hdfs spark-submit --master yarn --py-files /opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip --num-executors 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name test_wordcount --queue data --deploy-mode cluster /opt/airflow/dags/main.py --job wordcount --job-args input_path=/test/words.txt output_path=/test/wordcount.csv


2019-06-18 15:57:07 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-06-18 15:57:08 INFO ConfiguredRMFailoverProxyProvider:100 - Failing over to rm59
2019-06-18 15:57:08 INFO Client:57 - Requesting a new application from cluster with 5 NodeManagers
2019-06-18 15:57:08 INFO Configuration:2662 - resource-types.xml not found
2019-06-18 15:57:08 INFO ResourceUtils:419 - Unable to find 'resource-types.xml'.
2019-06-18 15:57:08 INFO Client:57 - Verifying our application has not requested more than the maximum memory capability of the cluster (28672 MB per container)
2019-06-18 15:57:08 INFO Client:57 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2019-06-18 15:57:08 INFO Client:57 - Setting up container launch context for our AM
2019-06-18 15:57:08 INFO Client:57 - Setting up the launch environment for our AM container
2019-06-18 15:57:08 INFO Client:57 - Preparing resources for our AM container
2019-06-18 15:57:08 WARN Client:69 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-06-18 15:57:11 INFO Client:57 - Uploading resource file:/tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049/__spark_libs__2238696212892957522.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/__spark_libs__2238696212892957522.zip
2019-06-18 15:57:14 INFO Client:57 - Uploading resource file:/opt/airflow/dags/main.py -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/main.py
2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/python/lib/pyspark.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/pyspark.zip
2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/py4j-0.10.7-src.zip
2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/airflow/dags/jobs.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/jobs.zip
2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/airflow/dags/libs.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/libs.zip
2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049/__spark_conf__8942251889350261257.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/__spark_conf__.zip
2019-06-18 15:57:15 INFO SecurityManager:57 - Changing view acls to: hdfs
2019-06-18 15:57:15 INFO SecurityManager:57 - Changing modify acls to: hdfs
2019-06-18 15:57:15 INFO SecurityManager:57 - Changing view acls groups to:
2019-06-18 15:57:15 INFO SecurityManager:57 - Changing modify acls groups to:
2019-06-18 15:57:15 INFO SecurityManager:57 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); groups with view permissions: Set(); users with modify permissions: Set(hdfs); groups with modify permissions: Set()
2019-06-18 15:57:15 INFO HiveConf:188 - Found configuration file null
2019-06-18 15:57:16 INFO Client:57 - Submitting application application_1560762064551_0031 to ResourceManager
2019-06-18 15:57:16 INFO YarnClientImpl:310 - Submitted application application_1560762064551_0031
2019-06-18 15:57:17 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
2019-06-18 15:57:17 INFO Client:57 -
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.data
start time: 1560844635943
final status: UNDEFINED
tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
user: hdfs
[... application report polled once per second, state ACCEPTED, through 15:57:23; duplicates elided ...]
2019-06-18 15:57:24 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
2019-06-18 15:57:24 INFO Client:57 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: dc08
ApplicationMaster RPC port: 34069
queue: root.data
start time: 1560844635943
final status: UNDEFINED
tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
user: hdfs
2019-06-18 15:57:25 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
[... application report polled once per second, state RUNNING, through 15:57:38; duplicates elided ...]
2019-06-18 15:57:39 INFO Client:57 - Application report for application_1560762064551_0031 (state: FINISHED)
2019-06-18 15:57:39 INFO Client:57 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: dc08
ApplicationMaster RPC port: 34069
queue: root.data
start time: 1560844635943
final status: SUCCEEDED
tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
2019-06-18 15:57:39 INFO Client:57 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: dc08
ApplicationMaster RPC port: 34069
queue: root.data
start time: 1560844635943
final status: SUCCEEDED
tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
user: hdfs
2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Shutdown hook called
2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-be964647-9652-4947-9729-c4e9aba1aa15
2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049


{code}
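
One way to narrow this down (a hypothetical diagnostic, not something from the original report) is to replay the exact command Airflow logged through Python's `subprocess` as the same user the Airflow worker runs as, then compare the local exit code with the manual `sudo -u hdfs` run above. If it is 1 here but 0 there, the discrepancy is environmental (user, `HADOOP_USER_NAME`, `SPARK_HOME`, `PATH`) rather than in the Spark job itself.

{code:python}
# Hypothetical diagnostic: replay the command from the Airflow log and
# inspect the local exit code, which is all the hook's check looks at.
import subprocess

cmd = [
    "spark-submit", "--master", "yarn",
    "--py-files", "/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip",
    "--num-executors", "1", "--executor-cores", "1",
    "--executor-memory", "1g", "--driver-memory", "1g",
    "--name", "test_wordcount", "--queue", "data",
    "--deploy-mode", "cluster",
    "/opt/airflow/dags/main.py",
    "--job", "wordcount",
    "--job-args", "input_path=/test/words.txt", "output_path=/test/wordcount.csv",
]
proc = subprocess.run(cmd)  # run as the airflow worker's user, without sudo
print("local spark-submit exit code:", proc.returncode)
{code}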
My DAG:
{code:java}
from os.path import dirname, abspath, join
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

CURRENT_DIR = dirname(abspath(__file__))

default_args = {
    'owner': 'mithril',
    'depends_on_past': False,
    'start_date': datetime(2019, 6, 1),
    'email': ['mithril'],
    'email_on_failure': False,
    'email_on_retry': False,
    # 'retries': 1,
    # 'retry_delay': timedelta(minutes=5),
}

dag_name = 'test_wordcount'
dag = DAG(dag_name, default_args=default_args, schedule_interval='@once')

t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

t2 = BashOperator(
    task_id='print_path_env',
    bash_command='echo $PATH',
    dag=dag)

TEST_CONFIG = {
    'num_executors': 1,
    'executor_cores': 1,
    'executor_memory': '1g',
    'driver_memory': '1g',
}

input_path = '/test/words.txt'
output_path = '/test/wordcount.csv'


def ensure_abspath(pathstr):
    # Resolve comma-separated paths relative to the DAG folder.
    return ','.join(join(CURRENT_DIR, p) for p in pathstr.split(','))


t3 = SparkSubmitOperator(
    task_id=dag_name,
    conn_id='spark_default',
    name=dag_name,
    dag=dag,
    py_files=ensure_abspath('jobs.zip,libs.zip'),
    application=ensure_abspath('main.py'),
    application_args=[
        '--job', 'wordcount',
        '--job-args', f'input_path={input_path}', f'output_path={output_path}',
    ],
    **TEST_CONFIG)

t1 >> t2 >> t3

{code}
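
For completeness, most of the submit options in the logged command come from the `spark_default` connection rather than from the DAG above. Judging from the connection line logged by `base_hook.py` earlier, the connection's `Extra` field was configured roughly like this (reconstructed from the log; the actual connection was created outside the DAG):

{code:python}
# Reconstructed from the "Using connection to: id: spark_default" log line;
# illustrative only, since the real connection lives in the Airflow metadata DB.
spark_default_extra = {
    "master": "yarn",
    "deploy-mode": "cluster",
    "queue": "data",
    "env_vars": {"HADOOP_USER_NAME": "hdfs"},
    "spark_home": "/opt/cloudera/parcels/CDH/lib/spark/",
}
{code}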
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)