Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/06/18 09:20:00 UTC

[jira] [Commented] (AIRFLOW-4806) Raise `Cannot execute error` even if that dag successed

    [ https://issues.apache.org/jira/browse/AIRFLOW-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866384#comment-16866384 ] 

Ash Berlin-Taylor commented on AIRFLOW-4806:
--------------------------------------------

The error from Airflow is saying that the {{spark-submit}} command returned a non-zero exit code. If you run the following, what does it show:

{code}
spark-submit --master yarn --py-files /opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip --num-executors 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name test_wordcount --queue data --deploy-mode cluster /opt/airflow/dags/main.py --job wordcount --job-args input_path=/test/words.txt output_path=/test/wordcount.csv
echo "Last command exit code: $?"
{code}
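
For context, the {{AirflowException}} in the traceback further down is raised by {{spark_submit_hook.py}} when the local {{spark-submit}} client process exits non-zero. A simplified sketch of that check (not the exact Airflow source) looks like this:

{code}
# Simplified sketch of the exit-code check in SparkSubmitHook.submit()
# (Airflow 1.10.x); the real call site appears in the traceback below.
import subprocess

from airflow.exceptions import AirflowException


def submit_sketch(spark_submit_cmd):
    # Start spark-submit and stream its output into the task log.
    proc = subprocess.Popen(spark_submit_cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    for line in iter(proc.stdout.readline, b''):
        print(line.decode(errors='replace').rstrip())
    returncode = proc.wait()
    # The "Cannot execute" error is driven purely by this exit code,
    # not by the final status YARN reports for the application.
    if returncode:
        raise AirflowException(
            "Cannot execute: {}. Error code is: {}.".format(
                spark_submit_cmd, returncode))
{code}

So even if the YARN application reaches final status SUCCEEDED, a non-zero exit from the local {{spark-submit}} process will still fail the task.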

> Raise `Cannot execute error` even if that dag successed
> -------------------------------------------------------
>
>                 Key: AIRFLOW-4806
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4806
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.10.3
>            Reporter: kasim
>            Priority: Major
>
>  
> Airflow log:
>  
> {code:java}
> *** Log file does not exist: /opt/airflow/logs/test_wordcount/test_wordcount/2019-05-31T16:00:00+00:00/7.log
> *** Fetching from: http://dc07:8793/log/test_wordcount/test_wordcount/2019-05-31T16:00:00+00:00/7.log
> [2019-06-18 15:58:33,562] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [queued]>
> [2019-06-18 15:58:33,585] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [queued]>
> [2019-06-18 15:58:33,585] {__init__.py:1353} INFO - 
> --------------------------------------------------------------------------------
> [2019-06-18 15:58:33,585] {__init__.py:1354} INFO - Starting attempt 7 of 7
> [2019-06-18 15:58:33,585] {__init__.py:1355} INFO - 
> --------------------------------------------------------------------------------
> [2019-06-18 15:58:33,594] {__init__.py:1374} INFO - Executing <Task(SparkSubmitOperator): test_wordcount> on 2019-05-31T16:00:00+00:00
> [2019-06-18 15:58:33,594] {base_task_runner.py:119} INFO - Running: ['airflow', 'run', 'test_wordcount', 'test_wordcount', '2019-05-31T16:00:00+00:00', '--job_id', '413', '--raw', '-sd', 'DAGS_FOLDER/wordcount.py', '--cfg_path', '/tmp/tmpkqb2n943']
> [2019-06-18 15:58:34,094] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount /opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/configuration.py:590: DeprecationWarning: You have two airflow.cfg files: /opt/airflow/airflow/airflow.cfg and /opt/airflow/airflow.cfg. Airflow used to look at ~/airflow/airflow.cfg, even when AIRFLOW_HOME was set to a different value. Airflow will now only read /opt/airflow/airflow.cfg, and you should remove the other file
> [2019-06-18 15:58:34,095] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   category=DeprecationWarning,
> [2019-06-18 15:58:34,191] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,191] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=30605
> [2019-06-18 15:58:34,429] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,429] {default_celery.py:90} WARNING - You have configured a result_backend of redis://192.168.20.17/1, it is highly recommended to use an alternative result_backend (i.e. a database).
> [2019-06-18 15:58:34,430] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,430] {__init__.py:51} INFO - Using executor CeleryExecutor
> [2019-06-18 15:58:34,704] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,704] {__init__.py:305} INFO - Filling up the DagBag from /opt/airflow/dags/wordcount.py
> [2019-06-18 15:58:34,754] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount [2019-06-18 15:58:34,754] {cli.py:517} INFO - Running <TaskInstance: test_wordcount.test_wordcount 2019-05-31T16:00:00+00:00 [running]> on host dc07
> [2019-06-18 15:58:34,875] {logging_mixin.py:95} INFO - [2019-06-18 15:58:34,875] {base_hook.py:83} INFO - Using connection to: id: spark_default. Host: yarn, Port: None, Schema: None, Login: None, Password: None, extra: {'master': 'yarn', 'deploy-mode': 'cluster', 'queue': 'data', 'env_vars': {'HADOOP_USER_NAME': 'hdfs'}, 'spark_home': '/opt/cloudera/parcels/CDH/lib/spark/'}
> [2019-06-18 15:58:34,876] {logging_mixin.py:95} INFO - [2019-06-18 15:58:34,876] {spark_submit_hook.py:295} INFO - Spark-Submit cmd: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']
> [2019-06-18 15:58:41,557] {logging_mixin.py:95} INFO - [2019-06-18 15:58:41,557] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:44,791] {logging_mixin.py:95} INFO - [2019-06-18 15:58:44,790] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:44,929] {logging_mixin.py:95} INFO - [2019-06-18 15:58:44,929] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:45,061] {logging_mixin.py:95} INFO - [2019-06-18 15:58:45,061] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:45,186] {logging_mixin.py:95} INFO - [2019-06-18 15:58:45,186] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:45,320] {logging_mixin.py:95} INFO - [2019-06-18 15:58:45,320] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:45,602] {logging_mixin.py:95} INFO - [2019-06-18 15:58:45,601] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:45,970] {logging_mixin.py:95} INFO - [2019-06-18 15:58:45,970] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:46,217] {logging_mixin.py:95} INFO - [2019-06-18 15:58:46,217] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:47,222] {logging_mixin.py:95} INFO - [2019-06-18 15:58:47,221] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:47,228] {logging_mixin.py:95} INFO - [2019-06-18 15:58:47,228] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:48,231] {logging_mixin.py:95} INFO - [2019-06-18 15:58:48,231] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:49,233] {logging_mixin.py:95} INFO - [2019-06-18 15:58:49,233] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:50,236] {logging_mixin.py:95} INFO - [2019-06-18 15:58:50,236] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:51,239] {logging_mixin.py:95} INFO - [2019-06-18 15:58:51,238] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:52,241] {logging_mixin.py:95} INFO - [2019-06-18 15:58:52,241] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:53,244] {logging_mixin.py:95} INFO - [2019-06-18 15:58:53,243] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:54,246] {logging_mixin.py:95} INFO - [2019-06-18 15:58:54,246] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:54,247] {logging_mixin.py:95} INFO - [2019-06-18 15:58:54,247] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:55,250] {logging_mixin.py:95} INFO - [2019-06-18 15:58:55,249] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:56,252] {logging_mixin.py:95} INFO - [2019-06-18 15:58:56,252] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:57,254] {logging_mixin.py:95} INFO - [2019-06-18 15:58:57,254] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:58,257] {logging_mixin.py:95} INFO - [2019-06-18 15:58:58,257] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:58:59,259] {logging_mixin.py:95} INFO - [2019-06-18 15:58:59,259] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:00,265] {logging_mixin.py:95} INFO - [2019-06-18 15:59:00,265] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:01,268] {logging_mixin.py:95} INFO - [2019-06-18 15:59:01,268] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:02,271] {logging_mixin.py:95} INFO - [2019-06-18 15:59:02,270] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:03,273] {logging_mixin.py:95} INFO - [2019-06-18 15:59:03,273] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:04,276] {logging_mixin.py:95} INFO - [2019-06-18 15:59:04,275] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:05,278] {logging_mixin.py:95} INFO - [2019-06-18 15:59:05,278] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:06,281] {logging_mixin.py:95} INFO - [2019-06-18 15:59:06,281] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:07,283] {logging_mixin.py:95} INFO - [2019-06-18 15:59:07,283] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:07,284] {logging_mixin.py:95} INFO - [2019-06-18 15:59:07,283] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:08,286] {logging_mixin.py:95} INFO - [2019-06-18 15:59:08,286] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:09,288] {logging_mixin.py:95} INFO - [2019-06-18 15:59:09,288] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:10,290] {logging_mixin.py:95} INFO - [2019-06-18 15:59:10,290] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:11,293] {logging_mixin.py:95} INFO - [2019-06-18 15:59:11,292] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:12,295] {logging_mixin.py:95} INFO - [2019-06-18 15:59:12,295] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:13,297] {logging_mixin.py:95} INFO - [2019-06-18 15:59:13,297] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:14,299] {logging_mixin.py:95} INFO - [2019-06-18 15:59:14,299] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:15,301] {logging_mixin.py:95} INFO - [2019-06-18 15:59:15,301] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:15,301] {logging_mixin.py:95} INFO - [2019-06-18 15:59:15,301] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:16,304] {logging_mixin.py:95} INFO - [2019-06-18 15:59:16,304] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:17,306] {logging_mixin.py:95} INFO - [2019-06-18 15:59:17,306] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:18,309] {logging_mixin.py:95} INFO - [2019-06-18 15:59:18,308] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:19,311] {logging_mixin.py:95} INFO - [2019-06-18 15:59:19,311] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:20,313] {logging_mixin.py:95} INFO - [2019-06-18 15:59:20,313] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:21,315] {logging_mixin.py:95} INFO - [2019-06-18 15:59:21,315] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:22,318] {logging_mixin.py:95} INFO - [2019-06-18 15:59:22,318] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:23,320] {logging_mixin.py:95} INFO - [2019-06-18 15:59:23,320] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:24,323] {logging_mixin.py:95} INFO - [2019-06-18 15:59:24,322] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:25,326] {logging_mixin.py:95} INFO - [2019-06-18 15:59:25,325] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:25,326] {logging_mixin.py:95} INFO - [2019-06-18 15:59:25,326] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> [2019-06-18 15:59:25,342] {logging_mixin.py:95} INFO - [2019-06-18 15:59:25,342] {spark_submit_hook.py:400} INFO - Identified spark driver id: application_1560762064551_0032
> ############# note #############
> the DAG had already succeeded at this point, because I saw the output files being created
> ################################
> [2019-06-18 15:59:25,430] {__init__.py:1580} ERROR - Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
> Traceback (most recent call last):
>   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 176, in execute
>     self._hook.submit(self._application)
>   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 352, in submit
>     spark_submit_cmd, returncode
> airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
> [2019-06-18 15:59:25,433] {__init__.py:1611} INFO - Marking task as FAILED.
> [2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount Traceback (most recent call last):
> [2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/bin/airflow", line 32, in <module>
> [2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     args.func(args)
> [2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
> [2019-06-18 15:59:25,465] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     return f(*args, **kwargs)
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, in run
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     _run(args, dag, ti)
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 442, in _run
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     pool=args.pool,
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     return func(*args, **kwargs)
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     result = task_copy.execute(context=context)
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 176, in execute
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     self._hook.submit(self._application)
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount   File "/opt/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 352, in submit
> [2019-06-18 15:59:25,466] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount     spark_submit_cmd, returncode
> [2019-06-18 15:59:25,467] {base_task_runner.py:101} INFO - Job 413: Subtask test_wordcount airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', '--master', 'yarn', '--py-files', '/opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip', '--num-executors', '1', '--executor-cores', '1', '--executor-memory', '1g', '--driver-memory', '1g', '--name', 'test_wordcount', '--queue', 'data', '--deploy-mode', 'cluster', '/opt/airflow/dags/main.py', '--job', 'wordcount', '--job-args', 'input_path=/test/words.txt', 'output_path=/test/wordcount.csv']. Error code is: 1.
> [2019-06-18 15:59:28,811] {logging_mixin.py:95} INFO - [2019-06-18 15:59:28,809] {jobs.py:2562} INFO - Task exited with return code 1
> {code}
>  
>  It actually finished, so why was the `Cannot execute` error raised?
>  
> Running `spark-submit` directly succeeded without any error:
> {code:java}
> $ sudo -u hdfs spark-submit --master yarn --py-files /opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip --num-executors 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name test_wordcount --queue data --deploy-mode cluster /opt/airflow/dags/main.py --job wordcount --job-args input_path=/test/words.txt output_path=/test/wordcount.csv
> 2019-06-18 15:57:07 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2019-06-18 15:57:08 INFO ConfiguredRMFailoverProxyProvider:100 - Failing over to rm59
> 2019-06-18 15:57:08 INFO Client:57 - Requesting a new application from cluster with 5 NodeManagers
> 2019-06-18 15:57:08 INFO Configuration:2662 - resource-types.xml not found
> 2019-06-18 15:57:08 INFO ResourceUtils:419 - Unable to find 'resource-types.xml'.
> 2019-06-18 15:57:08 INFO Client:57 - Verifying our application has not requested more than the maximum memory capability of the cluster (28672 MB per container)
> 2019-06-18 15:57:08 INFO Client:57 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
> 2019-06-18 15:57:08 INFO Client:57 - Setting up container launch context for our AM
> 2019-06-18 15:57:08 INFO Client:57 - Setting up the launch environment for our AM container
> 2019-06-18 15:57:08 INFO Client:57 - Preparing resources for our AM container
> 2019-06-18 15:57:08 WARN Client:69 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 2019-06-18 15:57:11 INFO Client:57 - Uploading resource file:/tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049/__spark_libs__2238696212892957522.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/__spark_libs__2238696212892957522.zip
> 2019-06-18 15:57:14 INFO Client:57 - Uploading resource file:/opt/airflow/dags/main.py -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/main.py
> 2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/python/lib/pyspark.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/pyspark.zip
> 2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/py4j-0.10.7-src.zip
> 2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/airflow/dags/jobs.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/jobs.zip
> 2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/opt/airflow/dags/libs.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/libs.zip
> 2019-06-18 15:57:15 INFO Client:57 - Uploading resource file:/tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049/__spark_conf__8942251889350261257.zip -> hdfs://pupuxdc/user/hdfs/.sparkStaging/application_1560762064551_0031/__spark_conf__.zip
> 2019-06-18 15:57:15 INFO SecurityManager:57 - Changing view acls to: hdfs
> 2019-06-18 15:57:15 INFO SecurityManager:57 - Changing modify acls to: hdfs
> 2019-06-18 15:57:15 INFO SecurityManager:57 - Changing view acls groups to:
> 2019-06-18 15:57:15 INFO SecurityManager:57 - Changing modify acls groups to:
> 2019-06-18 15:57:15 INFO SecurityManager:57 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); groups with view permissions: Set(); users with modify permissions: Set(hdfs); groups with modify permissions: Set()
> 2019-06-18 15:57:15 INFO HiveConf:188 - Found configuration file null
> 2019-06-18 15:57:16 INFO Client:57 - Submitting application application_1560762064551_0031 to ResourceManager
> 2019-06-18 15:57:16 INFO YarnClientImpl:310 - Submitted application application_1560762064551_0031
> 2019-06-18 15:57:17 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:17 INFO Client:57 -
> client token: N/A
> diagnostics: AM container is launched, waiting for AM container to Register with RM
> ApplicationMaster host: N/A
> ApplicationMaster RPC port: -1
> queue: root.data
> start time: 1560844635943
> final status: UNDEFINED
> tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
> user: hdfs
> 2019-06-18 15:57:18 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:19 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:20 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:21 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:22 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:23 INFO Client:57 - Application report for application_1560762064551_0031 (state: ACCEPTED)
> 2019-06-18 15:57:24 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:24 INFO Client:57 -
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: dc08
> ApplicationMaster RPC port: 34069
> queue: root.data
> start time: 1560844635943
> final status: UNDEFINED
> tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
> user: hdfs
> 2019-06-18 15:57:25 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:26 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:27 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:28 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:29 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:30 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:31 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:32 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:33 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:34 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:35 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:36 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:37 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:38 INFO Client:57 - Application report for application_1560762064551_0031 (state: RUNNING)
> 2019-06-18 15:57:39 INFO Client:57 - Application report for application_1560762064551_0031 (state: FINISHED)
> 2019-06-18 15:57:39 INFO Client:57 -
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: dc08
> ApplicationMaster RPC port: 34069
> queue: root.data
> start time: 1560844635943
> final status: SUCCEEDED
> tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
> 2019-06-18 15:57:39 INFO Client:57 -
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: dc08
> ApplicationMaster RPC port: 34069
> queue: root.data
> start time: 1560844635943
> final status: SUCCEEDED
> tracking URL: http://dc06:8088/proxy/application_1560762064551_0031/
> user: hdfs
> 2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Shutdown hook called
> 2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-be964647-9652-4947-9729-c4e9aba1aa15
> 2019-06-18 15:57:39 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-e2587f77-0b82-458b-8dcd-2fc24cb4e049
> {code}
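> Note that the exit code of this direct run is not shown above; appending the {{echo}} suggested in the comment above would confirm whether {{spark-submit}} itself exits 0 here, e.g.:
> {code}
> sudo -u hdfs spark-submit --master yarn --py-files /opt/airflow/dags/jobs.zip,/opt/airflow/dags/libs.zip --num-executors 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name test_wordcount --queue data --deploy-mode cluster /opt/airflow/dags/main.py --job wordcount --job-args input_path=/test/words.txt output_path=/test/wordcount.csv
> echo "Last command exit code: $?"
> {code}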
> My DAG:
> {code:java}
> from os.path import dirname, abspath, join
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
> from airflow.models import Variable
> from datetime import datetime, timedelta
>
> CURRENT_DIR = dirname(abspath(__file__))
>
> default_args = {
>     'owner': 'mithril',
>     'depends_on_past': False,
>     'start_date': datetime(2019, 6, 1),
>     'email': ['mithril'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     # 'retries': 1,
>     # 'retry_delay': timedelta(minutes=5),
> }
>
> dag_name = 'test_wordcount'
> dag = DAG(dag_name, default_args=default_args, schedule_interval='@once')
>
> t1 = BashOperator(
>     task_id='print_date',
>     bash_command='date',
>     dag=dag)
>
> t2 = BashOperator(
>     task_id='print_path_env',
>     bash_command='echo $PATH',
>     dag=dag)
>
> TEST_CONFIG = {
>     'num_executors': 1,
>     'executor_cores': 1,
>     'executor_memory': '1g',
>     'driver_memory': '1g',
> }
>
> input_path = '/test/words.txt'
> output_path = '/test/wordcount.csv'
>
> def ensure_abspath(pathstr):
>     return ','.join(map(lambda x: join(CURRENT_DIR, x), pathstr.split(',')))
>
> t3 = SparkSubmitOperator(
>     task_id=dag_name,
>     conn_id='spark_default',
>     name=dag_name,
>     dag=dag,
>     py_files=ensure_abspath('jobs.zip,libs.zip'),
>     application=ensure_abspath('main.py'),
>     application_args=[
>         '--job', 'wordcount',
>         '--job-args', f'input_path={input_path}', f'output_path={output_path}',
>     ],
>     **TEST_CONFIG
> )
>
> t1 >> t2 >> t3
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)