Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/22 01:34:30 UTC
[GitHub] [airflow] ywan2017 opened a new issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
ywan2017 opened a new issue #8963:
URL: https://github.com/apache/airflow/issues/8963
# SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
## Description
I use Airflow to schedule Spark jobs on Kubernetes with SparkSubmitOperator.
When a Spark job runs on k8s for a long time (>30 minutes), Airflow often marks the task as failed while the job is still running, even though the job eventually finishes successfully.
### When it happens, this exception often (but not always) appears at the same time
```java
20/05/20 10:49:58 INFO TaskSetManager: Finished task 7.0 in stage 91.0 (TID 13313) in 34319 ms on 172.30.238.243 (executor 5) (10/20)
20/05/20 10:49:59 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 140189600 (140238574)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 10.0 in stage 91.0 (TID 13316) in 27478 ms on 172.30.244.131 (executor 1) (11/20)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 12.0 in stage 91.0 (TID 13318) in 27052 ms on 172.30.244.131 (executor 1) (12/20)
```
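When the watch connection dies with this exception, spark-submit stops receiving pod state updates, so it may never print the final `Exit code: N` line from LoggingPodStatusWatcherImpl. As far as I can tell, Airflow 1.10.x's `spark_submit_hook.py` determines the k8s driver's result by scanning spark-submit's stdout with a regex roughly like the sketch below (a simplified reconstruction, not the verbatim hook code); if the line never appears, the stored exit code stays `None`, which the hook then treats as a failure because `None != 0`:

```python
import re

# Simplified reconstruction of how SparkSubmitHook scans spark-submit
# output for the driver's exit code in Kubernetes deploy mode.
EXIT_CODE_RE = re.compile(r"\s*Exit code: (\d+)")

def scan_for_exit_code(log_lines):
    """Return the last 'Exit code: N' seen, or None if the stream was
    cut off before the watcher printed the container final statuses."""
    exit_code = None
    for line in log_lines:
        match = EXIT_CODE_RE.search(line)
        if match:
            exit_code = int(match.group(1))
    return exit_code

# A healthy run ends with the watcher's final-status block:
healthy = [
    "Container state: Terminated",
    "         Exit code: 0",
    "INFO Client:54 - Application emp_reporting finished.",
]
# An interrupted run never gets that far:
interrupted = [
    "WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed",
]

print(scan_for_exit_code(healthy))      # 0
print(scan_for_exit_code(interrupted))  # None -> hook treats this as failed
```

So the task fails in Airflow not because the job failed, but because the exit-code line was lost along with the broken watch stream.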
## Environment
- UAT environment
- Airflow: 1.10.10
- Python: 3.6.8
- k8s server: 1.5
- k8s client: ?
- Spark: 2.4.4-h2.7
## Submitting the job manually and reading the k8s logs directly
### Scenario 1: job succeeds; log returned from k8s
```log
2020-05-11 11:58:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:58:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212705263-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
creation time: 2020-05-11T15:58:27Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-11 11:58:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212705263-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
creation time: 2020-05-11T15:58:27Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-11 11:58:28 INFO Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:58:48 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212705263-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
creation time: 2020-05-11T15:58:27Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: 2020-05-11T15:58:48Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Pending
status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-11 11:59:28 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212705263-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
creation time: 2020-05-11T15:58:27Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: 2020-05-11T15:58:48Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Running
status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 12:01:15 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212705263-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
creation time: 2020-05-11T15:58:27Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: 2020-05-11T15:58:48Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Succeeded
status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, exitCode=0, finishedAt=Time(time=2020-05-11T16:01:08Z, additionalProperties={}), message=null, reason=Completed, signal=null, startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 12:01:15 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Container state: Terminated
Exit code: 0
2020-05-11 12:01:15 INFO Client:54 - Application emp_reporting finished.
2020-05-11 12:01:15 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-11 12:01:15 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-f3b0955d-fe7c-4858-a1c6-f70597d42104
```
### Scenario 2: job fails; log returned from k8s
```log
2020-05-11 11:54:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:54:16 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212454526-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
creation time: 2020-05-11T15:54:16Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-11 11:54:16 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212454526-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
creation time: 2020-05-11T15:54:16Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.186
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-11 11:54:17 INFO Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:54:17 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212454526-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
creation time: 2020-05-11T15:54:16Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.186
start time: 2020-05-11T15:54:17Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Pending
status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:04 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212454526-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
creation time: 2020-05-11T15:54:16Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.186
start time: 2020-05-11T15:54:17Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Running
status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:29 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: empreporting-1589212454526-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
creation time: 2020-05-11T15:54:16Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.186
start time: 2020-05-11T15:54:17Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Failed
status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, exitCode=1, finishedAt=Time(time=2020-05-11T15:55:27Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:29 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Container state: Terminated
Exit code: 1
2020-05-11 11:55:29 INFO Client:54 - Application emp_reporting finished.
2020-05-11 11:55:29 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-11 11:55:29 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-8ba3c338-3eae-4bc7-9c25-3146642b9207
```
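Both manual scenarios show that Kubernetes itself ends up with the authoritative exit code in the driver container's terminated state, regardless of whether the log stream survived. One possible workaround (an illustrative sketch, not something SparkSubmitOperator does today) would be to fall back to reading the pod status directly when the stream is cut off. The helper below walks a pod-status structure shaped like the API's JSON; the sample dicts mirror the ContainerStatus fields shown in the logs above:

```python
def driver_exit_code(pod_status, container="spark-kubernetes-driver"):
    """Pull the exit code out of a pod-status dict shaped like the
    Kubernetes API JSON (v1 PodStatus). Returns None while the
    container has not terminated yet."""
    for cs in pod_status.get("containerStatuses", []):
        if cs.get("name") != container:
            continue
        terminated = (cs.get("state") or {}).get("terminated")
        if terminated is not None:
            return terminated.get("exitCode")
    return None

# Shaped like the Succeeded pod in Scenario 1 above:
succeeded = {
    "phase": "Succeeded",
    "containerStatuses": [{
        "name": "spark-kubernetes-driver",
        "state": {"terminated": {"exitCode": 0, "reason": "Completed"}},
    }],
}
# Shaped like the Failed pod in Scenario 2 above:
failed = {
    "phase": "Failed",
    "containerStatuses": [{
        "name": "spark-kubernetes-driver",
        "state": {"terminated": {"exitCode": 1, "reason": "Error"}},
    }],
}

print(driver_exit_code(succeeded))  # 0
print(driver_exit_code(failed))     # 1
```

With the official kubernetes Python client, the same data is available from `CoreV1Api().read_namespaced_pod(name, namespace)` under `status.container_statuses[i].state.terminated.exit_code` (attribute names per the client's models).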
## Airflow log examples
### Scenario 1: job succeeds; the Airflow log shows the correct status:
```log
[dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589270449587-driver -n batch-pipeline-qa
Name: rmtextract-1589270449587-driver
Namespace: batch-pipeline-qa
Priority: 0
Node: 10.93.122.236/10.93.122.236
Start Time: Tue, 12 May 2020 04:00:51 -0400
Labels: spark-app-selector=spark-2f7caeac92c94710ab54cf40b5ff29e5
spark-role=driver
Annotations: kubernetes.io/psp: db2oltp-dev-psp
Status: Succeeded
IP: 172.30.244.25
IPs: <none>
Containers:
spark-kubernetes-driver:
Container ID: containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6
Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
spark-internal
0
3
8
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 12 May 2020 04:01:07 -0400
Finished: Tue, 12 May 2020 04:52:03 -0400
Ready: False
Restart Count: 0
Limits:
memory: 11Gi
Requests:
cpu: 1
memory: 11Gi
Environment:
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume (rw)
/opt/spark/secrets from dswsecret-volume (rw)
/var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7 from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
dswsecret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: dswsecret
Optional: false
spark-conf-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rmtextract-1589270449587-driver-conf-map
Optional: false
spark-token-fpqpz:
Type: Secret (a volume populated by a Secret)
SecretName: spark-token-fpqpz
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
Events: <none>
```
```log
*** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-12T04:00:00+00:00/1.log
[2020-05-12 03:00:43,970] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-12 03:00:44,074] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-12 03:00:44,074] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-12 03:00:44,173] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-12T04:00:00+00:00
[2020-05-12 03:00:44,176] {standard_task_runner.py:53} INFO - Started process 11051 to run task
[2020-05-12 03:00:44,619] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-12 03:00:45,043] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,043] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-12 03:00:45,071] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,071] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
[2020-05-12 03:00:47,353] {logging_mixin.py:112} INFO - [2020-05-12 03:00:47,352] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-12 03:00:49,644] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,644] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-12 03:00:50,889] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,888] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - status: []
[2020-05-12 03:00:50,912] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - status: []
[2020-05-12 03:00:51,163] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,163] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO Client: Waiting for application Rmt_Extract to finish...
[2020-05-12 03:00:51,406] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,406] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - 20/05/12 03:01:07 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T08:01:07Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - phase: Succeeded
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, exitCode=0, finishedAt=2020-05-12T08:52:03Z, message=null, reason=Completed, signal=null, startedAt=2020-05-12T08:01:07Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,882] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Exit code: 0
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,884] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO Client: Application Rmt_Extract finished.
[2020-05-12 03:52:13,889] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,889] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Shutdown hook called
[2020-05-12 03:52:13,890] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,890] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c6f511c-4c0f-451e-b686-7d0f444d8f59
[2020-05-12 03:52:14,006] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=Transformation_Rmt_Master_Load_Sequence_Adhoc_Main, task_id=Rmt_Extract, execution_date=20200512T040000, start_date=20200512T080043, end_date=20200512T085214
[2020-05-12 03:52:19,162] {logging_mixin.py:112} INFO - [2020-05-12 03:52:19,161] {local_task_job.py:103} INFO - Task exited with return code 0
```
### Scenario 2: job failed, log returned from Airflow, correct status reported:
```log
[dawany@dawany-inf env_qa]$ kubectl describe pod merge-rptg-appl-dscr-1589248907865-driver -n batch-pipeline-qa
Name: merge-rptg-appl-dscr-1589248907865-driver
Namespace: batch-pipeline-qa
Priority: 0
Node: 10.74.200.186/10.74.200.186
Start Time: Mon, 11 May 2020 22:01:52 -0400
Labels: spark-app-selector=spark-1e9e33df2d5c488dbc0555a618577f40
spark-role=driver
Annotations: kubernetes.io/psp: db2oltp-dev-psp
Status: Failed
IP: 172.30.21.6
IPs: <none>
Containers:
spark-kubernetes-driver:
Container ID: containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31
Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
com.ibm.cio.dswim.ingest.CommonMerge
spark-internal
-t
odsqa.shar1.rptg_appl_dscr
-b
dsw-data-drop-qa
-d
dashdb
-k
XREF
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 11 May 2020 22:02:19 -0400
Finished: Mon, 11 May 2020 22:05:03 -0400
Ready: False
Restart Count: 0
Limits:
memory: 896Mi
Requests:
cpu: 1
memory: 896Mi
Environment:
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume (rw)
/opt/spark/secrets from dswsecret-volume (rw)
/var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
dswsecret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: dswsecret
Optional: false
spark-conf-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: merge-rptg-appl-dscr-1589248907865-driver-conf-map
Optional: false
spark-token-fpqpz:
Type: Secret (a volume populated by a Secret)
SecretName: spark-token-fpqpz
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
Events: <none>
[dawany@dawany-inf env_qa]$
```
```log
*** Reading local file: /home/airflow/airflow/logs/Batch_xref.rptg_appl_dscr/merge_rptg_appl_dscr/2020-05-11T02:01:00+00:00/1.log
[2020-05-11 21:01:41,692] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-11 21:01:41,801] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-11 21:01:41,802] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-11 21:01:41,911] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): merge_rptg_appl_dscr> on 2020-05-11T02:01:00+00:00
[2020-05-11 21:01:41,919] {standard_task_runner.py:53} INFO - Started process 37536 to run task
[2020-05-11 21:01:42,508] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-11 21:01:43,042] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,042] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-11 21:01:43,055] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,055] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
[2020-05-11 21:01:45,650] {logging_mixin.py:112} INFO - [2020-05-11 21:01:45,649] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-11 21:01:47,988] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,988] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-11 21:01:49,267] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,266] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,282] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,285] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,285] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,291] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,294] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 21:01:49,550] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,550] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO Client: Waiting for application merge-rptg-appl-dscr to finish...
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:53 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
[2020-05-11 21:02:20,393] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:479} INFO - 20/05/11 21:02:20 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T02:02:19Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - phase: Failed
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, exitCode=1, finishedAt=2020-05-12T02:05:03Z, message=null, reason=Error, signal=null, startedAt=2020-05-12T02:02:19Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Exit code: 1
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO Client: Application merge-rptg-appl-dscr finished.
[2020-05-11 21:06:00,968] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,968] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Shutdown hook called
[2020-05-11 21:06:00,969] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,969] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-88cb9622-28cd-42e4-aaea-48a2206ae942
[2020-05-11 21:06:01,098] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
self._hook.submit(self._application)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
self._mask_cmd(spark_submit_cmd), returncode
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
[2020-05-11 21:06:01,100] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
[2020-05-11 21:06:03,316] {logging_mixin.py:112} INFO - [2020-05-11 21:06:03,313] {local_task_job.py:103} INFO - Task exited with return code 1
```
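The `Error code is: 0.` message above is explained by the hook's Kubernetes handling: in cluster mode, `spark_submit_hook.py` scans the spark-submit output for the driver pod's `Exit code:` line and fails the task when that code is nonzero, while the exception message reports spark-submit's own return code (which was 0 here). A minimal sketch of that parsing, with the function name and regex chosen for illustration rather than copied from the hook:

```python
import re

def parse_driver_exit_code(log_lines):
    """Scan spark-submit output for the k8s driver pod's 'Exit code:' line,
    keeping the last match, roughly as the hook does in cluster mode."""
    exit_code = None
    for line in log_lines:
        match = re.search(r'Exit code: (\d+)', line)
        if match:
            exit_code = int(match.group(1))
    return exit_code

# From the Scenario 2 log: the pod terminated with exit code 1, so the task
# is failed even though spark-submit itself exited 0.
scenario_2 = ["Container state: Terminated", "Exit code: 1"]

# In Scenario 1, if the watch dies on "too old resource version" before the
# final status is printed, no 'Exit code:' line ever reaches the hook.
interrupted = ["phase: Running", "Kubernetes client has been closed"]
```

Under this sketch, `parse_driver_exit_code(interrupted)` returns `None`, which shows why an interrupted watch leaves the hook without the pod's real exit code.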
### Scenario 3: job succeeded, log returned from Airflow, wrong status reported (this is the scenario we want to analyze):
```log
[dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589227268554-driver -n batch-pipeline-qa
Name: rmtextract-1589227268554-driver
Namespace: batch-pipeline-qa
Priority: 0
Node: 10.93.122.236/10.93.122.236
Start Time: Mon, 11 May 2020 16:01:10 -0400
Labels: spark-app-selector=spark-836f53b29a274eabbeba208d27e242de
spark-role=driver
Annotations: kubernetes.io/psp: db2oltp-dev-psp
Status: Succeeded
IP: 172.30.244.35
IPs: <none>
Containers:
spark-kubernetes-driver:
Container ID: containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c
Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
spark-internal
0
3
8
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 11 May 2020 16:01:31 -0400
Finished: Mon, 11 May 2020 16:50:14 -0400
Ready: False
Restart Count: 0
Limits:
memory: 11Gi
Requests:
cpu: 1
memory: 11Gi
Environment:
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume (rw)
/opt/spark/secrets from dswsecret-volume (rw)
/var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
dswsecret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: dswsecret
Optional: false
spark-conf-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rmtextract-1589227268554-driver-conf-map
Optional: false
spark-token-fpqpz:
Type: Secret (a volume populated by a Secret)
SecretName: spark-token-fpqpz
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
Events: <none>
```
```log
*** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-11T16:00:00+00:00/1.log
[2020-05-11 15:01:02,589] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
[2020-05-11 15:01:02,722] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
[2020-05-11 15:01:02,722] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-11 15:01:02,722] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-11 15:01:02,722] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-11 15:01:02,826] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-11T16:00:00+00:00
[2020-05-11 15:01:02,831] {standard_task_runner.py:53} INFO - Started process 39984 to run task
[2020-05-11 15:01:03,415] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-11 15:01:03,970] {logging_mixin.py:112} INFO - [2020-05-11 15:01:03,969] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-11 15:01:04,000] {logging_mixin.py:112} INFO - [2020-05-11 15:01:04,000] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
[2020-05-11 15:01:06,349] {logging_mixin.py:112} INFO - [2020-05-11 15:01:06,348] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-11 15:01:08,607] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,608] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-11 15:01:09,773] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,773] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,784] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:09,787] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,787] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,796] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:09,799] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 15:01:10,080] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,080] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO Client: Waiting for application Rmt_Extract to finish...
[2020-05-11 15:01:10,307] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:32 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-11T20:01:31Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 15:49:57,933] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,933] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container state: Running
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container started at: 2020-05-11T20:01:31Z
[2020-05-11 15:49:57,935] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO Client: Application Rmt_Extract finished.
[2020-05-11 15:49:57,936] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,936] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Shutdown hook called
[2020-05-11 15:49:57,937] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,937] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-5bfba41b-8cab-429b-b64f-dcb4f52c4d3a
[2020-05-11 15:49:58,101] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
self._hook.submit(self._application)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
self._mask_cmd(spark_submit_cmd), returncode
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
[2020-05-11 15:49:58,102] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
[2020-05-11 15:50:00,600] {logging_mixin.py:112} INFO - [2020-05-11 15:50:00,599] {local_task_job.py:103} INFO - Task exited with return code 1
```
## Submitting the Scenario 3 job manually and reading the logs from k8s
### Scenario 3: log returned from k8s
```log
[dawany@dawany-inf env_qa]$ kubectl describe pod -n batch-pipeline-qa testquoterptfact-1589301844228-driver
Name: testquoterptfact-1589301844228-driver
Namespace: batch-pipeline-qa
Priority: 0
Node: 10.74.200.157/10.74.200.157
Start Time: Tue, 12 May 2020 12:44:09 -0400
Labels: spark-app-selector=spark-6ddf413798da4c5d83645a7bc760a925
spark-role=driver
Annotations: kubernetes.io/psp: db2oltp-dev-psp
Status: Succeeded
IP: 172.30.0.6
IPs: <none>
Containers:
spark-kubernetes-driver:
Container ID: containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b
Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
com.ibm.cio.dswim.qrf.job.QuoteRptFactJob
spark-internal
1
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 12 May 2020 12:44:37 -0400
Finished: Tue, 12 May 2020 14:52:57 -0400
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 16896Mi
Requests:
cpu: 1
memory: 16896Mi
Environment:
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
SPARK_LOCAL_DIRS: /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/opt/spark/conf from spark-conf-volume (rw)
/opt/spark/secrets/ from dswsecret-volume (rw)
/var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
dswsecret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: dswsecret
Optional: false
spark-conf-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: testquoterptfact-1589301844228-driver-conf-map
Optional: false
spark-token-fpqpz:
Type: Secret (a volume populated by a Secret)
SecretName: spark-token-fpqpz
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
Events: <none>
```
```log
2020-05-12 12:43:59 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-12 12:44:06 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: testquoterptfact-1589301844228-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
pod uid: ff918848-2819-4336-a4a3-654f01dd756c
creation time: 2020-05-12T16:44:06Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-12 12:44:06 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: testquoterptfact-1589301844228-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
pod uid: ff918848-2819-4336-a4a3-654f01dd756c
creation time: 2020-05-12T16:44:06Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: N/A
container images: N/A
phase: Pending
status: []
2020-05-12 12:44:07 INFO Client:54 - Waiting for application test_quote_rpt_fact to finish...
2020-05-12 12:44:09 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: testquoterptfact-1589301844228-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
pod uid: ff918848-2819-4336-a4a3-654f01dd756c
creation time: 2020-05-12T16:44:06Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: 2020-05-12T16:44:09Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Pending
status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-12 12:44:37 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: testquoterptfact-1589301844228-driver
namespace: batch-pipeline-qa
labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
pod uid: ff918848-2819-4336-a4a3-654f01dd756c
creation time: 2020-05-12T16:44:06Z
service account name: spark
volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
node name: 10.74.200.157
start time: 2020-05-12T16:44:09Z
container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
phase: Running
status: [ContainerStatus(containerID=containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-12T16:44:37Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-12 13:43:51 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Container state: Running
Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
```
## The difference is here
```log
2020-05-12 13:43:51 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
Container state: Running
Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
```
## Conclusion
Comparing the two logs, the difference appears at the point where the log stream is terminated:
1. Airflow side (source code)
When the log stream is interrupted, the `Exit code` line never arrives, so `self._spark_exit_code` keeps its initial value `None`, and Airflow marks the job failed even though the job is actually still running (or has already succeeded) in k8s.
A similar issue: https://issues.apache.org/jira/browse/AIRFLOW-6244
```python
class SparkSubmitHook:
    def submit(self, application="", **kwargs):
        """
        Remote Popen to execute the spark-submit job

        :param application: Submitted application, jar or py file
        :type application: str
        :param kwargs: extra arguments to Popen (see subprocess.Popen)
        """
        spark_submit_cmd = self._build_spark_submit_command(application)

        if hasattr(self, '_env'):
            env = os.environ.copy()
            env.update(self._env)
            kwargs["env"] = env

        self._submit_sp = subprocess.Popen(spark_submit_cmd,
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           bufsize=-1,
                                           universal_newlines=True,
                                           **kwargs)

        self._process_spark_submit_log(iter(self._submit_sp.stdout.readline, ''))
        returncode = self._submit_sp.wait()

        # Check spark-submit return code. In Kubernetes mode, also check the value
        # of exit code in the log, as it may differ.
        if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
            raise AirflowException(
                "Cannot execute: {}. Error code is: {}.".format(
                    self._mask_cmd(spark_submit_cmd), returncode
                )
            )

    ...

    def _process_spark_submit_log(self, itr):
        # Consume the iterator
        for line in itr:
            line = line.strip()
            # (the preceding YARN branch is elided from this quote)
            # If we run Kubernetes cluster mode, we want to extract the driver pod id
            # from the logs so we can kill the application when we stop it unexpectedly
            elif self._is_kubernetes:
                match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
                if match:
                    self._kubernetes_driver_pod = match.groups()[0]
                    self.log.info("Identified spark driver pod: %s",
                                  self._kubernetes_driver_pod)

                # Store the Spark Exit code
                match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
                if match_exit_code:
                    self._spark_exit_code = int(match_exit_code.groups()[0])
            ...
            self.log.info(line)
```
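The failure mode follows directly from the check above: when the log stream is interrupted before an `Exit code:` line is printed, `_spark_exit_code` keeps its initial value `None`, and `None != 0` evaluates to `True`, so the hook raises even when spark-submit itself returned 0. A minimal standalone sketch of that condition (the function name is illustrative, not part of Airflow's API):

```python
def should_fail(returncode, is_kubernetes, spark_exit_code):
    # Mirrors the check in SparkSubmitHook.submit(): in Kubernetes mode the
    # hook also requires the exit code parsed from the log to equal 0.
    # If the log stream dies early, spark_exit_code is still None, and
    # None != 0 is True.
    return bool(returncode or (is_kubernetes and spark_exit_code != 0))

# Log stream completed normally and reported "Exit code: 0":
print(should_fail(0, True, 0))     # False
# Log stream interrupted, exit code never parsed, stays None:
print(should_fail(0, True, None))  # True -> task marked failed anyway
```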
2. k8s side (k8s client)
```log
2020-05-12 12:51:40 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
```
These reconnect messages are nothing to worry about. This is a known occurrence in Kubernetes and is not an issue [0]. The API server ends watch requests when they are very old. The operator uses a client-go informer, which takes care of automatically re-listing the resource and then restarting the watch from the latest resource version.
fabric8: https://github.com/fabric8io/kubernetes-client/issues/1075
https://stackoverflow.com/questions/52910322/kubernetes-resource-versioning/52925973#52925973
```java
private long nextReconnectInterval() {
  int exponentOfTwo = currentReconnectAttempt.getAndIncrement();
  if (exponentOfTwo > maxIntervalExponent)
    exponentOfTwo = maxIntervalExponent;
  long ret = reconnectInterval * (1 << exponentOfTwo);
  logger.debug("Current reconnect backoff is " + ret + " milliseconds (T" + exponentOfTwo + ")");
  return ret;
}
```
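To make the backoff schedule concrete, here is a small Python sketch mirroring `nextReconnectInterval()` above. The base interval of 1000 ms matches the "Current reconnect backoff is 1000 milliseconds (T0)" log line; the maximum exponent of 5 is an illustrative assumption, not necessarily the client's configured value:

```python
def reconnect_intervals(reconnect_interval=1000, max_interval_exponent=5, attempts=8):
    # Reproduces fabric8's exponential backoff: each retry doubles the wait,
    # capped at reconnect_interval * 2**max_interval_exponent.
    out = []
    for attempt in range(attempts):
        exponent = min(attempt, max_interval_exponent)
        out.append(reconnect_interval * (1 << exponent))
    return out

print(reconnect_intervals())  # [1000, 2000, 4000, 8000, 16000, 32000, 32000, 32000]
```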
3. spark side
http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3CJIRA.13158986.1526264708000.69213.1526330460039@Atlassian.JIRA%3E
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16474770#comment-16474770
pull request: https://github.com/apache/spark/pull/28423
## ask for suggestions
So, are there any suggestions for avoiding this issue?
## actions may be considered
### [airflow][source_code_change]
Modify the source code in Airflow to poll the driver pod status (e.g. 'kubectl describe pod xxxx -n xxxx') every few seconds instead of relying on the log stream.
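As a sketch of that idea (not the actual Airflow implementation; the function names are hypothetical and the pod name/namespace are caller-supplied placeholders), the official `kubernetes` Python client could poll the driver pod's phase and derive success or failure from it instead of from the log stream:

```python
import time


def phase_to_exit_code(phase):
    # Map a terminal pod phase to a spark-submit-style exit code.
    return 0 if phase == "Succeeded" else 1


def wait_for_driver_pod(pod_name, namespace, poll_seconds=10):
    # Poll the driver pod until it reaches a terminal phase, then derive an
    # exit code from that phase rather than from a fragile log stream.
    from kubernetes import client, config  # requires the 'kubernetes' package

    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    v1 = client.CoreV1Api()
    while True:
        pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        phase = pod.status.phase  # Pending / Running / Succeeded / Failed
        if phase in ("Succeeded", "Failed"):
            return phase_to_exit_code(phase)
        time.sleep(poll_seconds)
```

This sidesteps the watch-timeout problem entirely, since each poll is an independent GET rather than a long-lived watch connection.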
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-972349874
This issue has been closed because it has not received a response from the issue author.
[GitHub] [airflow] eladkal commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-919986342
> Hi guys, any update on this issue?
There was an attempt by the author to fix this issue, https://github.com/apache/airflow/pull/9081, but the PR was abandoned.
If you are interested in fixing it, please open a PR.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-632429783
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] RenGeng commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
RenGeng commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-885617969
Hi guys, any update on this issue?
[GitHub] [airflow] github-actions[bot] closed issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #8963:
URL: https://github.com/apache/airflow/issues/8963
[GitHub] [airflow] stijndehaes commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
stijndehaes commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-634596428
@ywan2017 I also have a PR open on airflow to work with spark 3.0 https://github.com/apache/airflow/pull/8730
[GitHub] [airflow] stijndehaes commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
stijndehaes commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-635786827
@ywan2017 Yeah, once this is merged I want to try to backport it to 2.4.x. But the code has been refactored a lot in 3.x, so this will take a while.
[GitHub] [airflow] ywan2017 commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
ywan2017 commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-635782223
> @ywan2017 I also have a PR open on airflow to work with spark 3.0 #8730
I saw you are trying to fix the Spark watcher on k8s, which is awesome! That issue affects Airflow scheduling a lot. Unfortunately I am using Spark 2.4.4, which makes it difficult to merge your code change.
[GitHub] [airflow] ywan2017 commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
ywan2017 commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-633348516
Now I am trying to use the Kubernetes client invoked from Airflow instead of the shell command; working on it.
[GitHub] [airflow] github-actions[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-965860004
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.