Posted to issues@spark.apache.org by "Demi Ben-Ari (JIRA)" <ji...@apache.org> on 2015/06/23 08:51:00 UTC
[jira] [Created] (SPARK-8557) Successful Jobs marked as KILLED in Spark 1.4 Standalone
Demi Ben-Ari created SPARK-8557:
-----------------------------------
Summary: Successful Jobs marked as KILLED Spark 1.4 Standalone
Key: SPARK-8557
URL: https://issues.apache.org/jira/browse/SPARK-8557
Project: Spark
Issue Type: Bug
Components: Spark Core, Web UI
Environment: Spark Standalone 1.4.0 vs. Spark Standalone 1.3.1
Reporter: Demi Ben-Ari
We have two cluster installations, one with Spark 1.3.1 and a new one with Spark 1.4.0 (both are standalone cluster installations).
The original problem:
We ran a job (a Spark Java application) on the new 1.4.0 cluster, and the same job on the old 1.3.1 cluster.
After the job finished (on both clusters), we opened the job's page in the Web UI. On the new 1.4.0 cluster, the executors are marked as KILLED (we did not kill them, and everywhere we checked, the logs and output seem fine), while the job itself is marked as FINISHED:
2 worker-20150613111158-172.31.0.104-37240 4 10240 KILLED stdout stderr
1 worker-20150613111158-172.31.15.149-58710 4 10240 KILLED stdout stderr
3 worker-20150613111158-172.31.0.196-52939 4 10240 KILLED stdout stderr
0 worker-20150613111158-172.31.1.233-53467 4 10240 KILLED stdout stderr
In the old 1.3.1 cluster:
=============================
The executors are marked as EXITED:
1 worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572 2 10240 EXITED stdout stderr
0 worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828 2 10240 EXITED stdout stderr
2 worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847 1 10240 EXITED stdout stderr
Another manifestation of the problem:
We ran an application on a one-worker cluster (running 1.4.0). On the application page it’s marked as KILLED, and on the worker page it’s marked as EXITED. When running it on 1.3.1, everything is fine and marked as EXITED.
An attempt to reproduce the problem in spark-shell:
=======================================
We ran the following on both servers:
[root@ip-172-31-6-108 ~]$ spark/bin/spark-shell --total-executor-cores 1
scala> val text = sc.textFile("hdfs:///some-file.txt")
scala> text.count()
(here we get the correct output on both servers)
At this stage, checking the Spark UI, both applications are marked as RUNNING.
Now we exit the spark-shell (using Ctrl+D). Checking the Spark UI again, the application on 1.3.1 is marked as EXITED, while the application on 1.4.0 is marked as KILLED.
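For anyone who wants to check the reported states programmatically rather than eyeballing the web UI, the standalone master also serves its status as JSON at http://<master>:8080/json. A minimal sketch (the master address is hypothetical, and the `completedapps`/`state` field names are assumed from the standalone master's JSON output) that tallies the states of completed applications:

```python
import json
from collections import Counter
from urllib.request import urlopen

def app_states(status_json):
    """Tally the 'state' field of completed applications in the
    master's JSON status payload (field names assumed)."""
    data = json.loads(status_json)
    return Counter(app.get("state", "UNKNOWN")
                   for app in data.get("completedapps", []))

if __name__ == "__main__":
    # Hypothetical master address; adjust to your own cluster.
    with urlopen("http://172.31.6.108:8080/json") as resp:
        print(app_states(resp.read()))
```

On an affected 1.4.0 cluster this should show applications counted under KILLED even though they completed successfully, whereas 1.3.1 reports them as EXITED/FINISHED.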
Thanks,
Nizan & Demi
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)