Posted to issues@spark.apache.org by "Demi Ben-Ari (JIRA)" <ji...@apache.org> on 2015/06/23 08:51:00 UTC

[jira] [Created] (SPARK-8557) Successful Jobs marked as KILLED Spark 1.4 Standalone

Demi Ben-Ari created SPARK-8557:
-----------------------------------

             Summary: Successful Jobs marked as KILLED Spark 1.4 Standalone
                 Key: SPARK-8557
                 URL: https://issues.apache.org/jira/browse/SPARK-8557
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Web UI
         Environment: Spark Standalone 1.4.0 vs Spark Standalone 1.3.1
            Reporter: Demi Ben-Ari


We have two cluster installations, one with Spark 1.3.1 and a new one with Spark 1.4.0 (both are standalone cluster installations).
The original problem:
We ran a job (a Spark Java application) on the new 1.4.0 cluster, and the same job on the old 1.3.1 cluster.

After the job finished (on both clusters), we opened the job's link in the Web UI. On the new 1.4.0 cluster, the executors are marked as KILLED (I didn't kill them, and everywhere I checked, the logs and output look fine), while the job itself is marked as "FINISHED":

(columns: Executor ID, Worker, Cores, Memory in MB, State, Logs)
2 worker-20150613111158-172.31.0.104-37240 4 10240 KILLED stdout stderr
1 worker-20150613111158-172.31.15.149-58710 4 10240 KILLED stdout stderr
3 worker-20150613111158-172.31.0.196-52939 4 10240 KILLED stdout stderr
0 worker-20150613111158-172.31.1.233-53467 4 10240 KILLED stdout stderr

In the old 1.3.1 cluster:
=============================
the executors are marked as EXITED:

1 worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572 2 10240 EXITED stdout stderr 
0 worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828 2 10240 EXITED stdout stderr 
2 worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847 1 10240 EXITED stdout stderr 
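
To double-check what the Master itself reports about the application (as opposed to the per-executor rows above), something like the small snippet below can be pasted into a plain spark-shell session. It is only a rough sketch: it assumes the standalone Master web UI is reachable on its default port 8080 and serves a JSON view of the cluster state at /json (the host is the one from our session and would need adjusting).

import scala.io.Source

// Fetch the standalone Master's JSON status page (assumed to be served at /json
// on the Master web UI port) and keep only the fields mentioning an id or a state,
// so FINISHED vs KILLED/EXITED is visible at a glance without a JSON parser.
val masterJson = "http://172.31.6.108:8080/json"
val raw = Source.fromURL(masterJson).mkString
val interesting = raw.split(",").map(_.trim).filter(f => f.contains("\"id\"") || f.contains("\"state\""))
interesting.foreach(println)

This makes it easy to compare the application-level state with the per-executor states shown above.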


Another manifestation of the problem:
We ran an application on a one-worker cluster (running 1.4.0). On the application page it is marked as KILLED, and on the worker page it is marked as EXITED. When running it on 1.3.1, everything is fine and it is marked as EXITED in both places.

An attempt to reproduce the problem in spark-shell:
=======================================
We ran the following on both servers:

[root@ip-172-31-6-108 ~]$ spark/bin/spark-shell --total-executor-cores 1

scala> val text = sc.textFile("hdfs:///some-file.txt")
scala> text.count()
(here we get the correct output on both servers)

At this stage, checking the Spark UI, both applications are marked as RUNNING.

Now we exit the spark-shell (using Ctrl+D). Checking the Spark UI again, the job on 1.3.1 is marked as EXITED, while the job on 1.4.0 is marked as KILLED.
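
To rule out the shell exit itself (Ctrl+D) as the trigger, a minimal standalone application equivalent to the session above could be submitted instead, shutting the context down with an explicit sc.stop(). This is only a sketch under the same assumptions (same HDFS path; the CountFile class and jar name are made up for the example):

import org.apache.spark.{SparkConf, SparkContext}

// Submit with something like:
//   spark/bin/spark-submit --class CountFile --total-executor-cores 1 count-file.jar
object CountFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CountFile"))

    // Same work as the shell session above.
    val text = sc.textFile("hdfs:///some-file.txt")
    println(s"line count = ${text.count()}")

    // Stop the context explicitly so the Master sees a normal application finish,
    // rather than the driver process simply exiting.
    sc.stop()
  }
}

If the executors still end up marked as KILLED on 1.4.0 after a clean sc.stop(), the way the shell is exited can be ruled out as the cause.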

Thanks,
Nizan & Demi




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
