Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:22:10 UTC

[jira] [Updated] (SPARK-8557) Successful Jobs marked as KILLED Spark 1.4 Standalone

     [ https://issues.apache.org/jira/browse/SPARK-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-8557:
--------------------------------
    Labels: bulk-closed  (was: )

> Successful Jobs marked as KILLED Spark 1.4 Standalone
> -----------------------------------------------------
>
>                 Key: SPARK-8557
>                 URL: https://issues.apache.org/jira/browse/SPARK-8557
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>         Environment: Spark Standalone 1.4.0 vs Spark Standalone 1.3.1
>            Reporter: Demi Ben-Ari
>            Priority: Major
>              Labels: bulk-closed
>
> We have two cluster installations, one with Spark 1.3.1 and a new one with Spark 1.4.0 (both are standalone clusters).
> The original problem:
> We ran a job (a Spark Java application) on the new 1.4.0 cluster, and the same job on the old 1.3.1 cluster.
> After the job finished (on both clusters), we opened the job's page in the Web UI. On the new 1.4.0 cluster the workers are marked as KILLED (we didn't kill them, and everywhere we checked the logs and output seem fine), while the job itself is marked as FINISHED:
> (columns: ExecutorID, Worker, Cores, Memory (MB), State, Logs)
> 2 worker-20150613111158-172.31.0.104-37240 4 10240 KILLED stdout stderr 
> 1 worker-20150613111158-172.31.15.149-58710 4 10240 KILLED stdout stderr 
> 3 worker-20150613111158-172.31.0.196-52939 4 10240 KILLED stdout stderr 
> 0 worker-20150613111158-172.31.1.233-53467 4 10240 KILLED stdout stderr 
> On the old 1.3.1 cluster:
> =============================
> The workers are marked as EXITED:
> 1 worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572 2 10240 EXITED stdout stderr 
> 0 worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828 2 10240 EXITED stdout stderr 
> 2 worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847 1 10240 EXITED stdout stderr 
> Another manifestation of the problem:
> We ran an application on a one-worker cluster (running 1.4.0). On the application page it is marked as KILLED, while on the worker page it is marked as EXITED. When running the same application on 1.3.1, everything is consistent and marked as EXITED.
> An attempt to reproduce the problem in spark-shell:
> =======================================
> We ran the following on both servers:
> [root@ip-172-31-6-108 ~]$ spark/bin/spark-shell --total-executor-cores 1
> scala> val text = sc.textFile("hdfs:///some-file.txt");
> scala> text.count()
> (Here we get the correct count on both servers.)
> At this stage, the Spark UI shows both applications as RUNNING.
> Now we exit the spark-shell (using Ctrl+D). Checking the Spark UI again, the job on 1.3.1 is marked as EXITED, while the job on 1.4.0 is marked as KILLED.
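> The same clean-shutdown sequence, packaged as a standalone application, is sketched below for reference; the master URL and input path are placeholders for our environment, not the exact job we ran:
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> object ExitStateRepro {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("exit-state-repro")
>       .setMaster("spark://<master-host>:7077") // placeholder standalone master URL
>       .set("spark.cores.max", "1")             // equivalent of --total-executor-cores 1
>     val sc = new SparkContext(conf)
>
>     val text = sc.textFile("hdfs:///some-file.txt") // placeholder input path
>     println(text.count())                           // correct result on both clusters
>
>     // Clean shutdown: after this, the executor shows EXITED on 1.3.1,
>     // but KILLED on the 1.4.0 application page.
>     sc.stop()
>   }
> }
>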
> Thanks,
> Nizan & Demi



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org