Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2015/04/17 21:53:58 UTC

[jira] [Comment Edited] (SPARK-6900) spark ec2 script enters infinite loop when run-instance fails

    [ https://issues.apache.org/jira/browse/SPARK-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500550#comment-14500550 ] 

Nicholas Chammas edited comment on SPARK-6900 at 4/17/15 7:53 PM:
------------------------------------------------------------------

Hi [~wanggd04@gmail.com]. I would not call this a major bug (as the issue is currently labeled), or even a bug at all.

When the situation you are describing happens, spark-ec2 prints out notices about the SSH failures. This should give the user enough information to debug the problem. Don't you agree?

If the problem is indeed that an instance was terminated before it came up, the user can simply cancel the launch and restart it with the {{--resume}} flag. Is that not a satisfactory solution?

Having spark-ec2 automatically handle instances that terminate prematurely on launch would be a nice-to-have, but it seems like such a niche feature for such a rare occurrence that I do not think it is worth the investment.

If this all makes sense to you, I suggest closing this issue.


was (Author: nchammas):
Hi [~wanggd04@gmail.com], I would not call this a major bug (as the issue is currently labeled).

When the situation you are describing happens, spark-ec2 prints out notices about the SSH failures. This should give the user enough information to debug the problem. Don't you agree?

If the problem is indeed that an instance was terminated before it came up, the user can simply cancel the launch and restart it with the {{--resume}} flag. Is that not a satisfactory solution?

Having spark-ec2 automatically handle instances that terminate prematurely on launch would be a nice-to-have, but it seems like such a niche feature for such a rare occurrence that I do not think it is worth the investment.

If this all makes sense to you, I suggest closing this issue.

> spark ec2 script enters infinite loop when run-instance fails
> -------------------------------------------------------------
>
>                 Key: SPARK-6900
>                 URL: https://issues.apache.org/jira/browse/SPARK-6900
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.3.0
>            Reporter: Guodong Wang
>
> I am using the spark-ec2 scripts to launch Spark clusters in AWS.
> Recently, there were some technical issues with the AWS EC2 service in our region.
> When spark-ec2 sent the run-instances requests to EC2, not all of the requested instances were launched; some were terminated by the EC2 service before they came up.
> But the spark-ec2 script waits for all the instances to enter the 'ssh-ready' state, so it enters an infinite loop, because the terminated instances will never become 'ssh-ready'.
> In my opinion, it should be OK if some of the slave instances are terminated. As long as the master node is running, the terminated slaves should be filtered out and the cluster should be set up.
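The fix sketched in the report could look roughly like the following: drop instances that EC2 has already terminated before entering the SSH wait loop, so the loop cannot spin forever. This is an illustrative sketch only; names such as `wait_for_cluster` and the dict-based instance records are hypothetical and do not reflect spark-ec2's actual internals.

```python
# Hedged sketch of the proposed behavior: filter out terminated slaves
# before waiting for SSH, and fail fast only if the master is gone.
# EC2 instance state names ("running", "terminated", "shutting-down")
# are real; everything else here is illustrative.

DEAD_STATES = {"terminated", "shutting-down"}

def filter_live_instances(instances):
    """Keep only instances that can still become SSH-ready."""
    return [i for i in instances if i["state"] not in DEAD_STATES]

def wait_for_cluster(master, slaves, wait_for_ssh_ready):
    """Wait for the master plus surviving slaves, skipping dead ones.

    `wait_for_ssh_ready` stands in for spark-ec2's actual wait loop.
    """
    if master["state"] in DEAD_STATES:
        raise RuntimeError("Master was terminated; cannot set up cluster.")
    live_slaves = filter_live_instances(slaves)
    dropped = len(slaves) - len(live_slaves)
    if dropped:
        print("Warning: %d slave(s) terminated before launch completed; "
              "continuing without them." % dropped)
    wait_for_ssh_ready([master] + live_slaves)
    return live_slaves
```

With this shape, a slave terminated mid-launch merely shrinks the cluster, while the wait loop only ever blocks on instances that can actually reach 'ssh-ready'.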



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org